Re: [zfs-discuss] ZFS over iSCSI question
Thomas Nau [EMAIL PROTECTED] wrote:

>> fflush(fp); fsync(fileno(fp)); fclose(fp);
>> and check errors.
>>
>> (It's remarkable how often people get the above sequence wrong and
>> only do something like fsync(fileno(fp)); fclose(fp);)
>
> Thanks for clarifying! Seems I really need to check the apps with truss
> or dtrace to see if they use that sequence. Allow me one more question:
> why is fflush() required prior to fsync()?

You cannot simply verify this with truss unless you trace libc::fflush() too.

You need to call fflush() first, in order to move the user-space cache to the kernel.

Jörg

--
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
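The sequence discussed in this message (fflush, then fsync, then fclose, checking each result) can be sketched in C roughly as follows. This is only an illustration: write_durably() is a hypothetical helper name, not code from any application mentioned in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * write_durably() is a hypothetical helper used only for illustration.
 * It writes a string to a file and returns 0 only if every step,
 * including the final fclose(), succeeded; otherwise it returns -1.
 */
int write_durably(const char *path, const char *data)
{
    assert(path != NULL && data != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = (fputs(data, fp) != EOF);

    /* 1. fflush(): move stdio's user-space buffer into the kernel. */
    if (ok && fflush(fp) != 0)
        ok = 0;

    /* 2. fsync(): ask the kernel to push the file's data to stable storage. */
    if (ok && fsync(fileno(fp)) != 0)
        ok = 0;

    /* 3. fclose(): always close, and count a failed close as an error too. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Without step 1, fsync() only sees what stdio has already handed to the kernel, so data still sitting in the FILE buffer would be written by fclose() after the sync and would not be covered by it.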
[zfs-discuss] missing features?
_FIOSATIME - why doesn't zfs support this (assuming I didn't just miss it)? Might be handy for backups.

Could/should zfs support a new ioctl, constrained if needed to files of zero size, that sets an explicit (and fixed) blocksize for a particular file? That might be useful for performance in special cases when one doesn't necessarily want to specify (or perhaps depend on the specification of) the attribute at the filesystem level. One could imagine a database that was itself tunable per-file to a similar range of blocksizes, which would almost certainly benefit if it used those sizes for the corresponding files.

Additional capabilities that might be desirable: setting the blocksize to zero to let the system return to default behavior for a file; being able to discover the file's blocksize (does fstat() report this?) as well as whether it was fixed at the filesystem level, at the file level, or in default state.

Wasn't there some work going on to add real per-user (and maybe per-group) quotas, so one doesn't necessarily need to be sharing or automounting thousands of individual filesystems (slow)? Haven't heard anything lately though...

This message posted from opensolaris.org
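On the fstat() question in this message: POSIX defines an st_blksize field in struct stat, described as a filesystem-specific preferred I/O block size for the object. A minimal sketch of reading it follows; preferred_blksize() is a hypothetical helper name. Whether the value tracks a per-file ZFS blocksize is an assumption this sketch does not verify, and the field says nothing about whether the size was fixed at the filesystem level, at the file level, or left at the default.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * preferred_blksize() is a hypothetical helper used only for illustration.
 * It returns the st_blksize value that fstat() reports for the given path,
 * or -1 if the file cannot be opened or fstat() fails.
 */
long preferred_blksize(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    long result = -1;
    if (fstat(fd, &st) == 0)
        result = (long)st.st_blksize;   /* preferred I/O block size */

    close(fd);
    return result;
}
```

A caller would compare the returned value against the blocksize it expects for that file and treat -1 as "unknown".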
Re: [zfs-discuss] Re: asize is 300MB smaller than lsize - why?
Kangurek wrote:

> Thanks for the info.
> My idea was to traverse a changing filesystem; now I see that it will not work.
> I will try to traverse snapshots. Zreplicate will:
> 1. do snapshot @replicate_latest and
> 2. send data to snapshot @replicate_latest
> 3. wait X sec ( X = 20 )
> 4. remove @replicate_previous, rename @replicate_latest to @replicate_previous
> 5. repeat from 1.
>
> I'm sure it will work, but taking snapshots will be slow on a loaded filesystem.
> Do you have any idea how to speed up operations on snapshots?
> 1. remove @replicate_previous
> 2. rename @replicate_latest to @replicate_previous
> 3. create @replicate_latest

You can avoid the rename by doing:

        zfs create @A
    again:
        zfs destroy @B
        zfs create @B
        zfs send @A @B
        zfs destroy @A
        zfs create @A
        zfs send @B @A
        goto again

I'm not sure exactly what will be slow about taking snapshots, but one aspect might be that we have to suspend the intent log (see the call to zil_suspend() in dmu_objset_snapshot_one()). I've been meaning to change that for a while now -- just let the snapshot have the (non-empty) zil header in it, but don't use it (e.g. if we rollback or clone, explicitly zero out the zil header). So you might want to look into that.

--matt
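The alternating A/B loop in this message reads as shorthand. With the actual zfs subcommands, taking a snapshot is `zfs snapshot` and an incremental stream between two snapshots is `zfs send -i`. The sketch below only formats the three commands for one half of the loop and runs nothing. format_half_round() is a hypothetical helper, "tank/fs" in the usage note is a made-up dataset name, and the destination of the send stream is left out, as it is in the original message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * format_half_round() is a hypothetical helper used only for illustration.
 * It formats one half of the alternating scheme: recycle one snapshot name,
 * then send the incremental stream from the kept snapshot to the new one.
 * It returns what snprintf() returns (the length the full text needs).
 */
int format_half_round(char *buf, size_t len, const char *fs,
                      const char *keep, const char *recycle)
{
    assert(buf != NULL && fs != NULL && keep != NULL && recycle != NULL);

    return snprintf(buf, len,
                    "zfs destroy %s@%s\n"        /* drop the older snapshot   */
                    "zfs snapshot %s@%s\n"       /* retake it under same name */
                    "zfs send -i %s@%s %s@%s\n", /* incremental: keep -> new  */
                    fs, recycle,
                    fs, recycle,
                    fs, keep, fs, recycle);
}
```

Calling it with ("tank/fs", "A", "B") and then with ("tank/fs", "B", "A") yields the two halves of the loop body, so neither snapshot ever needs to be renamed.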
Re: [zfs-discuss] ZFS over iSCSI question
On March 23, 2007 11:06:33 PM -0700 Adam Leventhal [EMAIL PROTECTED] wrote:

> On Fri, Mar 23, 2007 at 11:28:19AM -0700, Frank Cusack wrote:
>>> I'm in a way still hoping that it's an iSCSI-related problem, as detecting
>>> dead hosts in a network can be a non-trivial problem and it takes quite
>>> some time for TCP to time out and inform the upper layers. Just a
>>> guess/hope here that FC-AL, ... do better in this case
>>
>> iscsi doesn't use TCP, does it? Anyway, the problem is really transport
>> independent.
>
> It does use TCP. Were you thinking UDP?

or its own IP protocol. I wouldn't have thought iSCSI would want to be subject to the vagaries of TCP.

-frank
Re: [zfs-discuss] Re: asize is 300MB smaller than lsize - why?
Matthew Ahrens wrote On 03/24/07 12:13,:

> Kangurek wrote:
>> Thanks for the info.
>> My idea was to traverse a changing filesystem; now I see that it will not work.
>> I will try to traverse snapshots. Zreplicate will:
>> 1. do snapshot @replicate_latest and
>> 2. send data to snapshot @replicate_latest
>> 3. wait X sec ( X = 20 )
>> 4. remove @replicate_previous, rename @replicate_latest to @replicate_previous
>> 5. repeat from 1.
>>
>> I'm sure it will work, but taking snapshots will be slow on a loaded filesystem.
>> Do you have any idea how to speed up operations on snapshots?
>> 1. remove @replicate_previous
>> 2. rename @replicate_latest to @replicate_previous
>> 3. create @replicate_latest
>
> You can avoid the rename by doing:
>
>         zfs create @A
>     again:
>         zfs destroy @B
>         zfs create @B
>         zfs send @A @B
>         zfs destroy @A
>         zfs create @A
>         zfs send @B @A
>         goto again
>
> I'm not sure exactly what will be slow about taking snapshots, but one
> aspect might be that we have to suspend the intent log (see the call to
> zil_suspend() in dmu_objset_snapshot_one()). I've been meaning to change
> that for a while now -- just let the snapshot have the (non-empty) zil
> header in it, but don't use it (e.g. if we rollback or clone, explicitly
> zero out the zil header). So you might want to look into that.

I've always thought the slowness was due to the txg_wait_synced(). I just counted 5 for one snapshot:

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 0, 7aa610d3, 70170800, ...)
zfs`zil_commit_writer+0x34c(30010c55200, 151, 151, 1, 3fe, 7aa84600)
zfs`zil_commit+0x68(30010c55200, 151, 0, 30010c5527c, 151, 0)
zfs`zil_suspend+0xc0(30010c55200, 2a1010db240, 0, 0, 30014b32e00, 0)
zfs`dmu_objset_snapshot_one+0x74(0, 2a1010db420, 7aa60700, 0, 0, 0)
zfs`dmu_objset_snapshot+0xe8(300265bd000, 300265bd400, 0, 0, ...)
zfs`zfsdev_ioctl+0x12c(701cf9f0, 701cf660, ffbfe850, 390, 701cf400, ...)

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 3, 151, c00431549f, 3fe, 7aa84600)
zfs`zil_destroy+0xc(30010c55200, 0, 0, 30010c5527c, 30014b32e00, 0)
zfs`zil_suspend+0x108(30010c55200, 2a1010db240, 30010c5527c, 0, 30014b32e00, 0)
zfs`dmu_objset_snapshot_one+0x74(0, 2a1010db420, 7aa60700, 0, 0, 0)
zfs`dmu_objset_snapshot+0xe8(300265bd000, 300265bd400, 0, 0, ...)
zfs`zfsdev_ioctl+0x12c(701cf9f0, 701cf660, ffbfe850, 390, 701cf400,...)

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 36f8, 30593b0, 1f8, 1f8, 180c000)
zfs`zil_destroy+0x1b0(30010c55200, 0, 701d5760, 30010c5527c, ...)
zfs`zil_suspend+0x108(30010c55200, 2a1010db240, 30010c5527c, 0, 30014b32e00, 0)
zfs`dmu_objset_snapshot_one+0x74(0, 2a1010db420, 7aa60700, 0, 0, 0)
zfs`dmu_objset_snapshot+0xe8(300265bd000, 300265bd400, 0, 0, ...)
zfs`zfsdev_ioctl+0x12c(701cf9f0, 701cf660, ffbfe850, 390, 701cf400, ...)

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 36f9, 30593b0, 1f8, 1f8, 180c000)
zfs`dsl_sync_task_group_wait+0x11c(300109a7ac8, 30005c51dc0, 7aa60700, ...)
zfs`dmu_objset_snapshot+0x100(300265bd000, 300265bd400, 0, 0, ...)
zfs`zfsdev_ioctl+0x12c(701cf9f0, 701cf660, ffbfe850, 390, 701cf400, ...)

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 36fa, 30593b0, 1f8, 1f8, 180c000)
zfs`dsl_sync_task_group_wait+0x11c(300109a7ac8, 30005c51dc0, ...)
zfs`dsl_sync_task_do+0x28(30005c51dc0, 0, 7aa2d898, 300028f7680,...)
zfs`spa_history_log+0x30(300028f7680, 3000dee1490, 0, 7aa2d800, 1, 18)
zfs`zfs_ioc_pool_log_history+0xd8(7aa64c00, 0, 17, 18, 3000dee1490, 7aa64c00)
zfs`zfsdev_ioctl+0x12c(701cf768, 701cf660, ffbfe850, 108, 701cf400,...)

> --matt
Re: [zfs-discuss] Re: asize is 300MB smaller than lsize - why?
Neil Perrin wrote:

>> I'm not sure exactly what will be slow about taking snapshots, but one
>> aspect might be that we have to suspend the intent log (see the call to
>> zil_suspend() in dmu_objset_snapshot_one()). I've been meaning to change
>> that for a while now -- just let the snapshot have the (non-empty) zil
>> header in it, but don't use it (e.g. if we rollback or clone, explicitly
>> zero out the zil header). So you might want to look into that.
>
> I've always thought the slowness was due to the txg_wait_synced().
> I just counted 5 for one snapshot:

Yeah, well 3 of the 5 are for zil_suspend(), so I think you've proved my point :-)

I believe that the one from spa_history_log() will go away with MarkS's delegated admin work, leaving just the one "actually do it" txg_wait_synced().

Bottom line, it should be possible to make zfs snapshot take 5x less time, without an extraordinary effort.

--matt
Re: [zfs-discuss] Re: asize is 300MB smaller than lsize - why?
Matthew Ahrens wrote On 03/24/07 12:36,:

> Neil Perrin wrote:
>>> I'm not sure exactly what will be slow about taking snapshots, but one
>>> aspect might be that we have to suspend the intent log (see the call to
>>> zil_suspend() in dmu_objset_snapshot_one()). I've been meaning to change
>>> that for a while now -- just let the snapshot have the (non-empty) zil
>>> header in it, but don't use it (e.g. if we rollback or clone, explicitly
>>> zero out the zil header). So you might want to look into that.
>>
>> I've always thought the slowness was due to the txg_wait_synced().
>> I just counted 5 for one snapshot:
>
> Yeah, well 3 of the 5 are for zil_suspend(), so I think you've proved my
> point :-)
>
> I believe that the one from spa_history_log() will go away with MarkS's
> delegated admin work, leaving just the one "actually do it"
> txg_wait_synced().
>
> Bottom line, it should be possible to make zfs snapshot take 5x less
> time, without an extraordinary effort.

I'm not sure. Doing one will take the same time as doing more than one (assuming the same txg), but at least one is needed to ensure all transactions prior to the snapshot are committed.

Neil.
Re: [zfs-discuss] ZFS over iSCSI question
On Sat, Mar 24, 2007 at 11:20:38AM -0700, Frank Cusack wrote:

>>> iscsi doesn't use TCP, does it? Anyway, the problem is really transport
>>> independent.
>>
>> It does use TCP. Were you thinking UDP?
>
> or its own IP protocol. I wouldn't have thought iSCSI would want to be
> subject to the vagaries of TCP.

No, you'll find that iSCSI does indeed use TCP, for better or for worse. ;)

-brian

--
The reason I don't use Gnome: every single other window manager I know of is very powerfully extensible, where you can switch actions to different mouse buttons. Guess which one is not, because it might confuse the poor users? Here's a hint: it's not the small and fast one. --Linus