Re: [zfs-discuss] ZFS over iSCSI question
Thomas Nau [EMAIL PROTECTED] wrote:

> > fflush(fp);
> > fsync(fileno(fp));
> > fclose(fp);
> >
> > and check errors.
> >
> > (It's remarkable how often people get the above sequence wrong and only do something like
> >
> > fsync(fileno(fp));
> > fclose(fp);)
>
> Thanks for clarifying! Seems I really need to check the apps with truss or dtrace to see if they use that sequence. Allow me one more question: why is fflush() required prior to fsync()?

You cannot simply verify this with truss unless you trace libc::fflush() too.

You need to call fflush() first, in order to move the user-space cache to the kernel.

Jörg

--
EMail: [EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over iSCSI question
On March 23, 2007 11:06:33 PM -0700 Adam Leventhal [EMAIL PROTECTED] wrote:
> On Fri, Mar 23, 2007 at 11:28:19AM -0700, Frank Cusack wrote:
> > > I'm in a way still hoping that it's an iSCSI-related problem, as detecting dead hosts in a network can be a non-trivial problem and it takes quite some time for TCP to time out and inform the upper layers. Just a guess/hope here that FC-AL, ... do better in this case.
> >
> > iscsi doesn't use TCP, does it? Anyway, the problem is really transport independent.
>
> It does use TCP. Were you thinking UDP?

Or its own IP protocol. I wouldn't have thought iSCSI would want to be subject to the vagaries of TCP.

-frank
Re: [zfs-discuss] ZFS over iSCSI question
On Sat, Mar 24, 2007 at 11:20:38AM -0700, Frank Cusack wrote:
> > > iscsi doesn't use TCP, does it? Anyway, the problem is really transport independent.
> >
> > It does use TCP. Were you thinking UDP?
>
> Or its own IP protocol. I wouldn't have thought iSCSI would want to be subject to the vagaries of TCP.

No, you'll find that iSCSI does indeed use TCP, for better or for worse. ;)

-brian

--
The reason I don't use Gnome: every single other window manager I know of is very powerfully extensible, where you can switch actions to different mouse buttons. Guess which one is not, because it might confuse the poor users? Here's a hint: it's not the small and fast one.
  --Linus
Re: [zfs-discuss] ZFS over iSCSI question
On Fri, 23 Mar 2007, Roch - PAE wrote:
> I assume the rsync is not issuing fsyncs (and its files are not opened O_DSYNC). If so, rsync just works against the filesystem cache and does not commit the data to disk.
>
> You might want to run sync(1M) after a successful rsync.
>
> A larger rsync would presumably have blocked. It's just that the amount of data you need to rsync fitted in a couple of transaction groups.

Thanks for the hints, but this would make our worst nightmares come true. At least they could, because it means that we would have to check every application handling critical data, and I think it's not the app's responsibility. Up to a certain amount, like a database transaction, but not any further. There's always a time window where data might be cached in memory, but I would argue that caching several GB of data (in our case written data) with thousands of files in unbuffered memory circumvents all the built-in reliability of ZFS.

I'm in a way still hoping that it's an iSCSI-related problem, as detecting dead hosts in a network can be a non-trivial problem and it takes quite some time for TCP to time out and inform the upper layers. Just a guess/hope here that FC-AL, ... do better in this case.

Thomas

GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED
Re: [zfs-discuss] ZFS over iSCSI question
On March 23, 2007 6:51:10 PM +0100 Thomas Nau [EMAIL PROTECTED] wrote:
> Thanks for the hints, but this would make our worst nightmares come true. At least they could, because it means that we would have to check every application handling critical data, and I think it's not the app's responsibility.

I'd tend to disagree with that. POSIX/SUS does not guarantee data makes it to disk until you do an fsync() (or open the file with the right flags, or other techniques). If an application REQUIRES that data get to disk, it really MUST DTRT.

> Up to a certain amount, like a database transaction, but not any further. There's always a time window where data might be cached in memory, but I would argue that caching several GB of data (in our case written data) with thousands of files in unbuffered memory circumvents all the built-in reliability of ZFS.
>
> I'm in a way still hoping that it's an iSCSI-related problem, as detecting dead hosts in a network can be a non-trivial problem and it takes quite some time for TCP to time out and inform the upper layers. Just a guess/hope here that FC-AL, ... do better in this case.

iscsi doesn't use TCP, does it? Anyway, the problem is really transport independent.

-frank
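One reading of "open the file with the right flags" above is the POSIX O_DSYNC flag, which Roch also mentions in this thread. The following is only a minimal sketch under that assumption; the helper name write_dsync() is hypothetical and not taken from any poster's code. With O_DSYNC, each write() should not return until the data has been committed, so no separate fsync() call appears.

```c
#define _POSIX_C_SOURCE 200809L   /* for O_DSYNC under strict -std= modes */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `text` to `path` through a descriptor opened
 * with O_DSYNC. Returns 0 on success, -1 on any error. */
int write_dsync(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);
    if (fd < 0)
        return -1;

    size_t left = strlen(text);
    while (left > 0) {
        /* write() may be partial, so loop until everything is written. */
        ssize_t n = write(fd, text, left);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        text += n;
        left -= (size_t)n;
    }

    /* close() can still report an error, so its result is returned. */
    return close(fd);
}
```

The trade-off is that every write() pays the cost of a synchronous commit, which is why this style tends to suit small, critical records rather than bulk copies.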
Re: [zfs-discuss] ZFS over iSCSI question
> I'd tend to disagree with that. POSIX/SUS does not guarantee data makes it to disk until you do an fsync() (or open the file with the right flags, or other techniques). If an application REQUIRES that data get to disk, it really MUST DTRT.

Indeed; want your data safe? Use:

    fflush(fp);
    fsync(fileno(fp));
    fclose(fp);

and check errors.

(It's remarkable how often people get the above sequence wrong and only do something like

    fsync(fileno(fp));
    fclose(fp);)

Casper
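The sequence above, with the "check errors" part spelled out, might look like the sketch below. The function name save_durably() and its return convention are assumptions made for illustration; only the fflush(), fsync(), fclose() ordering comes from the message.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() under strict -std= modes */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `text` to `path` and return 0 only if every
 * step, including the commit to stable storage, reported success. */
int save_durably(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = (fputs(text, fp) != EOF);

    /* 1. Push stdio's user-space buffer into the kernel. */
    if (ok && fflush(fp) != 0)
        ok = 0;

    /* 2. Ask the kernel to commit the file to stable storage. */
    if (ok && fsync(fileno(fp)) != 0)
        ok = 0;

    /* 3. Always close so the stream is not leaked, and treat a failed
     *    close as an error too. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A caller would treat any non-zero return as "the data may not be on disk", for example `if (save_durably("/tmp/example.conf", "key=value\n") != 0) { /* retry or report */ }`.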
Re: [zfs-discuss] ZFS over iSCSI question
Thomas Nau wrote:
> Dear all.
> I've set up the following scenario:
>
> Galaxy 4200 running OpenSolaris build 59 as iSCSI target; remaining disk space of the two internal drives, with a total of 90GB, is used as zpool for the two 32GB volumes exported via iSCSI.
>
> The initiator is an up-to-date Solaris 10 11/06 x86 box using the above-mentioned volumes as disks for a local zpool.

Like this?

disk--zpool--zvol--iscsitarget--network--iscsiclient--zpool--filesystem--app

> I'm in a way still hoping that it's an iSCSI-related problem, as detecting dead hosts in a network can be a non-trivial problem and it takes quite some time for TCP to time out and inform the upper layers. Just a guess/hope here that FC-AL, ... do better in this case.

Actually, this is why NFS was invented. Prior to NFS we had something like:

disk--raw--ndserver--network--ndclient--filesystem--app

The problem is that the failure modes are very different for networks and presumably reliable local disk connections. Hence NFS has a lot of error-handling code and provides well-understood error-handling semantics. Maybe what you really want is NFS?

-- richard
Re: [zfs-discuss] ZFS over iSCSI question
Dear Frank, Casper,

> > I'd tend to disagree with that. POSIX/SUS does not guarantee data makes it to disk until you do an fsync() (or open the file with the right flags, or other techniques). If an application REQUIRES that data get to disk, it really MUST DTRT.
>
> Indeed; want your data safe? Use:
>
> fflush(fp);
> fsync(fileno(fp));
> fclose(fp);
>
> and check errors.
>
> (It's remarkable how often people get the above sequence wrong and only do something like
>
> fsync(fileno(fp));
> fclose(fp);)

Thanks for clarifying! Seems I really need to check the apps with truss or dtrace to see if they use that sequence. Allow me one more question: why is fflush() required prior to fsync()?

Putting all the pieces together, this means that if the app doesn't do it, it suffered from the problem with UFS anyway, just with typically smaller caches, right?

Thanks again
Thomas

GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED
Re: [zfs-discuss] ZFS over iSCSI question
Richard,

> Like this?
>
> disk--zpool--zvol--iscsitarget--network--iscsiclient--zpool--filesystem--app

Exactly.

> > I'm in a way still hoping that it's an iSCSI-related problem, as detecting dead hosts in a network can be a non-trivial problem and it takes quite some time for TCP to time out and inform the upper layers. Just a guess/hope here that FC-AL, ... do better in this case.
>
> Actually, this is why NFS was invented. Prior to NFS we had something like:
>
> disk--raw--ndserver--network--ndclient--filesystem--app

The problem is that our NFS, mail, DB and other servers use mirrored disks located in different buildings on campus. Currently we use FCAL devices and recently switched from UFS to ZFS. The drawback with FCAL is that you always need to have a second infrastructure (not the real problem) but with different components. Having all Ethernet would be much easier.

> The problem is that the failure modes are very different for networks and presumably reliable local disk connections. Hence NFS has a lot of error-handling code and provides well-understood error-handling semantics. Maybe what you really want is NFS?

We thought about using NFS as the backend for as many applications as possible, but we need redundancy for the file server itself too.

Thomas

GPG fingerprint: B1 EE D2 39 2C 82 26 DA A5 4D E0 50 35 75 9E ED
Re: [zfs-discuss] ZFS over iSCSI question
> Thanks for clarifying! Seems I really need to check the apps with truss or dtrace to see if they use that sequence. Allow me one more question: why is fflush() required prior to fsync()?

When you use stdio, you need to make sure the data is in the system buffers prior to calling fsync().

fclose() will otherwise write the rest of the data, which is then not sync'ed.

(In S10 I fixed this for the /etc/*_* driver files; they are generally under 8K and therefore never written to disk before being fsync'ed if not preceded by fflush().)

Casper
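The effect described above can be made visible with a small experiment: write a short string through stdio, then ask the kernel for the file size before and after fflush(). This is a sketch only; the helper names kernel_size() and show_stdio_buffering() are hypothetical, and full buffering is requested explicitly with setvbuf() so the behaviour does not depend on the C library's default for the stream.

```c
#define _POSIX_C_SOURCE 200809L   /* for stat() under strict -std= modes */
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of `path` as the kernel currently sees it. */
static long kernel_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1L;
}

/* Writes a short line through stdio and records the size the kernel
 * reports before and after fflush(). Returns 0 on success, -1 on error. */
int show_stdio_buffering(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Force full buffering; must happen before any other stream I/O. */
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) != 0 ||
        fputs("small config line\n", fp) == EOF) {
        fclose(fp);
        return -1;
    }

    /* The data is still in the user-space buffer at this point, so an
     * fsync() here would have nothing to commit. */
    *before = kernel_size(path);

    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    /* Only now has the kernel received the bytes. */
    *after = kernel_size(path);

    return fclose(fp) == 0 ? 0 : -1;
}
```

With a write this small, `before` should come back as 0 and `after` as the length of the line, which is the gap that calling fsync() without fflush() leaves open.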
Re: [zfs-discuss] ZFS over iSCSI question
On Fri, Mar 23, 2007 at 11:28:19AM -0700, Frank Cusack wrote:
> > I'm in a way still hoping that it's an iSCSI-related problem, as detecting dead hosts in a network can be a non-trivial problem and it takes quite some time for TCP to time out and inform the upper layers. Just a guess/hope here that FC-AL, ... do better in this case.
>
> iscsi doesn't use TCP, does it? Anyway, the problem is really transport independent.

It does use TCP. Were you thinking UDP?

Adam

--
Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl