Re: [zfs-discuss] ZFS over iSCSI question

2007-03-24 Thread Joerg Schilling
Thomas Nau [EMAIL PROTECTED] wrote:

  fflush(fp);
  fsync(fileno(fp));
  fclose(fp);
 
  and check errors.
 
 
  (It's remarkable how often people get the above sequence wrong and only
  do something like fsync(fileno(fp)); fclose(fp);)


 Thanks for clarifying! Seems I really need to check the apps with truss or 
 dtrace to see if they use that sequence. Allow me one more question: why 
 is fflush() required prior to fsync()?

You cannot simply verify this with truss unless you also trace the fflush()
calls in libc; by default truss only shows system calls.

You need to call fflush() first, in order to move the user-space stdio buffer
to the kernel.


Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-24 Thread Frank Cusack

On March 23, 2007 11:06:33 PM -0700 Adam Leventhal [EMAIL PROTECTED] wrote:

On Fri, Mar 23, 2007 at 11:28:19AM -0700, Frank Cusack wrote:

 In a way, I'm still hoping that it's an iSCSI-related problem, as
 detecting dead hosts in a network can be a non-trivial problem and it
 takes quite some time for TCP to time out and inform the upper layers.
 Just a guess/hope here that FC-AL, etc. do better in this case.

iSCSI doesn't use TCP, does it?  Anyway, the problem is really transport
independent.


It does use TCP. Were you thinking UDP?


or its own IP protocol.  I wouldn't have thought iSCSI would want to be
subject to the vagaries of TCP.

-frank


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-24 Thread Brian Hechinger
On Sat, Mar 24, 2007 at 11:20:38AM -0700, Frank Cusack wrote:
 iSCSI doesn't use TCP, does it?  Anyway, the problem is really transport
 independent.
 
 It does use TCP. Were you thinking UDP?
 
 or its own IP protocol.  I wouldn't have thought iSCSI would want to be
 subject to the vagaries of TCP.

No, you'll find that iSCSI does indeed use TCP, for better or for worse. ;)

-brian
-- 
The reason I don't use Gnome: every single other window manager I know of is
very powerfully extensible, where you can switch actions to different mouse
buttons. Guess which one is not, because it might confuse the poor users?
Here's a hint: it's not the small and fast one.--Linus


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-23 Thread Thomas Nau

On Fri, 23 Mar 2007, Roch - PAE wrote:


I assume the rsync is not issuing fsyncs (and its files are
not opened O_DSYNC). If so, rsync just works against the
filesystem cache and does not commit the data to disk.

You might want to run sync(1M) after a successful rsync.

A larger rsync would presumably have blocked. It's just
that the amount of data you needed to rsync fit in a couple of
transaction groups.


Thanks for the hints, but this would make our worst nightmares come true. 
At least it could, because it means that we would have to check every 
application handling critical data, and I think it's not the apps' 
responsibility. Up to a certain point, like a database transaction, but not 
any further. There's always a time window where data might be cached in 
memory, but I would argue that caching several GB of data, in our case 
written data spread over thousands of files, purely in memory circumvents 
all the built-in reliability of ZFS.


In a way, I'm still hoping that it's an iSCSI-related problem, as detecting 
dead hosts in a network can be a non-trivial problem and it takes quite 
some time for TCP to time out and inform the upper layers. Just a 
guess/hope here that FC-AL, etc. do better in this case.


Thomas

-
GPG fingerprint: B1 EE D2 39 2C 82 26 DA  A5 4D E0 50 35 75 9E ED


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-23 Thread Frank Cusack

On March 23, 2007 6:51:10 PM +0100 Thomas Nau [EMAIL PROTECTED] wrote:

Thanks for the hints, but this would make our worst nightmares come
true. At least it could, because it means that we would have to check
every application handling critical data, and I think it's not the apps'
responsibility.


I'd tend to disagree with that.  POSIX/SUS does not guarantee data makes
it to disk until you do an fsync() (or open the file with the right flags,
or other techniques).  If an application REQUIRES that data get to disk,
it really MUST DTRT.
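
One of the "right flags" is O_DSYNC; a minimal sketch, assuming synchronous
writes are acceptable for the workload (the file name and error handling
here are only illustrative):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      const char msg[] = "critical record\n";
      /* O_DSYNC: each write() returns only after the data is on stable storage */
      int fd = open("data.out", O_WRONLY | O_CREAT | O_DSYNC, 0644);

      if (fd == -1) {
          perror("open");
          return 1;
      }
      if (write(fd, msg, sizeof(msg) - 1) != (ssize_t)(sizeof(msg) - 1))
          perror("write");
      if (close(fd) == -1)
          perror("close");
      return 0;
  }

Every write() then pays the synchronous-write cost, so this usually only
makes sense for genuinely critical data.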


Up to a certain point, like a database transaction, but
not any further. There's always a time window where data might be cached
in memory, but I would argue that caching several GB of data, in our case
written data spread over thousands of files, purely in memory circumvents
all the built-in reliability of ZFS.

In a way, I'm still hoping that it's an iSCSI-related problem, as detecting
dead hosts in a network can be a non-trivial problem and it takes quite
some time for TCP to time out and inform the upper layers. Just a
guess/hope here that FC-AL, etc. do better in this case.


iSCSI doesn't use TCP, does it?  Anyway, the problem is really transport
independent.

-frank


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-23 Thread Casper . Dik

I'd tend to disagree with that.  POSIX/SUS does not guarantee data makes
it to disk until you do an fsync() (or open the file with the right flags,
or other techniques).  If an application REQUIRES that data get to disk,
it really MUST DTRT.

Indeed; want your data safe?  Use:

fflush(fp);
fsync(fileno(fp));
fclose(fp);

and check errors.


(It's remarkable how often people get the above sequence wrong and only
do something like fsync(fileno(fp)); fclose(fp);)
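
Spelled out with the error checking, the sequence might look something like
this minimal sketch (the file name and the bare perror() handling are only
illustrative):

  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      FILE *fp = fopen("important.dat", "w");

      if (fp == NULL) {
          perror("fopen");
          return 1;
      }
      if (fputs("critical record\n", fp) == EOF)
          perror("fputs");
      if (fflush(fp) != 0)              /* stdio buffer -> kernel */
          perror("fflush");
      if (fsync(fileno(fp)) == -1)      /* kernel cache -> stable storage */
          perror("fsync");
      if (fclose(fp) != 0)              /* releases the FILE and the fd */
          perror("fclose");
      return 0;
  }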


Casper


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-23 Thread Richard Elling

Thomas Nau wrote:

Dear all.
I've setup the following scenario:

A Galaxy 4200 running OpenSolaris build 59 as the iSCSI target; the remaining 
disk space of the two internal drives, 90GB in total, is used as a 
zpool for the two 32GB volumes exported via iSCSI.

The initiator is an up-to-date Solaris 10 11/06 x86 box using the 
above-mentioned volumes as disks for a local zpool.


Like this?
disk--zpool--zvol--iscsitarget--network--iscsiclient--zpool--filesystem--app

 In a way, I'm still hoping that it's an iSCSI-related problem, as detecting
 dead hosts in a network can be a non-trivial problem and it takes quite
 some time for TCP to time out and inform the upper layers. Just a
 guess/hope here that FC-AL, etc. do better in this case.

Actually, this is why NFS was invented.  Prior to NFS we had something like:
disk--raw--ndserver--network--ndclient--filesystem--app

The problem is that the failure modes of networks are very different from
those of presumably reliable local disk connections.  Hence NFS has a lot of
error handling code and provides well-understood error handling semantics.
Maybe what you really want is NFS?
 -- richard




Re: [zfs-discuss] ZFS over iSCSI question

2007-03-23 Thread Thomas Nau

Dear Frank & Casper,


I'd tend to disagree with that.  POSIX/SUS does not guarantee data makes
it to disk until you do an fsync() (or open the file with the right flags,
or other techniques).  If an application REQUIRES that data get to disk,
it really MUST DTRT.


Indeed; want your data safe?  Use:

fflush(fp);
fsync(fileno(fp));
fclose(fp);

and check errors.


(It's remarkable how often people get the above sequence wrong and only
do something like fsync(fileno(fp)); fclose(fp);)



Thanks for clarifying! Seems I really need to check the apps with truss or 
dtrace to see if they use that sequence. Allow me one more question: why 
is fflush() required prior to fsync()?


Putting all the pieces together, this means that if the app doesn't do it, it 
suffered from the same problem with UFS anyway, just with typically smaller 
caches, right?


Thanks again
Thomas

-
GPG fingerprint: B1 EE D2 39 2C 82 26 DA  A5 4D E0 50 35 75 9E ED


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-23 Thread Thomas Nau

Richard,


Like this?
disk--zpool--zvol--iscsitarget--network--iscsiclient--zpool--filesystem--app


exactly


In a way, I'm still hoping that it's an iSCSI-related problem, as detecting
dead hosts in a network can be a non-trivial problem and it takes quite
some time for TCP to time out and inform the upper layers. Just a
guess/hope here that FC-AL, etc. do better in this case.


Actually, this is why NFS was invented.  Prior to NFS we had something like:
disk--raw--ndserver--network--ndclient--filesystem--app


The problem is that our NFS, mail, DB and other servers use mirrored 
disks located in different buildings on campus. Currently we use FC-AL 
devices and recently switched from UFS to ZFS. The drawback with FC-AL is 
that you always need to have a second infrastructure (not the real 
problem in itself), but one built from different components. Having 
everything on Ethernet would be much easier.



The problem is that the failure modes are very different for networks and
presumably reliable local disk connections.  Hence NFS has a lot of error
handling code and provides well understood error handling semantics.  Maybe
what you really want is NFS?


We thought about using NFS as the backend for as many applications as 
possible, but we need to have redundancy for the fileserver itself too.


Thomas

-
GPG fingerprint: B1 EE D2 39 2C 82 26 DA  A5 4D E0 50 35 75 9E ED


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-23 Thread Casper . Dik

Thanks for clarifying! Seems I really need to check the apps with truss or 
dtrace to see if they use that sequence. Allow me one more question: why 
is fflush() required prior to fsync()?

When you use stdio, you need to make sure the data is in the
system buffers prior to calling fsync().

fclose() will otherwise write the rest of the data, which is then not sync'ed.


(In S10 I fixed this for the /etc/*_* driver files; they are generally
under 8 KB and therefore never written to disk before being fsync'ed
if not preceded by fflush().)
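
As a minimal illustration of the failure mode, assuming default fully
buffered stdio on a regular file (the file name and contents are made up):
the data is far smaller than the stdio buffer, so the fsync() finds nothing
to push out, and the bytes reach the kernel only at fclose(), unsynced:

  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      FILE *fp = fopen("small.conf", "w");

      if (fp == NULL)
          return 1;
      fputs("key=value\n", fp);    /* far below BUFSIZ: stays in the stdio buffer */
      fsync(fileno(fp));           /* the file on disk is still empty, nothing to sync */
      fclose(fp);                  /* only now is the data written out, never fsync'ed */
      return 0;
  }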

Casper


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-23 Thread Adam Leventhal
On Fri, Mar 23, 2007 at 11:28:19AM -0700, Frank Cusack wrote:
 In a way, I'm still hoping that it's an iSCSI-related problem, as detecting
 dead hosts in a network can be a non-trivial problem and it takes quite
 some time for TCP to time out and inform the upper layers. Just a
 guess/hope here that FC-AL, etc. do better in this case.
 
 iSCSI doesn't use TCP, does it?  Anyway, the problem is really transport
 independent.

It does use TCP. Were you thinking UDP?

Adam

-- 
Adam Leventhal, Solaris Kernel Development   http://blogs.sun.com/ahl