Re: [zfs-discuss] ZFS over iSCSI question

2007-03-24 Thread Joerg Schilling
Thomas Nau [EMAIL PROTECTED] wrote:

  fflush(fp);
  fsync(fileno(fp));
  fclose(fp);
 
  and check errors.
 
 
  (It's remarkable how often people get the above sequence wrong and only
  do something like fsync(fileno(fp)); fclose(fp);)


 Thanks for clarifying! Seems I really need to check the apps with truss or 
 dtrace to see if they use that sequence. Allow me one more question: why 
 is fflush() required prior to fsync()?

You cannot simply verify this with truss unless you trace libc::fflush() too.

You need to call fflush() before, in order to move the user space cache to the
kernel.
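For reference, a minimal C sketch of the full sequence with the error checking discussed above (the helper name and file path are made up for illustration):

```c
#include <stdio.h>
#include <unistd.h>

/* Write a string through stdio and push it all the way to stable storage.
 * Returns 0 on success, -1 on failure.  Hypothetical helper, for
 * illustration only. */
int write_durably(const char *path, const char *data)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    if (fputs(data, fp) == EOF) {
        fclose(fp);
        return -1;
    }

    /* 1. fflush(): move the stdio (user-space) buffer into the kernel. */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    /* 2. fsync(): ask the kernel to commit its cached data to the device.
     *    Without the fflush() above, data still sitting in the stdio
     *    buffer would not be covered by this call. */
    if (fsync(fileno(fp)) != 0) {
        fclose(fp);
        return -1;
    }

    /* 3. fclose(): still check for errors here too. */
    if (fclose(fp) != 0)
        return -1;

    return 0;
}
```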


Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] missing features? Could/should zfs support a new ioctl, constrained if needed

2007-03-24 Thread Richard L. Hamilton
_FIOSATIME - why doesn't zfs support this (assuming I didn't just miss it)?
Might be handy for backups.

Could/should zfs support a new ioctl, constrained if needed to files of
zero size, that sets an explicit (and fixed) blocksize for a particular
file?  That might be useful for performance in special cases when one
didn't necessarily want to specify (or perhaps depend on the
specification of) the attribute at the filesystem level.  One could imagine a
database that was itself tunable per-file to a similar range of
blocksizes, which would almost certainly benefit if it used those sizes
for the corresponding files.  Additional capabilities that might be
desirable: setting the blocksize to zero to let the system return to
default behavior for a file; being able to discover the file's blocksize
(does fstat() report this?) as well as whether it was fixed at the
filesystem level, at the file level, or in default state.
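On the fstat() aside: stat(2)/fstat(2) do report a preferred I/O size in st_blksize, though not whether it was fixed per-file, set at the filesystem level, or left at the default. A minimal sketch (the helper name is ours):

```c
#include <sys/stat.h>

/* Return the filesystem's preferred I/O block size for a file, or -1 on
 * error.  Note st_blksize only reports the *preferred* size; it does not
 * distinguish a per-file setting from a filesystem-level or default one. */
long preferred_blocksize(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return (long)sb.st_blksize;
}
```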

Wasn't there some work going on to add real per-user (and maybe per-group)
quotas, so one doesn't necessarily need to be sharing or automounting
thousands of individual filesystems (slow)?  Haven't heard anything lately 
though...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: asize is 300MB smaller than lsize - why?

2007-03-24 Thread Matthew Ahrens

Kangurek wrote:

Thanks for the info.
My idea was to traverse a changing filesystem; now I see that it will not
work.

I will try to traverse snapshots. Zreplicate will:
1. take snapshot @replicate_latest
2. send data to snapshot @replicate_latest
3. wait X sec   ( X = 20 )
4. remove @replicate_previous,  rename @replicate_latest to 
@replicate_previous

5. repeat from 1.

I'm sure it will work, but taking snapshots will be slow on a loaded 
filesystem.

Do you have any idea how to speed up these snapshot operations?
1. remove @replicate_previous
2. rename @replicate_latest to @replicate_previous
3. create @replicate_latest


You can avoid the rename by doing (where fs is the dataset being replicated):

zfs snapshot fs@A
again:
zfs destroy fs@B
zfs snapshot fs@B
zfs send -i @A fs@B
zfs destroy fs@A
zfs snapshot fs@A
zfs send -i @B fs@A
goto again

I'm not sure exactly what will be slow about taking snapshots, but one 
aspect might be that we have to suspend the intent log (see call to 
zil_suspend() in dmu_objset_snapshot_one()).  I've been meaning to 
change that for a while now -- just let the snapshot have the 
(non-empty) zil header in it, but don't use it (eg. if we rollback or 
clone, explicitly zero out the zil header).  So you might want to look 
into that.


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-24 Thread Frank Cusack

On March 23, 2007 11:06:33 PM -0700 Adam Leventhal [EMAIL PROTECTED] wrote:

On Fri, Mar 23, 2007 at 11:28:19AM -0700, Frank Cusack wrote:

 I'm in a way still hoping that it's a iSCSI related Problem as
 detecting dead hosts in a network can be a non trivial problem and it
 takes quite some time for TCP to timeout and inform the upper layers.
 Just a guess/hope here that FC-AL, ... do better in this case

iscsi doesn't use TCP, does it?  Anyway, the problem is really transport
independent.


It does use TCP. Were you thinking UDP?


or its own IP protocol.  I wouldn't have thought iSCSI would want to be
subject to the vagaries of TCP.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: asize is 300MB smaller than lsize - why?

2007-03-24 Thread Neil Perrin



Matthew Ahrens wrote on 03/24/07 12:13:

Kangurek wrote:


Thanks for the info.
My idea was to traverse a changing filesystem; now I see that it will 
not work.

I will try to traverse snapshots. Zreplicate will:
1. take snapshot @replicate_latest
2. send data to snapshot @replicate_latest
3. wait X sec   ( X = 20 )
4. remove @replicate_previous,  rename @replicate_latest to 
@replicate_previous

5. repeat from 1.

I'm sure it will work, but taking snapshots will be slow on a loaded 
filesystem.

Do you have any idea how to speed up these snapshot operations?
1. remove @replicate_previous
2. rename @replicate_latest to @replicate_previous
3. create @replicate_latest



You can avoid the rename by doing (where fs is the dataset being replicated):

zfs snapshot fs@A
again:
zfs destroy fs@B
zfs snapshot fs@B
zfs send -i @A fs@B
zfs destroy fs@A
zfs snapshot fs@A
zfs send -i @B fs@A
goto again

I'm not sure exactly what will be slow about taking snapshots, but one 
aspect might be that we have to suspend the intent log (see call to 
zil_suspend() in dmu_objset_snapshot_one()).  I've been meaning to 
change that for a while now -- just let the snapshot have the 
(non-empty) zil header in it, but don't use it (eg. if we rollback or 
clone, explicitly zero out the zil header).  So you might want to look 
into that.


I've always thought the slowness was due to the txg_wait_synced().
I just counted 5 for one snapshot:

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 0, 7aa610d3, 70170800, ...)
zfs`zil_commit_writer+0x34c(30010c55200, 151, 151, 1, 3fe, 7aa84600)
zfs`zil_commit+0x68(30010c55200, 151, 0, 30010c5527c, 151, 0)
zfs`zil_suspend+0xc0(30010c55200, 2a1010db240, 0, 0, 30014b32e00, 0)
zfs`dmu_objset_snapshot_one+0x74(0, 2a1010db420, 7aa60700, 0, 0, 0)
zfs`dmu_objset_snapshot+0xe8(300265bd000, 300265bd400, 0, 0, ...)
zfs`zfsdev_ioctl+0x12c(701cf9f0, 701cf660, ffbfe850, 390, 701cf400, ...)

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 3, 151, c00431549f, 3fe, 7aa84600)
zfs`zil_destroy+0xc(30010c55200, 0, 0, 30010c5527c, 30014b32e00, 0)
zfs`zil_suspend+0x108(30010c55200, 2a1010db240, 30010c5527c, 0, 30014b32e00, 0)
zfs`dmu_objset_snapshot_one+0x74(0, 2a1010db420, 7aa60700, 0, 0, 0)
zfs`dmu_objset_snapshot+0xe8(300265bd000, 300265bd400, 0, 0, ...)
zfs`zfsdev_ioctl+0x12c(701cf9f0, 701cf660, ffbfe850, 390, 701cf400,...)

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 36f8, 30593b0, 1f8, 1f8, 180c000)
zfs`zil_destroy+0x1b0(30010c55200, 0, 701d5760, 30010c5527c, ...)
zfs`zil_suspend+0x108(30010c55200, 2a1010db240, 30010c5527c, 0, 30014b32e00, 0)
zfs`dmu_objset_snapshot_one+0x74(0, 2a1010db420, 7aa60700, 0, 0, 0)
zfs`dmu_objset_snapshot+0xe8(300265bd000, 300265bd400, 0, 0, ...)
zfs`zfsdev_ioctl+0x12c(701cf9f0, 701cf660, ffbfe850, 390, 701cf400, ...)

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 36f9, 30593b0, 1f8, 1f8, 180c000)
zfs`dsl_sync_task_group_wait+0x11c(300109a7ac8, 30005c51dc0, 7aa60700, ...)
zfs`dmu_objset_snapshot+0x100(300265bd000, 300265bd400, 0, 0, ...)
zfs`zfsdev_ioctl+0x12c(701cf9f0, 701cf660, ffbfe850, 390, 701cf400, ...)

[0] $c
zfs`txg_wait_synced+0xc(30005c51dc0, 36fa, 30593b0, 1f8, 1f8, 180c000)
zfs`dsl_sync_task_group_wait+0x11c(300109a7ac8, 30005c51dc0, ...)
zfs`dsl_sync_task_do+0x28(30005c51dc0, 0, 7aa2d898, 300028f7680,...)
zfs`spa_history_log+0x30(300028f7680, 3000dee1490, 0, 7aa2d800, 1, 18)
zfs`zfs_ioc_pool_log_history+0xd8(7aa64c00, 0, 17, 18, 3000dee1490, 7aa64c00)
zfs`zfsdev_ioctl+0x12c(701cf768, 701cf660, ffbfe850, 108, 701cf400,...)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: asize is 300MB smaller than lsize - why?

2007-03-24 Thread Matthew Ahrens

Neil Perrin wrote:
I'm not sure exactly what will be slow about taking snapshots, but one 
aspect might be that we have to suspend the intent log (see call to 
zil_suspend() in dmu_objset_snapshot_one()).  I've been meaning to 
change that for a while now -- just let the snapshot have the 
(non-empty) zil header in it, but don't use it (eg. if we rollback or 
clone, explicitly zero out the zil header).  So you might want to look 
into that.


I've always thought the slowness was due to the txg_wait_synced().
I just counted 5 for one snapshot:


Yeah, well 3 of the 5 are for zil_suspend(), so I think you've proved my 
point :-)


I believe that the one from spa_history_log() will go away with MarkS's 
delegated admin work, leaving just the one "actually do it" 
txg_wait_synced().


Bottom line, it should be possible to make zfs snapshot take 5x less 
time, without an extraordinary effort.


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: asize is 300MB smaller than lsize - why?

2007-03-24 Thread Neil Perrin



Matthew Ahrens wrote on 03/24/07 12:36:

Neil Perrin wrote:

I'm not sure exactly what will be slow about taking snapshots, but 
one aspect might be that we have to suspend the intent log (see call 
to zil_suspend() in dmu_objset_snapshot_one()).  I've been meaning to 
change that for a while now -- just let the snapshot have the 
(non-empty) zil header in it, but don't use it (eg. if we rollback or 
clone, explicitly zero out the zil header).  So you might want to 
look into that.



I've always thought the slowness was due to the txg_wait_synced().
I just counted 5 for one snapshot:



Yeah, well 3 of the 5 are for zil_suspend(), so I think you've proved my 
point :-)


I believe that the one from spa_history_log() will go away with MarkS's 
delegated admin work, leaving just the one "actually do it" 
txg_wait_synced().


Bottom line, it should be possible to make zfs snapshot take 5x less 
time, without an extraordinary effort.


I'm not sure. Doing one txg_wait_synced() will take the same time as doing 
more than one (assuming they fall in the same txg), but at least one is 
needed to ensure all transactions prior to the snapshot are committed.


Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over iSCSI question

2007-03-24 Thread Brian Hechinger
On Sat, Mar 24, 2007 at 11:20:38AM -0700, Frank Cusack wrote:
 iscsi doesn't use TCP, does it?  Anyway, the problem is really transport
 independent.
 
 It does use TCP. Were you thinking UDP?
 
 or its own IP protocol.  I wouldn't have thought iSCSI would want to be
 subject to the vagaries of TCP.

No, you'll find that iSCSI does indeed use TCP, for better or for worse. ;)

-brian
-- 
The reason I don't use Gnome: every single other window manager I know of is
very powerfully extensible, where you can switch actions to different mouse
buttons. Guess which one is not, because it might confuse the poor users?
Here's a hint: it's not the small and fast one.--Linus
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss