On Sep 9, 2010, at 10:09 AM, Arne Jansen wrote:

> Hi Neil,
> 
> Neil Perrin wrote:
>> NFS often demands its transactions are stable before returning.
>> This forces ZFS to complete the system call synchronously. Usually the
>> ZIL (code) allocates and writes a new block in the intent log chain to
>> achieve this.
>> If it ever fails to allocate a block (of the size requested) it is forced
>> to close the txg containing the system call. Yes, this can be extremely
>> slow, but there is no other option for the ZIL. I'm surprised the wait is
>> 30 seconds; I would expect much less, but finding room for the rest of the
>> txg data and metadata would also be a challenge.
> 
> I think this is not what we saw, for two reasons:
> a) we have a mirrored slog device. According to zpool iostat -v, only 16 MB
>    out of 4 GB were in use.
> b) it didn't seem like the txg would have been closed early. Rather, it kept
>    to approximately the 30-second intervals.
> 
> Internally we came up with a different explanation, without any backing that
> it might be correct: when the pool reaches 96% full, ZFS goes into a 'self
> defense' mode. Instead of allocating blocks from the ZIL, every write turns
> synchronous and has to wait for the txg to finish naturally. The reasoning
> behind this might be that even if the ZIL is available, there might not be
> enough space left to commit the ZIL to the pool. To prevent this, ZFS doesn't
> use the ZIL when the pool is above 96%. While this might be proper for small
> pools, on large pools 4% is still some TB of free space, so there should be
> an upper limit of maybe 10GB on this hidden reserve.

I do not believe this is correct.  At 96% the first-fit algorithm changes to
best-fit, and ganging can be expected.  This has nothing to do with the ZIL.
There is already a reserve set aside for metadata and the ZIL so that you can
remove files when the file system is 100% full.  This reserve is 32 MB or 1/64
of the pool size.
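
Purely as an illustration, here is a rough sketch of how such a reserve could
be computed.  This is not the actual ZFS code, and it reads "32 MB or 1/64 of
the pool size" as whichever of the two is larger, which is my assumption:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not a ZFS function: the space held back so metadata
 * and ZIL writes (and file removal) still succeed on a "full" pool. */
static uint64_t
reserve_bytes(uint64_t pool_size)
{
        uint64_t min_reserve = 32ULL << 20;     /* 32 MB floor */
        uint64_t fraction = pool_size >> 6;     /* 1/64 of the pool */
        return (fraction > min_reserve ? fraction : min_reserve);
}

int
main(void)
{
        /* e.g. a 100 TB pool would hold back roughly 1.6 TB */
        printf("%llu\n", (unsigned long long)reserve_bytes(100ULL << 40));
        return (0);
}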

> Also, this sudden switch of behavior is completely unexpected and at least
> under-documented.

Methinks you are just seeing the performance change from the switch in
allocation algorithm.
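
To make the difference concrete, here is a minimal sketch of the two
strategies over a list of free segments.  The free_seg_t type and both
functions are illustrative only, not the actual metaslab allocator:

#include <stddef.h>
#include <stdint.h>

/* Illustrative free-segment record, not a ZFS structure. */
typedef struct free_seg {
        uint64_t start;
        uint64_t size;
} free_seg_t;

/* First-fit: take the first segment big enough; cheap, stops early. */
static free_seg_t *
first_fit(free_seg_t *segs, size_t n, uint64_t want)
{
        for (size_t i = 0; i < n; i++) {
                if (segs[i].size >= want)
                        return (&segs[i]);
        }
        return (NULL);          /* nothing fits: the caller must gang */
}

/* Best-fit: scan everything for the smallest segment that still fits;
 * the full scan is part of why things slow down near a full pool. */
static free_seg_t *
best_fit(free_seg_t *segs, size_t n, uint64_t want)
{
        free_seg_t *best = NULL;
        for (size_t i = 0; i < n; i++) {
                if (segs[i].size < want)
                        continue;
                if (best == NULL || segs[i].size < best->size)
                        best = &segs[i];
        }
        return (best);          /* NULL again means a gang block is needed */
}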

> 
>> Most (maybe all?) file systems perform badly when out of space. I believe we
>> give a recommended maximum fill level, and I thought it was 90%.
> 
> In this situation, not only did writes suffer; as a side effect, reads also
> came to a nearly complete halt.

If you have atime=on, then reads create writes.
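
A quick way to see this on your own box (the path below is hypothetical):
with atime=on an ordinary read bumps the file's access time, and that updated
timestamp is a write the pool has to absorb.  If you don't need access times,
'zfs set atime=off <dataset>' avoids these writes.

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
        struct stat before, after;
        char buf[1];

        /* Hypothetical file on the pool in question. */
        int fd = open("/tank/export/somefile", O_RDONLY);
        if (fd == -1)
                return (1);

        (void) fstat(fd, &before);
        (void) read(fd, buf, sizeof (buf));     /* an ordinary read */
        (void) fstat(fd, &after);
        (void) close(fd);

        /* With atime=on the access time generally moves forward here,
         * and that update has to be written back to the pool. */
        printf("atime before: %ld  after: %ld\n",
            (long)before.st_atime, (long)after.st_atime);
        return (0);
}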
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com