On Thu, Oct 10, 2013 at 02:12:38PM -0700, Matthew Ahrens wrote:
>
>     In my testing I observe that dmu_tx_delay() _always_ returns here
>     under normal conditions:
>
>         if (now > tx->tx_start + min_tx_time)
>                 return;
>
>     I haven't yet explained why this is the case.  It must either be that
>     dirty is staying very close to dirty_delay_min_bytes, or some overhead
>     higher up in the call path has already incurred enough delay.
>
>
> The delay would only kick in when the application is writing faster than the
> disk can keep up.  If you just have a few spinning disks, you can probably hit
> it with "dd if=/dev/zero of=/pool/bigfile bs=1024k".  Might need several
> instances of that (to different files) if you have lots of fast disks or few
> slow CPUs.

I've been running fio jobs with up to 128 threads and hitting the disks
pretty hard, but still don't see the delays kick in.  But it is a pretty
big, fast pool, so I'm probably just unable to saturate the disks.

I'll post some benchmark results soon which I hope will generate some
interesting discussions on performance.

>     The only exception to this is when I dynamically tune zfs_dirty_data_max
>     downward by a large amount.  In that case dirty is close enough to the
>     new max that min_tx_time is initially around 3s.  But then when it is
>     truncated to 100ms, wakeup gets a timestamp in the past.  And because I
>     implemented the kernel delay using msleep(), which takes a relative
>     time, my system hangs.
>
>     This raises another question I've been meaning to ask: why was the delay
>     implemented using cv_timedwait_hires()?  I'm not familiar with timer
>     APIs in Illumos, but it seems like this could have been done in a simpler
>     way that would be easier to emulate on other platforms, such as a simple
>     delay or sleep call.  Yes, we could rewrite it for Linux, but I'd prefer
>     to minimize differences in core code like this.
>
>
> What routine would you prefer be used?  There's no msleep() in illumos.

I was thinking "surely there must be a simple sleep()-like interface in
illumos", but I guess I was wrong. :)
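
To make the msleep() hang described above concrete, here is a minimal
user-space sketch of the arithmetic.  The helper names (capped_wakeup,
relative_delay, nsec_t) are hypothetical and not from the ZFS source,
and I'm assuming the failure mode is a negative "wakeup - now" being
handed to a sleep-for-duration call.  Clamping at zero avoids that:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; times are signed nanoseconds, like hrtime_t. */
typedef int64_t nsec_t;

/*
 * Absolute wakeup time after capping the per-transaction delay.  This
 * mirrors the order described in this thread: the delay is computed
 * first and only then truncated to a maximum.
 */
static nsec_t
capped_wakeup(nsec_t tx_start, nsec_t min_tx_time, nsec_t max_delay)
{
	assert(min_tx_time >= 0 && max_delay >= 0);
	if (min_tx_time > max_delay)
		min_tx_time = max_delay;
	return (tx_start + min_tx_time);
}

/*
 * Convert an absolute wakeup time into a relative delay suitable for a
 * sleep-for-duration primitive.  A wakeup in the past yields 0 rather
 * than a negative value that could be misread as a huge timeout.
 */
static nsec_t
relative_delay(nsec_t now, nsec_t wakeup)
{
	return (wakeup > now ? wakeup - now : 0);
}
```

With tx_start = 0, an uncapped delay of 3s, a 100ms cap, and now = 500ms,
the early-return check does not fire (500ms is not past 3s), yet the
capped wakeup of 100ms is already behind us, so relative_delay() returns 0.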

>  Perhaps we should create a "sleep until this time" interface to wrap the
> (admittedly ugly) mutex + CV + timedwait().  Something to consider for the
> common codebase.

Yes, a clean wrapper would be nice, so we can hide the OS-specific bits.
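
As a rough idea of what such a wrapper might look like, here is a
user-space sketch over POSIX threads.  This is only an analogy for the
kernel interface: sleep_until_ns() and now_ns() are made-up names, and
I'm assuming a platform that supports pthread_condattr_setclock() with
CLOCK_MONOTONIC (Linux/glibc does).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec);
}

/*
 * Hypothetical "sleep until this time" wrapper: block the caller until
 * the absolute CLOCK_MONOTONIC time wakeup_ns.  The mutex and condition
 * variable are private to the call, so nothing ever signals the CV and
 * a wakeup time in the past simply times out right away.
 */
static void
sleep_until_ns(int64_t wakeup_ns)
{
	pthread_mutex_t lock;
	pthread_cond_t cv;
	pthread_condattr_t attr;
	struct timespec ts;
	int rc;

	if (wakeup_ns < 0)
		wakeup_ns = 0;
	ts.tv_sec = (time_t)(wakeup_ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(wakeup_ns % NSEC_PER_SEC);

	rc = pthread_mutex_init(&lock, NULL);
	assert(rc == 0);
	rc = pthread_condattr_init(&attr);
	assert(rc == 0);
	rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
	assert(rc == 0);
	rc = pthread_cond_init(&cv, &attr);
	assert(rc == 0);
	pthread_condattr_destroy(&attr);

	pthread_mutex_lock(&lock);
	/* A return of 0 is a spurious wakeup; wait again until timeout. */
	while (pthread_cond_timedwait(&cv, &lock, &ts) == 0)
		continue;
	pthread_mutex_unlock(&lock);

	pthread_cond_destroy(&cv);
	pthread_mutex_destroy(&lock);
}
```

A caller would write something like sleep_until_ns(now_ns() + 20000000)
for a 20ms delay.  Because the interface takes an absolute time, there is
no subtraction for the caller to get wrong when the deadline has passed.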

Thanks,
Ned
