On Thu, Oct 10, 2013 at 02:12:38PM -0700, Matthew Ahrens wrote:
> > In my testing I observe that dmu_tx_delay() _always_ returns here
> > under normal conditions:
> >
> >     if (now > tx->tx_start + min_tx_time)
> >             return;
> >
> > I haven't yet explained why this is the case. It must either be that
> > dirty is staying very close to dirty_delay_min_bytes, or some overhead
> > higher up in the call path has already incurred enough delay.
>
> The delay would only kick in when the application is writing faster than the
> disk can keep up. If you just have a few spinning disks, you can probably hit
> it with "dd if=/dev/zero of=/pool/bigfile bs=1024k". Might need several
> instances of that (to different files) if you have lots of fast disks or few
> slow CPUs.

I've been running fio jobs with up to 128 threads and hitting the disks
pretty hard, but still don't see the delays kick in. But it is a pretty
big, fast pool, so I'm probably just unable to saturate the disks. I'll
post some benchmark results soon which I hope will generate some
interesting discussions on performance.

> > The only exception to this is when I dynamically tune zfs_dirty_data_max
> > downward by a large amount. In that case dirty is close enough to the
> > new max that min_tx_time is initially around 3s. But then when it is
> > truncated to 100ms, wakeup gets a timestamp in the past. And because I
> > implemented the kernel delay using msleep(), which takes a relative
> > time, my system hangs.
> >
> > This raises another question I've been meaning to ask: why was the delay
> > implemented using cv_timedwait_hires()? I'm not familiar with timer
> > APIs in Illumos, but it seems like this could have been done in a simpler
> > way that would be easier to emulate on other platforms, such as a simple
> > delay or sleep call. Yes, we could rewrite it for Linux, but I'd prefer
> > to minimize differences in core code like this.
>
> What routine would you prefer be used? There's no msleep() in illumos.

I was thinking "surely there must be a simple sleep()-like interface in
illumos", but I guess I was wrong. :)

> Perhaps we should create a "sleep until this time" interface to wrap the
> (admittedly ugly) mutex + CV + timedwait(). Something to consider for the
> common codebase.

Yes, a clean wrapper would be nice, so we can hide the OS-specific bits.

Thanks,
Ned
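
P.S. To make the "sleep until this time" idea concrete, here is a rough
userland sketch of what such a wrapper might look like. It is only an
illustration under stated assumptions: the names now_ns(), remaining_ns()
and sleep_until_ns() are hypothetical and exist in neither illumos nor
Linux, and it uses POSIX clock_gettime() and nanosleep() in place of the
kernel's mutex + CV + timedwait. The point it shows is the clamp: a
deadline that is already in the past becomes a zero-length wait instead of
a negative relative time handed to an msleep()-style call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Current monotonic time in nanoseconds. */
static int64_t
now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void) rc;
	return ((int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec);
}

/*
 * Hypothetical helper: turn an absolute deadline into the relative
 * interval a relative-time sleep expects. A deadline at or before
 * "now" yields 0 rather than a negative interval.
 */
static int64_t
remaining_ns(int64_t now, int64_t wakeup)
{
	return (wakeup > now ? wakeup - now : 0);
}

/*
 * Hypothetical wrapper: block until the absolute monotonic time
 * wakeup_ns. Returns immediately if that time has already passed.
 */
static void
sleep_until_ns(int64_t wakeup_ns)
{
	int64_t left;

	while ((left = remaining_ns(now_ns(), wakeup_ns)) > 0) {
		struct timespec req = {
			.tv_sec = left / NSEC_PER_SEC,
			.tv_nsec = left % NSEC_PER_SEC
		};

		/* If a signal cuts the sleep short, the loop re-checks. */
		(void) nanosleep(&req, NULL);
	}
}
```

A caller would compute the wakeup time once and pass it in unchanged, e.g.
sleep_until_ns(now_ns() + 100000000LL) for a 100ms delay, so each platform
could hide its own timer primitive behind the same absolute-time interface.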
