On Thu, Oct 10, 2013 at 2:51 PM, Ned Bass <[email protected]> wrote:

> On Thu, Oct 10, 2013 at 02:12:38PM -0700, Matthew Ahrens wrote:
> >
> >     In my testing I observe that dmu_tx_delay() _always_ returns here
> >     under normal conditions:
> >
> >         if (now > tx->tx_start + min_tx_time)
> >                 return;
> >
> >     I haven't yet explained why this is the case.  It must either be that
> >     dirty is staying very close to dirty_delay_min_bytes, or some overhead
> >     higher up in the call path has already incurred enough delay.
> >
> >
> > The delay would only kick in when the application is writing faster than the
> > disk can keep up.  If you just have a few spinning disks, you can probably hit
> > it with "dd if=/dev/zero of=/pool/bigfile bs=1024k".  Might need several
> > instances of that (to different files) if you have lots of fast disks or few
> > slow CPUs.
>
> I've been running fio jobs with up to 128 threads and hitting the disks
> pretty hard, but still don't see the delays kick in.  But it is a pretty
> big, fast pool, so I'm probably just unable to saturate the disks.
>
> I'll post some benchmark results soon which I hope will generate some
> interesting discussions on performance.
>
> >     The only exception to this is when I dynamically tune zfs_dirty_data_max
> >     downward by a large amount.  In that case dirty is close enough to the
> >     new max that min_tx_time is initially around 3s.  But then when it is
> >     truncated to 100ms, wakeup gets a timestamp in the past.  And because I
> >     implemented the kernel delay using msleep(), which takes a relative
> >     time, my system hangs.
> >
> >     This raises another question I've been meaning to ask: why was the delay
> >     implemented using cv_timedwait_hires()?  I'm not familiar with timer
> >     APIs in Illumos, but it seems like this could have been done in a simpler
> >     way that would be easier to emulate on other platforms, such as a simple
> >     delay or sleep call.  Yes, we could rewrite it for Linux, but I'd prefer
> >     to minimize differences in core code like this.
> >
> >
> > What routine would you prefer be used?  There's no msleep() in illumos.
>
> I was thinking "surely there must be a simple sleep() -like interface in
> illumos", but I guess I was wrong. :)
>

Sure, there's delay(9f) <http://illumos.org/man/9f/delay>, but like
sleep(3c), the granularity is way too coarse.

--matt


>
> >  Perhaps we should create a "sleep until this time" interface to wrap the
> > (admittedly ugly) mutex + CV + timedwait().  Something to consider for the
> > common codebase.
>
> Yes, a clean wrapper would be nice, so we can hide the OS-specific bits.
>
> Thanks,
> Ned
>
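
For illustration only, here is a rough user-space sketch of what such a
"sleep until this time" wrapper might look like.  It is not the illumos or
Linux kernel code: it uses POSIX threads as a stand-in for the kernel
mutex + CV + timedwait, and the names sleep_until_ns() and now_ns() are
hypothetical.  Because the deadline is absolute, a wakeup time that is
already in the past returns immediately, which avoids the hang described
above where a past timestamp was handed to a relative sleep.

```c
#define _POSIX_C_SOURCE 200809L

#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec);
}

/*
 * Hypothetical wrapper: block the caller until the absolute
 * CLOCK_MONOTONIC time wakeup_ns.  The mutex and condition variable
 * are private to this call, so callers never see them.
 */
static void
sleep_until_ns(int64_t wakeup_ns)
{
	pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
	pthread_condattr_t attr;
	pthread_cond_t cv;
	struct timespec ts;

	/* A deadline in the past means there is nothing to wait for. */
	if (wakeup_ns <= now_ns())
		return;

	ts.tv_sec = wakeup_ns / NSEC_PER_SEC;
	ts.tv_nsec = wakeup_ns % NSEC_PER_SEC;

	/* Make the CV measure time on the same clock as now_ns(). */
	pthread_condattr_init(&attr);
	pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
	pthread_cond_init(&cv, &attr);
	pthread_condattr_destroy(&attr);

	pthread_mutex_lock(&mtx);
	/*
	 * Nobody ever signals cv, so a return of 0 is a spurious wakeup:
	 * keep waiting.  ETIMEDOUT (or any error) ends the loop.
	 */
	while (pthread_cond_timedwait(&cv, &mtx, &ts) == 0)
		continue;
	pthread_mutex_unlock(&mtx);

	pthread_cond_destroy(&cv);
	pthread_mutex_destroy(&mtx);
}
```

A caller would compute the deadline once and pass it in, for example
sleep_until_ns(now_ns() + 100 * 1000000LL) for a 100ms delay.  Each
platform could then supply its own body (cv_timedwait_hires() on illumos,
something else on Linux) behind the same signature; that split is an
assumption of this sketch, not something settled in this thread.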
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
