[Developer] dmu_tx_delay() may calculate wakeup time in the past

Ned Bass Mon, 07 Oct 2013 19:06:51 -0700

Here's something I noticed while evaluating the "4045 zfs write throttle
& i/o scheduler performance work" patch on Linux.


In dmu_tx_delay, the amount of time to delay a transaction is calculated
like this:

1031         now = gethrtime();
1032         min_tx_time = zfs_delay_scale *
1033             (dirty - delay_min_bytes) / (zfs_dirty_data_max - dirty);
1034         if (now > tx->tx_start + min_tx_time)
1035                 return;
1036
1037         min_tx_time = MIN(min_tx_time, zfs_delay_max_ns);
1038
1039         DTRACE_PROBE3(delay__mintime, dmu_tx_t *, tx, uint64_t, dirty,
1040             uint64_t, min_tx_time);
1041
1042         mutex_enter(&dp->dp_lock);
1043         wakeup = MAX(tx->tx_start + min_tx_time,
1044             dp->dp_last_wakeup + min_tx_time);
1045         dp->dp_last_wakeup = wakeup;
1046         mutex_exit(&dp->dp_lock);
1047
1048 #ifdef _KERNEL
1049         mutex_enter(&curthread->t_delay_lock);
1050         while (cv_timedwait_hires(&curthread->t_delay_cv,
1051             &curthread->t_delay_lock, wakeup, zfs_delay_resolution_ns,
1052             CALLOUT_FLAG_ABSOLUTE | CALLOUT_FLAG_ROUNDUP) > 0)
1053                 continue;
1054         mutex_exit(&curthread->t_delay_lock);
1055 #else
1056         hrtime_t delta = wakeup - gethrtime();
1057         struct timespec ts;
1058         ts.tv_sec = delta / NANOSEC;
1059         ts.tv_nsec = delta % NANOSEC;
1060         (void) nanosleep(&ts, NULL);
1061 #endif

One line 1035 we return if the tx has already been delayed long enough,
which would normally prevent wakeup from getting a time in the past on
line 1043.  However, line 1037 may lower the value of min_tx_time enough
to cause a past wakeup time after all.  This may not cause a problem in
the kernel code since it uses an absolute time.  But the userspace code
calculates a relative delay time which would become negative in this
case, and may be interpreted as a really large positive value by
nanosleep().

So, shouldn't line 1037 be moved above line 1034?

Thanks,
Ned
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

[Developer] dmu_tx_delay() may calculate wakeup time in the past

Reply via email to