On 2/16/12 5:56 PM, David Xu wrote:
On 2012/2/17 8:42, Julian Elischer wrote:
Adding David Xu for his thoughts, since he rewrote the code in
question in revision 213098.
On 2/16/12 2:57 PM, Julian Elischer wrote:
On 2/16/12 1:06 PM, Julian Elischer wrote:
On 2/16/12 9:34 AM, Andriy Gapon wrote:
on 15/02/2012 23:41 Julian Elischer said the following:
The program fio (an I/O test tool in ports) uses pthreads.
The following code (from fio-2.0.3, but it's in earlier versions too)
has suddenly started misbehaving:
    clock_gettime(CLOCK_REALTIME, &t);
    t.tv_sec += seconds + 10;
    pthread_mutex_lock(&mutex->lock);
    while (!mutex->value && !ret) {
        mutex->waiters++;
        ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
        mutex->waiters--;
    }
    if (!ret) {
        mutex->value--;
        pthread_mutex_unlock(&mutex->lock);
    }
It turns out that 'ret' sometimes comes back instantly (on my
machine) with a value of 60 (ETIMEDOUT), despite the fact that we
set the timeout at least 10 seconds into the future.
Has anyone else seen anything like this?
(And yes, the condition variable attributes have been set to use
the REALTIME clock.)
But why?
Just a hypothesis that maybe there is some issue with time
keeping on that system.
How would that code work out for you with MONOTONIC?
Jens Axboe (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC,
and both had the same problem, i.e. random early returns with
ETIMEDOUT.
I think we will try moving our machine forward to a newer -stable
to see if that resolves it.
Kan upgraded the machine today to today's 9.x branch tip and the
problem still occurs.
8.x does not have this problem.
I do not have a 9-RELEASE machine to test on, so I cannot tell
whether this came in with the burst of changes committed after
the 9.x branch was unfrozen following the release of 9.0.
I am trying to reproduce the problem. Do you have complete sample
code to test?
I'm still looking for the exact set,
but on my machine (4 CPUs) the program from ports sysutils/fio
exhibits the problem when used with
kern.timecounter.hardware=TSC-low and the following config file:
pu05 # cat config.fio
[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=60000
randrepeat=0
[file1]
filename=/dev/ada0
pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes, forcing
use of threads. Use the 'thread' option to get rid of this warning.
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#
The reason 5 jobs failed to start is that the parent timed out on
them immediately.
Apparently it did not time out on 10 of them.
If I set the timecounter to ACPI-fast, it works as expected.
Regards,
David Xu