On 2/16/12 5:56 PM, David Xu wrote:
On 2012/2/17 8:42, Julian Elischer wrote:
Adding David Xu for his thoughts, since he rewrote the code in question in revision 213098.

On 2/16/12 2:57 PM, Julian Elischer wrote:
On 2/16/12 1:06 PM, Julian Elischer wrote:
On 2/16/12 9:34 AM, Andriy Gapon wrote:
on 15/02/2012 23:41 Julian Elischer said the following:
The program fio (an I/O test in ports) uses pthreads.

The following code (from fio-2.0.3, but it's in earlier versions too)
has suddenly started misbehaving.

         clock_gettime(CLOCK_REALTIME, &t);
         t.tv_sec += seconds + 10;

         pthread_mutex_lock(&mutex->lock);

         while (!mutex->value && !ret) {
                 mutex->waiters++;
                 ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
                 mutex->waiters--;
         }

         if (!ret) {
                 mutex->value--;
                 pthread_mutex_unlock(&mutex->lock);
         }


It turns out that 'ret' sometimes comes back instantly (on my machine) with a
value of 60 (ETIMEDOUT), despite the fact that we set the timeout at least
10 seconds into the future.

Has anyone else seen anything like this?
(And yes, the condition variable attribute has been set to use the REALTIME clock.)
But why?
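For pthread_cond_timedwait() the absolute deadline has to be computed on the same clock the condition variable was configured with via its attribute; the fragment above pairs CLOCK_REALTIME for both. The sketch below, assuming POSIX.1-2008, shows that pairing in isolation so either clock can be plugged in. The helper names cond_init_with_clock() and deadline_after_ms() are hypothetical and are not taken from fio.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Initialise `cond` so that pthread_cond_timedwait() interprets its
 * absolute deadline on clock `clk`. Returns 0 or an errno value. */
static int cond_init_with_clock(pthread_cond_t *cond, clockid_t clk)
{
    pthread_condattr_t attr;
    int err = pthread_condattr_init(&attr);

    if (err != 0)
        return err;
    err = pthread_condattr_setclock(&attr, clk);
    if (err == 0)
        err = pthread_cond_init(cond, &attr);
    pthread_condattr_destroy(&attr);
    return err;
}

/* Fill `out` with "now + ms" read from clock `clk`, keeping tv_nsec
 * in [0, 1e9). Returns 0 on success, -1 if clock_gettime() fails. */
static int deadline_after_ms(clockid_t clk, long ms, struct timespec *out)
{
    assert(ms >= 0);
    if (clock_gettime(clk, out) != 0)
        return -1;
    out->tv_sec += ms / 1000;
    out->tv_nsec += (ms % 1000) * 1000000L;
    if (out->tv_nsec >= NSEC_PER_SEC) {   /* carry into seconds */
        out->tv_sec += 1;
        out->tv_nsec -= NSEC_PER_SEC;
    }
    return 0;
}
```

A caller would initialise the condvar with cond_init_with_clock(&cond, CLOCK_REALTIME), then pass a deadline built by deadline_after_ms(CLOCK_REALTIME, ...) to pthread_cond_timedwait(). Mixing clocks between the two calls would make the deadline meaningless.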

Just a hypothesis: maybe there is some issue with timekeeping on that system.
How does that code work for you with MONOTONIC?
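One way to probe the timekeeping hypothesis is to check whether the clock ever appears to run backwards when read from several threads, which may run on different CPUs. The assumption here (not established in this thread) is that if the per-CPU readings disagree, one CPU may consider a deadline already passed. This is a minimal sketch; count_backward_steps() and struct mono_check are hypothetical names, and the thread and iteration counts are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

struct mono_check {
    pthread_mutex_t lock;
    int64_t last_ns;    /* most recent reading, in lock-acquisition order */
    long backwards;     /* readings smaller than the one before them */
    long iterations;    /* readings taken by each thread */
    clockid_t clk;
};

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

static void *mono_worker(void *arg)
{
    struct mono_check *mc = arg;
    struct timespec ts;

    for (long i = 0; i < mc->iterations; i++) {
        pthread_mutex_lock(&mc->lock);
        /* Reading while holding the lock orders all readings in real
         * time, so a clock that is consistent across CPUs can never
         * return a value below the previous one here. */
        clock_gettime(mc->clk, &ts);
        int64_t now = ts_to_ns(&ts);
        if (now < mc->last_ns)
            mc->backwards++;
        mc->last_ns = now;
        pthread_mutex_unlock(&mc->lock);
    }
    return NULL;
}

/* Returns the number of backward steps seen, or -1 on bad input/error. */
static long count_backward_steps(clockid_t clk, int nthreads, long iterations)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct mono_check mc = { .last_ns = INT64_MIN, .backwards = 0,
                             .iterations = iterations, .clk = clk };
    int started = 0;

    assert(iterations >= 0);
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    if (pthread_mutex_init(&mc.lock, NULL) != 0)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, mono_worker, &mc) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&mc.lock);
    return started == nthreads ? mc.backwards : -1;
}
```

On a healthy system count_backward_steps(CLOCK_MONOTONIC, 4, 20000) should return 0. Running it once per timecounter setting could show whether only one of them misbehaves.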

Jens Axboe (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, and both had the same problem,
i.e. random early returns with ETIMEDOUT.

I think we will try moving our machine forward to a newer -stable to see if that resolves it.
Kan upgraded the machine today to the tip of the 9.x branch, and the problem still occurs.
8.x does not have this problem.

I don't have a 9-RELEASE machine to test on, so I cannot tell whether this came in with the burst of changes committed after the 9.x branch was unfrozen following the release of 9.0.



I am trying to reproduce the problem. Do you have complete sample code to test?

I'm still narrowing down the exact set of conditions,
but on my machine (4 CPUs) the program from ports, sysutils/fio, exhibits the problem when used with
kern.timecounter.hardware=TSC-low and the following config file:

pu05 # cat config.fio

[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=60000
randrepeat=0

[file1]
filename=/dev/ada0

pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#


The 5 jobs failed to start because the parent timed out on them immediately.
Apparently it did not time out on the other 10.
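Since fio needs a block device and a particular config to trigger this, a smaller standalone test may be easier to pass around. The sketch below is not fio's startup code. It just runs several concurrent timed waits on condvars that are never signalled and counts how many come back with something other than a full-length ETIMEDOUT. The names count_failed_waits(), waiter_main() and struct waiter are hypothetical, and the 5 ms tolerance is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

struct waiter {
    clockid_t clk;   /* clock used for the condvar and its deadline */
    long wait_ms;    /* relative timeout */
    int failed;      /* 1 on setup error, other error, or early ETIMEDOUT */
};

static int64_t now_ns(clockid_t clk)
{
    struct timespec ts;
    clock_gettime(clk, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void *waiter_main(void *arg)
{
    struct waiter *w = arg;
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_condattr_t attr;
    pthread_cond_t cond;
    struct timespec deadline;
    int ret = 0, err;

    w->failed = 1;                       /* assume failure until proven */
    if (pthread_condattr_init(&attr) != 0)
        return NULL;
    err = pthread_condattr_setclock(&attr, w->clk);
    if (err == 0)
        err = pthread_cond_init(&cond, &attr);
    pthread_condattr_destroy(&attr);
    if (err != 0)
        return NULL;

    int64_t start = now_ns(CLOCK_MONOTONIC);
    int64_t dl = now_ns(w->clk) + (int64_t)w->wait_ms * 1000000LL;
    deadline.tv_sec = (time_t)(dl / 1000000000LL);
    deadline.tv_nsec = (long)(dl % 1000000000LL);

    pthread_mutex_lock(&lock);
    while (ret == 0)  /* never signalled: a 0 return is a spurious wakeup */
        ret = pthread_cond_timedwait(&cond, &lock, &deadline);
    pthread_mutex_unlock(&lock);

    int64_t waited_ms = (now_ns(CLOCK_MONOTONIC) - start) / 1000000LL;
    /* 5 ms of slack is an arbitrary allowance for clock granularity. */
    w->failed = (ret != ETIMEDOUT || waited_ms < w->wait_ms - 5);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lock);
    return NULL;
}

/* Runs nthreads concurrent timed waits; returns how many failed, or -1. */
static int count_failed_waits(clockid_t clk, int nthreads, long wait_ms)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct waiter ws[MAX_THREADS];
    int started = 0, failed = 0;

    assert(wait_ms >= 0);
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        ws[i] = (struct waiter){ .clk = clk, .wait_ms = wait_ms, .failed = 0 };
        if (pthread_create(&tids[i], NULL, waiter_main, &ws[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        failed += ws[i].failed;
    }
    return started == nthreads ? failed : -1;
}
```

With 16 waiters, roughly matching numjobs=16 above, a non-zero result from count_failed_waits(CLOCK_REALTIME, 16, 10000) under TSC-low but not under ACPI-fast would point at the timecounter rather than at fio.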


If I set the timecounter to ACPI-fast, it works as expected.

Regards,
David Xu



_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
