Santiago Vila wrote:
> Also, how do we determine that using 3 seconds instead of 2 is a good fix?
> Is that just the smallest increment that made your 9 in 100 failure
> rate to become unmeasurable? (I guess 0 after trying a lot of times)

Yes, with a sleep of 3 seconds I got 0 failures in 100 runs. One could try sleeps of 2.1, 2.2, 2.3, ..., 2.9 seconds and see how the failure probability declines. But I haven't done that.

> Am I right to think that this is not really an architecture-specific
> problem but more like a consequence of the machine being "slow"?

It's not that the machine is slow: on my QEMU-emulated Linux/SPARC machine (Debian 9), I got 0 failures in 100 runs even before the patch, and that machine is certainly slower than SPARC hardware. I guess it's related to how quickly the Linux scheduler wakes up the pthread_cond_timedwait_routine thread, and that may depend on details of the hardware (such as the number of jiffies per second).

Bruno
