Hi Mike, Yes, I can quantify the cost. Is it very high.
Here is the patch that I used: --- rtl/tsan_rtl.cc (revision 226644) +++ rtl/tsan_rtl.cc (working copy) @@ -709,7 +709,11 @@ ALWAYS_INLINE USED void MemoryAccess(ThreadState *thr, uptr pc, uptr addr, int kAccessSizeLog, bool kAccessIsWrite, bool kIsAtomic) { u64 *shadow_mem = (u64*)MemToShadow(addr); + + atomic_fetch_add((atomic_uint64_t*)shadow_mem, 0, memory_order_acq_rel); + On the standard tsan benchmark that does 8-byte writes: before: [ OK ] DISABLED_BENCH.Mop8Write (1161 ms) after: [ OK ] DISABLED_BENCH.Mop8Write (5085 ms) So that's 338% slowdown. On Mon, Jan 19, 2015 at 6:12 PM, Mike Stump <mikest...@comcast.net> wrote: > On Jan 19, 2015, at 12:47 AM, Dmitry Vyukov <dvyu...@google.com> wrote: >> Long story short. Tsan has a logical data race the core of data race >> detection algorithm. The race is not a bug, but a deliberate design >> decision that makes tsan considerably faster. > > Could you please quantify that for us? Also, what lockless update method did > you use? Did you try atomic increment? On my machine, they are as cheap as > stores; can’t imagine they could be slow at all. If the latency and > bandwidth of atomic increment is identical to store, would the costs be any > higher than using a store to update the tsan data? A proper port of tsan to > my machine would make use of atomic increment. I consider it a simple matter > to sequence the thread termination and the output routine to ensure that all > the updates in the threads happen before the output routine runs. The output > routine strikes me as slow, and thread termination strikes me as slow, so > taking a little extra time there seems reasonable. Was the excessive cost > you saw due to the termination costs? > >> So ironically, if the race memory accesses happen almost simultaneously, >> tsan can miss the >> race. >> Thus we have sleeps. > > I’ve not seen a reason why the test suite should randomly fail. The gcc test > suite does not. Could you explain why the llvm test suite does? Surely you > know that sleep is not a synchronization primitive? > >> Sleeps vs barrier is being discussed in the "Fix parameters of >> __tsan_vptr_update" thread. > > When finished, let us know the outcome. To date, I’ve not seen any > compelling reason to have the core of tsan be other than deterministic and > the test suite other than deterministic. I’d love to see the backing for > such a decision. > >> I would really like to keep llvm and gcc tests in sync as much as possible. > > Excellent, from my perspective, that would mean that you make the llvm test > suite deterministic.