Hi Mike,

Yes, I can quantify the cost. Is it very high.

Here is the patch that I used:

--- rtl/tsan_rtl.cc (revision 226644)
+++ rtl/tsan_rtl.cc (working copy)
@@ -709,7 +709,11 @@
 ALWAYS_INLINE USED
 void MemoryAccess(ThreadState *thr, uptr pc, uptr addr,
     int kAccessSizeLog, bool kAccessIsWrite, bool kIsAtomic) {
   u64 *shadow_mem = (u64*)MemToShadow(addr);
+
+  atomic_fetch_add((atomic_uint64_t*)shadow_mem, 0, memory_order_acq_rel);
+

On the standard tsan benchmark that does 8-byte writes:
before:
[       OK ] DISABLED_BENCH.Mop8Write (1161 ms)
after:
[       OK ] DISABLED_BENCH.Mop8Write (5085 ms)

So that's 338% slowdown.





On Mon, Jan 19, 2015 at 6:12 PM, Mike Stump <mikest...@comcast.net> wrote:
> On Jan 19, 2015, at 12:47 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
>> Long story short. Tsan has a logical data race the core of data race
>> detection algorithm. The race is not a bug, but a deliberate design
>> decision that makes tsan considerably faster.
>
> Could you please quantify that for us?  Also, what lockless update method did 
> you use?  Did you try atomic increment?  On my machine, they are as cheap as 
> stores; can’t imagine they could be slow at all.  If the latency and 
> bandwidth of atomic increment is identical to store, would the costs be any 
> higher than using a store to update the tsan data?  A proper port of tsan to 
> my machine would make use of atomic increment.  I consider it a simple matter 
> to sequence the thread termination and the output routine to ensure that all 
> the updates in the threads happen before the output routine runs.  The output 
> routine strikes me as slow, and thread termination strikes me as slow, so 
> taking a little extra time there seems reasonable.  Was the excessive cost 
> you saw due to the termination costs?
>
>> So ironically, if the race memory accesses happen almost simultaneously, 
>> tsan can miss the
>> race.
>> Thus we have sleeps.
>
> I’ve not seen a reason why the test suite should randomly fail.  The gcc test 
> suite does not.  Could you explain why the llvm test suite does?  Surely you 
> know that sleep is not a synchronization primitive?
>
>> Sleeps vs barrier is being discussed in the "Fix parameters of
>> __tsan_vptr_update" thread.
>
> When finished, let us know the outcome.  To date, I’ve not seen any 
> compelling reason to have the core of tsan be other than deterministic and 
> the test suite other than deterministic.  I’d love to see the backing for 
> such a decision.
>
>> I would really like to keep llvm and gcc tests in sync as much as possible.
>
> Excellent, from my perspective, that would mean that you make the llvm test 
> suite deterministic.

Reply via email to