On 1/31/14, Andrej Mitrovic <[email protected]> wrote:
> Hmm yeah, but I was expecting better numbers. Even after the 'static'
> fix in the bug as noted by Stanislav the atomic version is slower.

Actually, I think I understand why this happens. The atomic version does an atomic read on *every* access, whereas the TLS implementation only checks a thread-local boolean flag. The TLS implementation does force each new thread to enter the synchronized block *on its first read*, but on subsequent reads that thread never enters the synchronized block again. So after a thread's very first call, a read costs one TLS read in the TLS version and one atomic read in the atomic version. I guess TLS reads simply beat atomic reads.

The atomic implementation probably beats the TLS version when a lot of new threads are spawned at once and they only retrieve a singleton which has already been initialized. E.g., say 1000 threads are spawned. In the atomic version, the 1000 threads will all do an atomic read and skip the synchronized block, whereas in the TLS version each of the 1000 threads has to enter the synchronized block on its very first read.
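
To make the difference concrete, here is a rough C++ analogue of the two patterns. It is only a sketch and not the D code that was benchmarked: `thread_local` and `std::atomic` stand in for D's TLS flag and `core.atomic`, and every name in it (`get_tls`, `get_atomic`, `spawned_lock_entries`, the counters) is made up for illustration. It counts how often each variant enters the locked slow path when the singleton already exists.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct Singleton {};

// --- TLS variant: the fast path reads a plain thread-local flag. ---
static Singleton* tls_instance = nullptr;      // written only under tls_mutex
static std::mutex tls_mutex;
static std::atomic<int> tls_lock_entries{0};   // slow-path counter (illustration only)

Singleton* get_tls() {
    thread_local bool instantiated = false;    // one flag per thread
    if (!instantiated) {
        std::lock_guard<std::mutex> lock(tls_mutex);
        ++tls_lock_entries;
        if (tls_instance == nullptr) tls_instance = new Singleton;  // never freed in this sketch
        instantiated = true;
    }
    return tls_instance;
}

// --- Atomic variant: the fast path does an acquire load of a shared flag. ---
static Singleton* atomic_instance = nullptr;   // written only under atomic_mutex
static std::atomic<bool> atomic_instantiated{false};
static std::mutex atomic_mutex;
static std::atomic<int> atomic_lock_entries{0};

Singleton* get_atomic() {
    if (!atomic_instantiated.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(atomic_mutex);
        ++atomic_lock_entries;
        if (atomic_instance == nullptr) atomic_instance = new Singleton;
        atomic_instantiated.store(true, std::memory_order_release);
    }
    return atomic_instance;
}

// Initialise the singleton on the calling thread, spawn n_threads threads
// that each read it three times, and return how many times the spawned
// threads entered the locked slow path.
int spawned_lock_entries(Singleton* (*get)(), std::atomic<int>& counter, int n_threads) {
    get();                                     // make sure the instance already exists
    const int before = counter.load();
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([get] {
            for (int k = 0; k < 3; ++k) {
                Singleton* s = get();
                assert(s != nullptr);
                (void)s;
            }
        });
    }
    for (auto& t : threads) t.join();
    return counter.load() - before;
}
```

Calling `spawned_lock_entries(get_tls, tls_lock_entries, n)` should report `n` slow-path entries (one per new thread, however often each thread reads afterwards), while `spawned_lock_entries(get_atomic, atomic_lock_entries, n)` should report none. The sketch only counts lock entries; it says nothing about the relative cost of a TLS read versus an atomic read, which is what the benchmark measures.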
