On Wed, Apr 25, 2018 at 4:52 AM Aleksey Shipilev <aleksey.shipi...@gmail.com> wrote:
> On 04/24/2018 10:44 PM, John Hening wrote:
> > I'm reading the great article at
> > https://shipilev.net/blog/2014/nanotrusting-nanotime/ (thanks,
> > Aleksey! :)) and I am not sure whether I understand it correctly.
> >
> > First, the performance of plain and volatile writes is compared:
> >
> > Benchmark                            Mode  Samples    Mean  Mean error  Units
> > o.s.VolatileWriteSucks.incrPlain     avgt      250   3.589       0.025  ns/op
> > o.s.VolatileWriteSucks.incrVolatile  avgt      250  15.219       0.114  ns/op
> >
> > and then it is written that:
> >
> > "In real code, the heavy-weight operations are mixed with relatively
> > low-weight ops, which amortize the costs."
> >
> > And my question is: what does it mean to amortize the costs, exactly? I
> > explain it to myself this way: the amortization is caused by out-of-order
> > execution on the CPU, yes? So even if a volatile write takes much more
> > time than a plain write, it isn't so painful, because the CPU executes
> > other instructions out of order (if it can).
> >
> > What do you think?
>
> Yes, that's basically the gist of it: volatile writes can be heavy,
> especially when contended (although contention is the first-order effect
> there, and non-volatile writes would suck as much), but in real cases
> they mostly aren't.
>
> Amortizing would happen even for in-order CPUs: you can have N arithmetic
> ops executing at sub-cycle speed, and then an occasional speed bump with a
> memory barrier that takes tens/hundreds of cycles. The larger the N, the
> higher the average execution speed. Obviously, it gets better with
> out-of-order CPUs, but that is not a requirement.

The way I see it, in-order/out-of-order doesn't matter here and just muddies
the water. Volatile writes kill speculation/OoO - that's arguably their
biggest cost (plus being a compiler optimization barrier, particularly for
loops) over plain stores to shared locations. So there's no OoO execution
across them.
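For readers who haven't seen the article: the two benchmarked operations are just a plain increment vs. a volatile increment. Here's a minimal sketch of those shapes (class and field names are mine, not the article's JMH code; a real measurement needs JMH with @Benchmark methods and forked JVMs, not a naive loop like this):

```java
// Sketch of the two increment shapes the quoted JMH numbers compare.
// This is NOT a benchmark - just the code under test, for illustration.
public class Counters {
    int plain;            // plain store: compiler/CPU may coalesce and reorder it
    volatile int vol;     // volatile store: ordering barrier for compiler and CPU

    void incrPlain()    { plain++; }  // ~3.6 ns/op in the quoted run
    void incrVolatile() { vol++; }    // ~15.2 ns/op in the quoted run

    public static void main(String[] args) {
        Counters c = new Counters();
        // Single-threaded, so both counters end up at the same value;
        // the difference is purely in per-op cost, not in the result.
        for (int i = 0; i < 1_000_000; i++) { c.incrPlain(); c.incrVolatile(); }
        System.out.println(c.plain + " " + c.vol);  // prints "1000000 1000000"
    }
}
```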
I think the "amortization" can be viewed simply as (which you do mention):

1) If the entire processing consists of a volatile write and it takes, say,
   15 ns, then it consumes 100% of all processing.
2) If there's another 15 ns of processing involved, then it's 50% of the
   overall execution.
3) If there's 85 ns of additional work, it's now 15%. And so on.

> It was supposed to protect readers from assuming they should avoid
> volatile writes, because they are "obviously" slow (hey look, 10x
> degradation!). While in reality, it matters mostly on very optimized
> fast-paths, and probably only in the interest of performance fiends^W
> people subscribed to this list :)
>
> Thanks,
> -Aleksey
>
> --
> You received this message because you are subscribed to the Google Groups
> "mechanical-sympathy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mechanical-sympathy+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Sent from my phone