On Wed, Apr 25, 2018 at 4:52 AM Aleksey Shipilev <aleksey.shipi...@gmail.com>
wrote:

> On 04/24/2018 10:44 PM, John Hening wrote:
> > I'm reading the great article from
> > https://shipilev.net/blog/2014/nanotrusting-nanotime/ (thanks
> > Aleksey! :)) and I am not sure whether I understand it correctly.
> >
> > First, the performance of plain and volatile writes is compared:
> >
> > Benchmark                            Mode  Samples    Mean  Mean error  Units
> > o.s.VolatileWriteSucks.incrPlain     avgt      250   3.589       0.025  ns/op
> > o.s.VolatileWriteSucks.incrVolatile  avgt      250  15.219       0.114  ns/op
> >
> > and then the article says:
> >
> > "In real code, the heavy-weight operations are mixed with relatively
> > low-weight ops, which amortize the costs."
> >
> > And my question is: what exactly does it mean to amortize the costs? My
> > own explanation is that the amortization is caused by out-of-order
> > execution in the CPU, yes? So even if a volatile write takes much more
> > time than a plain write, it isn't so painful, because the CPU executes
> > other instructions out of order (if it can).
> >
> > What do you think?
> Yes, that's basically the gist of it: volatile writes can be heavy,
> especially when contended (although contention is the first-order effect
> there, and non-volatile writes would suck as much), but in real cases
> they mostly aren't.
>
> Amortizing would happen even for in-order CPUs: you can have N arithmetic
> ops executing at sub-cycle speed, and then an occasional speed bump from a
> memory barrier that takes tens/hundreds of cycles. The larger the N, the
> higher the average execution speed. Obviously, it gets better with
> out-of-order CPUs, but that is not a requirement.

The way I see it, in-order/out-of-order doesn’t matter here and just
muddies the water.

Volatile writes kill speculation/OoO - that's arguably their biggest cost
over plain stores to shared locations (plus they act as a compiler
optimization barrier, particularly in loops). So there's no OoO execution
across them.
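
For concreteness, the ~3.6ns vs ~15ns numbers quoted at the top measure
nothing but the bare increment, so there is nothing to amortize the store
against. The article's benchmark is presumably shaped roughly like the
sketch below (only the class and method names appear in the quoted table;
the bodies and annotations here are my guess):

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class VolatileWriteSucks {

    int plain;          // ordinary field: the JIT may freely optimize/reorder around it
    volatile int vol;   // volatile field: every write is ordered and is a compiler barrier

    @Benchmark
    public int incrPlain() {
        return plain++;     // plain read-modify-write: ~3.6 ns/op in the quoted run
    }

    @Benchmark
    public int incrVolatile() {
        return vol++;       // volatile read + volatile write: ~15 ns/op in the quoted run
    }
}

With nothing else in the measured op, the volatile store is effectively
100% of the cost - the un-amortized worst case.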

I think the "amortization" can be viewed simply as follows (which you do
mention):
1) If the entire processing consists of a volatile write and it takes, say,
15ns, then it consumes 100% of all processing.
2) If there's another 15ns of processing involved, then it's 50% of overall
execution.
3) If there's 85ns of additional work, it's now 15%.
And so on - see the sketch below.
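
A minimal JMH-style sketch of that amortization (my own, not from the
article; the class name and the tokens knob are made up), assuming JMH is
on the classpath:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class AmortizedVolatileWrite {

    volatile long published;

    // Made-up knob: how many Blackhole "tokens" of unrelated work surround the write.
    @Param({"0", "10", "100", "1000"})
    public int tokens;

    @Benchmark
    public void workThenPublish() {
        Blackhole.consumeCPU(tokens);  // the "relatively low-weight ops"
        published = tokens;            // the single heavy-weight volatile store
    }
}

As tokens grows, the roughly fixed cost of the single volatile store
becomes a smaller and smaller slice of the per-op time, which is all the
"amortization" amounts to.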

>
>
> It was supposed to protect readers from assuming they should avoid
> volatile writes because they are "obviously" slow (hey look, 10x
> degradation!), while in reality it matters mostly on very optimized
> fast-paths and is probably of interest only to performance fiends^W
> people subscribed to this list :)
>
> Thanks,
> -Aleksey
>
-- 
Sent from my phone
