From:  Borislav Petkov
> Sent: 12 December 2018 18:45
...
> > The property I want for RDTSC ordering is much weaker: I want it to be
> > ordered like a load.  Imagine that, instead of an on-chip TSC, the TSC
> > is literally a location in main memory that gets incremented by an
> > extra dedicated CPU every nanosecond or so.  I want users of RDTSC to
> > work as if they were reading such a location in memory using an
> > ordinary load.  I believe this gives the real desired property that it
> > should be impossible to observe the TSC going backwards.  This is a
> > much weaker form of serialization.
> 
> Well, in that case you need something new.
> 
> Because, the moment you have a RDTSC in flight and a second RDTSC comes
> in and that second RDTSC must *not* bypass the first one and execute
> earlier due to OoO, you need to impose some ordering. And that's pretty
> much uarch-dependent, I'd say.
> 
> And I guess on AMD the way to do that is to stop dispatch until the
> first RDTSC retires.
> 
> Can it be done faster? Sure. And I'm pretty sure there's a lot of pesky
> little hw details we're not even hearing of, which get in the way.

ISTR one of the problems with deciding how RDTSC should serialise is
that it is used for micro-benchmarks.
There you want to time all the instructions between a pair of RDTSCs.
That doesn't work well unless each RDTSC waits for all the preceding
instructions to have executed.
The serialisation requirements for Spectre mitigation are different.
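
As a rough illustration of the micro-benchmark case, here is a minimal
sketch, assuming x86 and GCC/Clang inline asm. rdtsc_fenced() and
time_sum() are made-up names, not existing kernel helpers. Putting an
LFENCE either side of RDTSC is one common way to keep the timed
instructions between the two reads; on AMD this relies on LFENCE being
configured as dispatch-serialising. The non-x86 branch is only there so
the sketch still compiles elsewhere, and it returns nanoseconds rather
than TSC ticks.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
/*
 * Hypothetical helper: read the TSC with an LFENCE on each side.
 * The first LFENCE waits for earlier instructions to complete before
 * RDTSC runs; the second stops later instructions from starting before
 * the TSC has been read. The "memory" clobber stops the compiler from
 * moving loads and stores across the asm.
 */
static inline uint64_t rdtsc_fenced(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("lfence\n\trdtsc\n\tlfence"
			     : "=a"(lo), "=d"(hi)
			     :
			     : "memory");
	return ((uint64_t)hi << 32) | lo;
}
#else
/* Fallback (assumption, not RDTSC): wall-clock nanoseconds via C11. */
static inline uint64_t rdtsc_fenced(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/*
 * Time a simple summing loop. Returns the elapsed TSC ticks (or
 * nanoseconds with the fallback) and stores the sum through 'out'.
 */
uint64_t time_sum(const uint32_t *v, size_t n, uint32_t *out)
{
	uint32_t sum = 0;
	uint64_t t0, t1;
	size_t i;

	t0 = rdtsc_fenced();
	for (i = 0; i < n; i++)
		sum += v[i];
	*out = sum;
	t1 = rdtsc_fenced();

	return t1 - t0;
}
```

Without the fences the loop could still be executing when the second
RDTSC reads the counter, or could start before the first one has, so the
difference would not cover exactly the instructions in between.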

        David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
