
On May 1, 2019, at 1:38 PM, dor laor <[email protected]> wrote:

On Wed, May 1, 2019 at 9:58 AM Gil Tene <[email protected]> wrote:
There are many ways for RDTSC to be made "wrong" (as in non-monotonic within a 
software thread, process, system, etc.), but AFAIK "most" modern x86-64 bare-metal 
systems can be set up for good, clean, monotonic system-wide TSC-ness. The 
hardware certainly has the ability to keep those TSCs in sync (enough to have 
no detectable non-sync effects) both within a socket and across multi-socket 
systems (when the hardware is "built right"). The TSCs all get reset together 
and move together unless interfered with...

Two ways I've seen this go wrong even on modern hardware include:

A) Some BIOSes resetting the TSC on a single core or hyperthread on each socket 
(usually thread 0 of core 0) for some strange reason during the boot sequence. 
[I've conclusively shown this on some 4-socket Sandy Bridge systems.] This 
leaves different hardware threads with vastly differing TSC values, a skew that 
grows with every non-power-cycling reboot, with obvious negative effects and 
screams from anyone relying on TSC consistency for virtually any purpose.

B) Hypervisors virtualizing the TSC. Some hypervisors (notably at least some 
versions of VMware) will virtualize the TSC and "slew" the virtualized value to 
avoid presenting guest OSs with huge jumps in TSC values when a core was taken 
away for a "long" (i.e. many-msec) period of time. Instead, the virtualized TSC 
will incrementally move forward in small jumps until it catches up. The purpose 
of this appears to be to avoid triggering guest OS panics in code that watches 
the TSC for panic-timeouts and other sanity checks (e.g. code in OS spinlocks). 
The effect of this "slewing" is obvious: TSC values can easily jump backward, 
even within a single software thread.

A hypervisor wouldn't take the TSC backwards; it can slow the TSC down but not 
run it backward, unless it virtualizes the CPU bits for stable TSC differently. 
That does happen, but I doubt VMware (and the better hypervisors) take the TSC 
back.

A hypervisor wouldn't take the TSC backwards within one vcore.

But vcores are scheduled individually, which means that any slewing done to 
hide a long forward jump in the physical TSC (in situations where a vcore was 
not actually running on a physical core for a "long enough" period of time) is 
done individually within each vcore and its virtualized TSC. (Synchronizing the 
virtualized TSC slewing across vcores would require either synchronizing their 
scheduling such that the entire VM would be either "on" or "off" cores at the 
same time, or making the virtualized TSC tick forward only in large quanta, or 
only when all vcores are actively running on physical cores, all of which would 
cause some other dramatic strangeness.)

Multiple vcores belonging to the same guest OS can (and usually will) end up 
running simultaneously on multiple real cores, which obviously means that 
during slewing periods they will be showing vastly differing virtualized TSC 
values (with gaps of 10s of msec) until the “slewing” is done. All it takes is 
a “lucky timing” context switch within the Guest OS, moving a thread from one 
vcore to another (for whichever of the many reasons the guest OS might decide 
to do that) for *your* program to observe the TSC “jumping backwards” by 10s of 
msec between one RDTSC execution and another.



The bottom line is that the TSC can be relied on on bare metal (where there is 
no hypervisor scheduling of guest OS cores) if the system is set up right, but 
it can do very wrong things otherwise. People who really care about low-cost 
time measurement (like System.nanoTime()) can control their systems to make 
this work and elect to rely on it (that's exactly what Zing's -XX:+UseRdtsc 
flag is for), but it can be dangerous to rely on it by default.

On Tuesday, April 30, 2019 at 3:07:11 AM UTC-7, Ben Evans wrote:
I'd assumed that the monotonicity of System.nanoTime() on modern
systems was due to the OS compensating, rather than any changes at the
hardware level. Is that not the case?

In particular, Rust definitely still seems to think that their
SystemTime (which looks to back directly on to a RDTSC) can be
non-monotonic: https://doc.rust-lang.org/std/time/struct.SystemTime.html

On Tue, 30 Apr 2019 at 07:50, dor laor <[email protected]> wrote:
>
> It might be since in the past many systems did not have a stable rdtsc and 
> thus if the instruction is executed
> on different sockets it can result in wrong answers and negative time. Today 
> most systems do have a stable tsc
> and you can verify it from userspace/java too.
> I bet it's easy to google the reason
>
> On Mon, Apr 29, 2019 at 2:36 PM 'Carl Mastrangelo' via mechanical-sympathy 
> <[email protected]> wrote:
>>
>> This may be a dumb question, but why (on Linux) is System.nanoTime() a call 
>> out to clock_gettime? It seems like it could be inlined by the JVM, and 
>> stripped down to the rdtsc instruction. From my reading of the vDSO source 
>> for x86, the implementation is not that complex, and could be copied into 
>> Java.
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "mechanical-sympathy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>
