On Tuesday, 07/26/2016 at 07:21 GMT, Martin Schwidefsky 
<[email protected]> wrote:
>
> > The problem today is that VM will not generate a sync check when the
> > difference between NTP and the TOD becomes large enough that it cannot be
> > steered out in a short period of time, such as when the external time
> > source is reconnected or when a leap second is added.
>
> STP does not generate a sync check for leap seconds, they are not included in
> the TOD clock. The programming notes in the PoP about the Time-of-Day Clock
> you will find this:
>
> "In converting to or from the current date or time, the programming support
> must take into account that leap seconds have been inserted or deleted
> because of time-correction standards. When the TOD clock has been set
> correctly to a time within the standard epoch, the sum of the accumulated
> leap seconds must be subtracted from the clock time to determine UTC time."

... and we're off to the races.  :-)

You're right, I got it backwards.  I seem to do so every time I come back 
to this after several months.  I also didn't mean 'sync check'.  I meant 
the more generic "STP notification."  :-)

Time is a complicated subject.  Just ask Stephen Hawking.  Ignoring the 
effects of gravity, there are still many moving parts needed to figure out 
what time it is.
- The common frame of reference for the TOD clock
- TOD synchronization with NTP
- The ability of the OS and its apps to accurately calculate "now".

Here's what STP does: 
1) Based on NTP, calculate what the proper TOD clock value should be.
2) Steer the machines in the STP timing network to that value.

That's it - it's that simple.  Well, it's simple to say, but much harder 
to do.

For the purists in the room (I am one of them), the TOD clock needs to be 
set as defined in the Principles of Operation.  That means using the 
Standard Epoch, where a TOD clock value of zero represents 0h 1/1/1900. 
And that means STP needs to know about leap seconds!

If you leave out the leap seconds, then every leap second that is 
introduced moves the start of the TOD clock epoch backward one second.  In 
such a system, if I looked at your TOD clock value now and started 
counting down to zero, it would be 23:59:34 Dec 31, 1899.
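
To put numbers on that, here is a minimal sketch of the arithmetic in Python.  The function name is mine and purely illustrative; the only inputs are the Standard Epoch and the 26 leap seconds mentioned in this note.

```python
from datetime import datetime, timedelta

# TOD clock value zero on the Standard Epoch: 0h 1/1/1900.
STANDARD_EPOCH = datetime(1900, 1, 1)

def effective_epoch(ignored_leap_seconds: int) -> datetime:
    """Instant that TOD zero corresponds to when leap seconds are left out.

    Each ignored leap second pushes the start of the epoch back one second.
    """
    return STANDARD_EPOCH - timedelta(seconds=ignored_leap_seconds)

print(effective_epoch(26))  # 1899-12-31 23:59:34
```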

Is that a sin?  No.  But it complicates things when you want to compare 
TOD clock values in multiple systems to determine the order of events! All 
systems within a System of Systems need to agree on "What time WAS it?" as 
much as "What time IS it?" Such information may be needed for forensics 
and failure to have a proper time stamp could bring the validity of any 
evidence into question.

And while your lawyer can help you squirm your way out of the quagmire 
between leap seconds, the introduction of the next one is the fatal blow.

Here's how it works:  STP is watching the NTP stream.  It calculates the 
desired TOD clock value based on UTC and adjusts the TOD clock to match. 
As long as the two values are within 50us, we are happy.  The sun is shining, 
a gentle breeze is blowing, the birds are singing, and the mosquitoes 
aren't biting.  Our picnic is going famously!

Then there is a thunderclap and a bolt of lightning.  While the TOD clock has 
been steadily incrementing, the leap second arrives and stops time.  But 
it doesn't stop the TOD clock.  As said above, STP looks at UTC, calculates 
the desired TOD, and compares it to the actual TOD.  The actual TOD is one 
second ahead of where it should be!

        +-------+
        | EGAD! |
        +---++--+
            ||
            //
            \\

That's a number WAY larger than 50us.  The tornady's a-comin', run for 
cover!  The next thing you know, there's blowing dirt in your eyes, the 
ants are crawling on your food, and you have big welts on your body from 
the mosquitoes!  STP flies into action: "Don't worry Tod, I'll save you!"

STP notifies all LPARs that have registered for the signal that the TOD 
clock has lost synchronization.  At the same time STP starts slowing the 
TOD clock.  In about 7 hours, the TOD clock will be back within 50us of 
UTC and the registered LPARs will be notified that the clock is now back 
in sync.  Calm returns, the dust settles, and the TOD clock is trustworthy 
once again.
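
For those who like to see the arithmetic, here is a rough Python sketch of the sync test and the recovery time.  It assumes the steering rate is constant at one second per 7 hours, which is my simplification of "about 7 hours"; the real steering algorithm may well behave differently, and all of the names are made up for illustration.

```python
SYNC_TOLERANCE_US = 50        # "within 50us" from the text
STEER_HOURS_PER_SECOND = 7.0  # assumption: ~7 hours to steer out one second

def is_in_sync(actual_tod_us: int, desired_tod_us: int) -> bool:
    """True when the actual TOD is within tolerance of the desired TOD."""
    return abs(actual_tod_us - desired_tod_us) <= SYNC_TOLERANCE_US

def steering_rate_ppm() -> float:
    """Implied slowdown in parts per million (microseconds per second)."""
    return 1_000_000 / (STEER_HOURS_PER_SECOND * 3600)

def hours_to_resync(offset_us: int) -> float:
    """Hours of steering needed to bring the offset back inside tolerance."""
    excess_us = max(abs(offset_us) - SYNC_TOLERANCE_US, 0)
    rate_us_per_hour = 1_000_000 / STEER_HOURS_PER_SECOND
    return excess_us / rate_us_per_hour

# After an unannounced leap second the TOD is one second (1,000,000 us) ahead.
print(is_in_sync(1_000_000, 0))
print(round(steering_rate_ppm(), 1), "ppm")
print(round(hours_to_resync(1_000_000), 2), "hours")
```

Under that assumption the clock is slowed by roughly 40 parts per million for the duration, which is why recovery takes hours rather than seconds.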

But silently, unobserved in the shadows, the TOD clock epoch has now moved 
one second further back into 1899.  In a word, "Eeeewwww!"

Any TOD clock values from before the leap second that are still being used 
(e.g. in timers) are no longer valid.  Maybe being off by a second is ok, 
maybe it isn't.  How do I know?  I worry more about legal and forensic 
issues than the practical aspects of the difference.

When you include the leap second data, particularly the schedule for the 
next one, the calculation of the desired TOD compensates, resulting in a 
value consistent with the Standard Epoch.  That means it's going to be 
very close to the actual TOD, and can remain in sync.  There is no 
one-second difference.  Yaaay!

STP also notifies the registered LPARs that there has been a change in the 
leap second specification.  The PTFF-QUERY UTC INFORMATION (QUI) 
instruction can be used to get the old and new leap second value.  That 
means the host is able to properly compute UTC from the TOD clock.
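
Here is a minimal sketch of that computation in Python, not anything CP or Linux actually runs.  It relies on the architected TOD format, where bit 51 increments once per microsecond, and it assumes the caller already has the leap second count (for example from PTFF-QUI).  The function name and parameters are mine.

```python
from datetime import datetime, timedelta, timezone

# TOD clock value zero on the Standard Epoch: 0h 1/1/1900.
STANDARD_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

def tod_to_utc(tod: int, leap_seconds: int) -> datetime:
    """Convert a 64-bit TOD clock value on the Standard Epoch to UTC.

    tod          -- the 64-bit value, e.g. from STORE CLOCK
    leap_seconds -- accumulated leap seconds, supplied by the caller
    """
    microseconds = tod >> 12  # bit 51 ticks once per microsecond
    clock_time = STANDARD_EPOCH + timedelta(microseconds=microseconds)
    # Per the PoP note: subtract the accumulated leap seconds to get UTC.
    return clock_time - timedelta(seconds=leap_seconds)

sample = 0xD11C59EA78E40779
print(tod_to_utc(sample, 0))   # treats the TOD as if it were already UTC
print(tod_to_utc(sample, 26))  # 26 seconds earlier
```

The two results differ by exactly the leap second count, so which one is "right" depends entirely on whether the TOD clock was set on the Standard Epoch in the first place.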

WAKE UP!  I'm not done.

Since 1972, it has not been possible to use the TOD clock on the Standard 
Epoch to calculate the correct date and time without knowledge of leap 
seconds.  True, no one cared and did it anyway.  Why?  Because we didn't 
use the Standard Epoch.  We used the system operator's watch.  It was good 
enough.  "The Timex Epoch"?  LOL!  Though I have to admit, as a 
13-year-old, all I cared about was that STAR TREK was on at 4:30.  One 
second either way wasn't an issue for me.  Besides, every month my dad 
listened to WWV and adjusted the clocks!  "Chuckie!  Bring me your watch!" 
 (Even then, he KNEW.)

But over the decades new technologies and governance requirements have 
demonstrated that a casual approach to time isn't good enough any more. 
ETRs were introduced and leap seconds were mainstream.  Imagine the 
conversation at cocktail parties! (yawn)

And so, finally, let's talk about figuring out what time it is.  Programs 
should ASK THE OS.  If you use STORE CLOCK or STORE CLOCK EXTENDED, all 
you're getting is the TOD clock and as we've discussed, that's not good 
enough.  Not only do you need leap seconds, but you need time zone and DST 
data.  But if you just want a TOD clock value as a sequencing widget, then 
by all means, use it.
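
By way of analogy only, here is how that split looks in Python on a general-purpose OS: ask the OS for wall-clock time, and use a raw counter only for ordering.  Nothing here is specific to z/VM, and the function names are mine.

```python
import time
from datetime import datetime, timezone

def wall_clock_now():
    """Ask the OS for the time; it applies the time zone and DST rules."""
    utc = datetime.now(timezone.utc)
    local = utc.astimezone()  # OS-configured local time zone
    return utc, local

def sequence_stamp() -> int:
    """A raw counter that is fine for ordering events, not for telling time."""
    return time.monotonic_ns()

utc, local = wall_clock_now()
print(utc.isoformat(), local.isoformat())
```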

This implies that the OS supports enough of STP to give you the local (or 
UTC) time.  Heads up!  z/VM doesn't.  This is why CP QUERY TIME returns a 
value that is 26 seconds ahead of UTC.  A virtual machine that issues 
PTFF-QUI can pick up the leap second value. 

Here's what my program gets from PTFF-QUI:
  Physical TOD         (pb.Tu) = D11C59EA 78E40779 
  Steering offset      (pb.d ) = 00000000 E1C7383B 
  + TOD epoch diff     (pb.ed) = 00000000 00000000 
  = Logical TOD offset (pb.dl) = 00000000 E1C7383B 
  Mode   0:Local 1:ETR 2:STP   =                 2 
  Synchronized?  0:No  1:Yes   =                 1 
  Next leap second is at       = 00000000 00000000 
  Leap seconds before event    =                 0 
  Leap seconds after event     =                 0 
  CST Reference time           = D11C59EB 5014D65A 
  CST-TOD dispersion           = 00000000 00000000 
  CST offset from CTS leader   = 00000000 00000000

I can see that I'm using STP, this machine is the CTS (if it's in a 
network), the clock is in sync, and there are no leap seconds configured. 
Sure enough, CP QUERY TIME is telling me the CORRECT time, but the TOD 
clock is not on the Standard Epoch.  I know this because the leap seconds 
are zero.  If the leap seconds were properly set to 26, CP QUERY TIME 
would be 26 seconds fast.

Given that CMS gets its "time of day" information from CP, it, too, will 
be only as correct as CP.

In addition to PTFF-QUI, guests that use the TOD clock to calculate the 
local time need to use DIAGNOSE 0x00 to get the TZ offset and register via 
DIAGNOSE 0x274 to find out when the timezone changes (e.g. enter/exit 
DST).  If you get the interrupt, you issue DIAG 0x00 again to get the 
updated TZ offset.
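
The pattern is "query once, then re-query on notification."  Here is a sketch of it in Python.  The query_tz_offset callable is a hypothetical stand-in for whatever issues DIAGNOSE 0x00, and on_timezone_change is a hypothetical handler for the interrupt you registered for via DIAGNOSE 0x274; neither is a real API.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

class LocalClock:
    """Caches the TZ offset and refreshes it when told the zone changed."""

    def __init__(self, query_tz_offset: Callable[[], int]):
        # query_tz_offset: hypothetical stand-in for DIAGNOSE 0x00,
        # returning the offset from UTC in seconds.
        self._query = query_tz_offset
        self._offset_s = query_tz_offset()

    def on_timezone_change(self) -> None:
        # Hypothetical handler for the DIAGNOSE 0x274 notification:
        # ask again for the updated offset.
        self._offset_s = self._query()

    def local_time(self, utc: datetime) -> datetime:
        zone = timezone(timedelta(seconds=self._offset_s))
        return utc.astimezone(zone)

# Usage: the offset moves from UTC-5 to UTC-4 when DST starts.
offsets = iter([-5 * 3600, -4 * 3600])
clock = LocalClock(lambda: next(offsets))
now = datetime(2016, 7, 26, 7, 21, tzinfo=timezone.utc)
print(clock.local_time(now))
clock.on_timezone_change()
print(clock.local_time(now))
```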

I know there was a lot of gorp here, but I hope that it explains better 
How It Works and why you see the things you see.  I will eventually 
incorporate this information into my vmleap.html article.

Alan Altmark

Senior Managing z/VM and Linux Consultant
Lab Services System z Delivery Practice
IBM Systems & Technology Group
ibm.com/systems/services/labservices
office: 607.429.3323
mobile: 607.321.7556
[email protected]
IBM Endicott

