On Tuesday, 07/26/2016 at 07:21 GMT, Martin Schwidefsky
<[email protected]> wrote:
The problem today is that VM will not generate a sync check when the
difference between NTP and the TOD becomes large enough that it cannot be
steered out in a short period of time, such as when the external time
source is reconnected or when a leap second is added.
STP does not generate a sync check for leap seconds; they are not included
in the TOD clock. In the programming notes in the PoP about the Time-of-Day
Clock you will find this:
"In converting to or from the current date or time, the programming support
must take into account that leap seconds have been inserted or deleted
because of time-correction standards. When the TOD clock has been set
correctly to a time within the standard epoch, the sum of the accumulated
leap seconds must be subtracted from the clock time to determine UTC
time."
... and we're off to the races. :-)
You're right, I got it backwards. I seem to do so every time I come back
to this after several months. I also didn't mean 'sync check'. I meant
the more generic "STP notification." :-)
Time is a complicated subject. Just ask Stephen Hawking. Ignoring the
effects of gravity, there are still many moving parts needed to figure out
what time it is.
- The common frame of reference for the TOD clock
- TOD synchronization with NTP
- The ability of the OS and its apps to accurately calculate "now".
Here's what STP does:
1) Based on NTP, calculate what the proper TOD clock value should be.
2) Steer the machines in the STP timing network to that value.
That's it - it's that simple. Well, it's simple to say, but much harder
to do.
For the purists in the room (I am one of them), the TOD clock needs to be
set as defined in the Principles of Operations. That means using the
Standard Epoch, where a TOD clock value of zero represents 0h 1/1/1900.
And that means STP needs to know about leap seconds!
If you leave out the leap seconds, then every leap second that is
introduced moves the start of the TOD clock epoch backward one second. In
such a system, if I looked at your TOD clock value now and started
counting down to zero, it would be 23:59:34 Dec 31, 1899.
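For anyone who wants to check that arithmetic, here is a minimal sketch in
Python. It assumes only that bit 51 of the TOD clock ticks once per
microsecond (so shifting right by 12 bits gives microseconds since the
epoch). The function name tod_to_utc is made up for illustration; it is not
an OS or hardware interface.

```python
from datetime import datetime, timedelta

# TOD clock value zero on the Standard Epoch: 0h 1/1/1900.
STANDARD_EPOCH = datetime(1900, 1, 1)


def tod_to_utc(tod: int, leap_seconds: int = 0) -> datetime:
    """Convert a 64-bit TOD clock value to a naive UTC datetime.

    Bit 51 ticks once per microsecond, so the low 12 bits are
    sub-microsecond and are dropped here. The accumulated leap
    seconds are subtracted, as the PoP programming note requires.
    """
    microseconds = tod >> 12
    return (STANDARD_EPOCH
            + timedelta(microseconds=microseconds)
            - timedelta(seconds=leap_seconds))


print(tod_to_utc(0))                   # 1900-01-01 00:00:00
print(tod_to_utc(0, leap_seconds=26))  # 1899-12-31 23:59:34
```

The second call is the "counting down to zero" case: with 26 accumulated
leap seconds, a TOD value of zero lands 26 seconds before 0h 1/1/1900.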
Is that a sin? No. But it complicates things when you want to compare
TOD clock values in multiple systems to determine the order of events! All
systems within a System of Systems need to agree on "What time WAS it?" as
much as "What time IS it?" Such information may be needed for forensics
and failure to have a proper time stamp could bring the validity of any
evidence into question.
And while your lawyer can help you squirm your way out of the quagmire
between leap seconds, the introduction of the next one is the fatal blow.
Here's how it works: STP is watching the NTP stream. It calculates the
desired TOD clock value based on UTC and adjusts the TOD clock to match.
As long as the two values are within 50us, we are happy. The sun is shining,
a gentle breeze is blowing, the birds are singing, and the mosquitoes
aren't biting. Our picnic is going famously!
Then there is a thunderclap and bolt of lightning. While the TOD clock has
been steadily incrementing, the leap second arrives and stops time. But
it doesn't stop the TOD clock. As said above, STP looks at UTC, calculates
the desired TOD, and compares it to the actual TOD. The actual TOD is one
second ahead of where it should be!
+-------+
| EGAD! |
+---++--+
||
//
\\
That's a number WAY larger than 50us. The tornady's a-comin', run for
cover! The next thing you know, there's blowing dirt in your eyes, the
ants are crawling on your food, and you have big welts on your body from
the mosquitoes! STP flies into action: "Don't worry Tod, I'll save you!"
STP notifies all LPARs that have registered for the signal that the TOD
clock has lost synchronization. At the same time STP starts slowing the
TOD clock. In about 7 hours, the TOD clock will be back within 50us of
UTC and the registered LPARs will be notified that the clock is now back
in sync. Calm returns, the dust settles, and the TOD clock is trustworthy
once again.
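To put rough numbers on that, here is a small Python sketch of the
comparison and the recovery time. The 50us tolerance comes from the
description above. The steering rate is an assumption back-derived from
"about 7 hours for one second"; it is not a documented STP constant, and
the function names are invented for illustration.

```python
TOLERANCE_US = 50  # "within 50us" from the description above

# Assumed steering rate: roughly one second of error removed in seven
# hours, expressed as seconds of correction per second of wall time.
STEER_RATE = 1 / (7 * 3600)


def in_sync(actual_us: int, desired_us: int) -> bool:
    """True when the actual TOD is within tolerance of the desired TOD."""
    return abs(actual_us - desired_us) <= TOLERANCE_US


def hours_to_resync(error_us: float) -> float:
    """Hours of steering needed to bring the error back inside tolerance."""
    excess_us = max(0.0, abs(error_us) - TOLERANCE_US)
    seconds_of_steering = (excess_us / 1_000_000) / STEER_RATE
    return seconds_of_steering / 3600


one_second_us = 1_000_000
print(in_sync(one_second_us, 0))       # False: one second is far outside 50us
print(hours_to_resync(one_second_us))  # a little under 7 hours
```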
But silently, unobserved in the shadows, the TOD clock epoch has now moved
one second further back into 1899. In a word, "Eeeewwww!"
Any TOD clock values from before the leap second that are still being used
(e.g. in timers) are no longer valid. Maybe being off by a second is ok,
maybe it isn't. How do I know? I worry more about legal and forensic
issues than the practical aspects of the difference.
When you include the leap second data, particularly the schedule for the
next one, the calculation of the desired TOD compensates, resulting in a
value consistent with the Standard Epoch. That means it's going to be
very close to the actual TOD, and can remain in sync. There is no
one-second difference. Yaaay!
STP also notifies the registered LPARs that there has been a change in the
leap second specification. The PTFF-QUERY UTC INFORMATION (QUI)
instruction can be used to get the old and new leap second value. That
means the host is able to properly compute UTC from the TOD clock.
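As a sketch of what the host might do with those values: pick the leap
second count that applies to a given TOD value, then subtract it. This is
Python for illustration only. It assumes the event time is expressed as a
TOD clock value, and the function and parameter names are hypothetical
stand-ins for the "next leap second", "before event" and "after event"
fields, not the real parameter block layout.

```python
TOD_UNITS_PER_SECOND = 4096 * 1_000_000  # bit 51 = 1 microsecond


def leap_seconds_at(tod: int, event_tod: int, before: int, after: int) -> int:
    """Return the leap second count in effect at the given TOD value."""
    return after if tod >= event_tod else before


def utc_seconds(tod: int, event_tod: int, before: int, after: int) -> int:
    """Whole UTC seconds since 0h 1/1/1900 for a TOD value."""
    leap = leap_seconds_at(tod, event_tod, before, after)
    return tod // TOD_UNITS_PER_SECOND - leap


# Hypothetical event: leap count goes from 25 to 26 at TOD second 1000.
event = 1000 * TOD_UNITS_PER_SECOND
print(utc_seconds(event - TOD_UNITS_PER_SECOND, event, 25, 26))  # 974
print(utc_seconds(event, event, 25, 26))                         # 974
```

Both calls print the same value. The TOD clock advanced by one second
across the event, but the computed UTC did not: that is the leap second
"stopping time" while the TOD clock keeps counting.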
WAKE UP! I'm not done.
Since 1972, it has not been possible to use the TOD clock on the Standard
Epoch to calculate the correct date and time without knowledge of leap
seconds. True, no one cared and did it anyway. Why? Because we didn't
use the Standard Epoch. We used the system operator's watch. It was good
enough. "The Timex Epoch"? LOL! Though I have to admit, as a
13-year-old, all I cared about was that STAR TREK was on at 4:30. One
second either way wasn't an issue for me. Besides, every month my dad
listened to WWV and adjusted the clocks! "Chuckie! Bring me your watch!"
(Even then, he KNEW.)
But over the decades new technologies and governance requirements have
demonstrated that a casual approach to time isn't good enough any more.
ETRs were introduced and leap seconds were mainstream. Imagine the
conversation at cocktail parties! (yawn)
And so, finally, let's talk about figuring out what time it is. Programs
should ASK THE OS. If you use STORE CLOCK or STORE CLOCK EXTENDED, all
you're getting is the TOD clock and as we've discussed, that's not good
enough. Not only do you need leap seconds, but you need time zone and DST
data. But if you just want a TOD clock value as a sequencing widget, then
by all means, use it.
This implies that the OS supports enough of STP to give you the local (or
UTC) time. Heads up! z/VM doesn't. This is why CP QUERY TIME returns a
value that is 26 seconds ahead of UTC. A virtual machine that issues
PTFF-QUI can pick up the leap second value.
Here's what my program gets from PTFF-QUI:
  Physical TOD          (pb.Tu) = D11C59EA 78E40779
  Steering offset       (pb.d ) = 00000000 E1C7383B
+ TOD epoch diff        (pb.ed) = 00000000 00000000
= Logical TOD offset    (pb.dl) = 00000000 E1C7383B
  Mode 0:Local 1:ETR 2:STP      = 2
  Synchronized? 0:No 1:Yes      = 1
  Next leap second is at        = 00000000 00000000
  Leap seconds before event     = 0
  Leap seconds after event      = 0
  CST Reference time            = D11C59EB 5014D65A
  CST-TOD dispersion            = 00000000 00000000
  CST offset from CTS leader    = 00000000 00000000
I can see that I'm using STP, this machine is the CTS (if it's in a
network), the clock is in sync, and there are no leap seconds configured.
Sure enough, CP QUERY TIME is telling me the CORRECT time, but the TOD
clock is not on the Standard Epoch. I know this because the leap seconds
are zero. If the leap seconds were properly set to 26, CP QUERY TIME
would be 26 seconds fast.
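The same reading of the output can be written down as a few lines of
Python. The hex values are copied from the output above; the variable
names are mine, and the only outside assumption is that these offsets use
the TOD clock format, where bit 51 is one microsecond.

```python
TOD_UNITS_PER_MICROSECOND = 4096  # bit 51 of the TOD clock is 1 microsecond

# Values copied from the program output above.
steering_offset = 0x00000000_E1C7383B  # pb.d
epoch_diff = 0x00000000_00000000       # pb.ed
logical_offset = 0x00000000_E1C7383B   # pb.dl
leap_before, leap_after = 0, 0

# The output shows pb.d + pb.ed = pb.dl; confirm the sum holds.
offsets_consistent = steering_offset + epoch_diff == logical_offset

# Express the logical TOD offset in seconds (assumes TOD-format units).
offset_seconds = logical_offset / TOD_UNITS_PER_MICROSECOND / 1_000_000

# Zero before and after means no leap seconds have been configured.
leap_seconds_configured = (leap_before, leap_after) != (0, 0)

print(offsets_consistent, offset_seconds, leap_seconds_configured)
```

Under that assumption the logical offset works out to a bit under one
second, and the zero leap second fields are what show that the TOD clock
is not on the Standard Epoch.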
Given that CMS gets its "time of day" information from CP, it, too, will
be only as correct as CP.
In addition to PTFF-QUI, guests that use the TOD clock to calculate the
local time need to use DIAGNOSE 0x00 to get the TZ offset and register via
DIAGNOSE 0x274 to find out when the timezone changes (e.g. enter/exit
DST). If you get the interrupt, you issue DIAG 0x00 again to get the
updated TZ offset.
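The query-then-refresh pattern looks roughly like this in Python. Nothing
here is a real z/VM interface: query_tz_offset is a hypothetical callable
standing in for DIAGNOSE 0x00, and on_timezone_change is a hypothetical
handler for the interrupt you register for with DIAGNOSE 0x274.

```python
from datetime import datetime, timedelta
from typing import Callable


class LocalClock:
    """Caches the TZ offset and refreshes it only when told it changed."""

    def __init__(self, query_tz_offset: Callable[[], timedelta]):
        # query_tz_offset is a hypothetical stand-in for DIAGNOSE 0x00.
        self._query = query_tz_offset
        self._tz_offset = query_tz_offset()

    def on_timezone_change(self) -> None:
        """Hypothetical interrupt handler: ask again for the TZ offset."""
        self._tz_offset = self._query()

    def local_time(self, utc: datetime) -> datetime:
        """Apply the cached TZ offset to a UTC time."""
        return utc + self._tz_offset


# Simulated offsets: daylight time first, standard time after the change.
offsets = iter([timedelta(hours=-4), timedelta(hours=-5)])
clock = LocalClock(lambda: next(offsets))

utc = datetime(2016, 7, 26, 7, 21)
print(clock.local_time(utc))   # 2016-07-26 03:21:00
clock.on_timezone_change()
print(clock.local_time(utc))   # 2016-07-26 02:21:00
```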
I know there was a lot of gorp here, but I hope that it explains better
How It Works and why you see the things you see. I will eventually
incorporate this information into my vmleap.html article.
Alan Altmark
Senior Managing z/VM and Linux Consultant
Lab Services System z Delivery Practice
IBM Systems & Technology Group
ibm.com/systems/services/labservices
office: 607.429.3323
mobile: 607.321.7556
[email protected]
IBM Endicott