### Introduction
The current CF time system does not address the presence or absence of leap 
seconds in data with a standard name of **`time`**. This is not an issue for 
model runs or data with time resolutions on the order of hours, days, etc., but
it can be an issue for modern satellite swath data and other systems with time 
resolutions of tens of seconds or finer.

I have written a background section for this proposal, but I have put it at the 
end so that people don't have to scroll through it in order to get to the
proposal itself. If something about the proposal seems unclear, I hope the background
will help resolve your question.
### Proposal
After past discussions with @JonathanGregory, and again with him and @marqh at 
the 2018 CF Workshop, I propose the new calendars listed below and a change to 
existing calendar definitions.

- **`gregorian_tai`** - When this calendar is called out, the epoch date and 
time stated in the **`units`** attribute are required to be Coordinated 
Universal Time (UTC) and the time values in the variable are required to be 
fully metric, representing the advance in International Atomic Time (TAI) 
since that epoch. Conversion of a time value in the variable to a UTC date and 
time must account for any leap seconds between the epoch date and the time 
being converted.
- **`gregorian_utc`** - When this calendar is called out, the epoch date and 
time stated in the **`units`** attribute are required to be in UTC and the time 
values in the variable are assumed to be conversions from UTC dates and times 
that did not account for leap seconds. As a consequence, the time values may 
not be fully metric. Conversion of a time value in the variable to a UTC date 
and time must not use leap seconds.
- **`gregorian`** - When this calendar is called out, the epoch date stated in 
the **`units`** attribute is required to be in mixed Gregorian/Julian form. The 
epoch date and time have an unknown relationship to UTC. The time values in the 
variable may not be fully metric, and conversion of a time value in the 
variable to a date and time produces results of unknown precision.
- **`the others`** - The other calendars all have an unknown relationship to 
UTC, similar to the **`gregorian`** calendar above.
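To make the distinction between the two new calendars concrete, here is a minimal Python sketch of how a reader might decode a "seconds since epoch" value under each one. The `LEAP_SECOND_INSERTIONS` table and `decode_time` function are illustrative stand-ins of my own, not part of any existing library, and the table lists only the two most recent leap seconds.

```python
from datetime import datetime, timedelta

# Illustrative subset of leap-second insertion instants (the UTC
# midnight immediately after each inserted second); a real
# implementation would use the full IERS table.
LEAP_SECOND_INSERTIONS = [
    datetime(2015, 7, 1),
    datetime(2017, 1, 1),
]

def decode_time(value_seconds, epoch, calendar):
    """Convert a 'seconds since <epoch>' value to a UTC datetime.

    gregorian_utc: the stored value ignored leap seconds, so naive
    datetime arithmetic (which also ignores them) reverses it exactly.
    gregorian_tai: the stored value is true elapsed (TAI) seconds, so
    each leap second crossed must be subtracted before the naive
    arithmetic is applied.
    """
    if calendar == "gregorian_utc":
        return epoch + timedelta(seconds=value_seconds)
    if calendar == "gregorian_tai":
        naive = epoch + timedelta(seconds=value_seconds)
        leaps = sum(1 for t in LEAP_SECOND_INSERTIONS if epoch < t <= naive)
        return epoch + timedelta(seconds=value_seconds - leaps)
    raise ValueError("unknown calendar: " + calendar)

epoch = datetime(2015, 1, 1)
# 200 days of true elapsed time crosses the 2015-07-01 leap second,
# so the two calendars decode the same value one second apart.
print(decode_time(86400 * 200, epoch, "gregorian_utc"))  # 2015-07-20 00:00:00
print(decode_time(86400 * 200, epoch, "gregorian_tai"))  # 2015-07-19 23:59:59
```

The same value decodes to two different UTC time stamps, which is exactly the ambiguity the new calendar names would resolve.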

The large majority of existing files (past and future) are based on artificial 
model time or don't need to record time precisely enough to require either of 
the new calendars (**`gregorian_tai`** or **`gregorian_utc`**). The modified 
definition of the **`gregorian`** calendar won't pose any problem for them. For 
data producers who know exactly how they obtained their times and how they 
processed them into the time values in a variable, the two new calendars let 
them tell data users how to handle (and not handle) those time values.

Once we come to an agreement on the proposal, we can work out wording for 
Section 4.4 to reflect these new/changed calendar definitions.
### Background
There are three parts to the way people deal with time. The first part is the 
counting of the passing of time, the second part is the representation of time 
for human consumption, and the third is the relationship between the 
representation of time and the orbital and rotational cycles of the earth. This 
won't be a deep discussion, but I want to define a few terms here in the hopes 
that it will help make things clearer. For gory details, please feel free to 
consult Google and visit places such as the NIST and US Naval Observatory 
websites. I'm glossing over some things here, and many of my definitions are 
not precise. My goal is to provide a common framework for thinking about the 
proposal, as opposed to writing a textbook on the topic.

The first part is the simplest. This is time as a scalar quantity that grows at 
a fixed rate. This, precisely measured, is what people refer to as 'atomic 
time' - a count of cycles of an oscillator tuned to resonate with an electron 
level transition in a sample of super-cooled atoms. The international standard 
atomic time is known as International Atomic Time (TAI). So time in this sense 
is a counter that advances by one every SI second. (For simplicity, I am going 
to speak in terms of counts of seconds throughout this proposal.) No matter how 
you may represent time, whether with or without leap days or seconds, this time 
marches on at a fixed pace. This time is metric. You can do math operations on 
pairs or other groups of these times and get consistently correct results. In 
the rest of this proposal I'm going to refer to this kind of time as 'metric 
time'.

The second part, the representation of time, is all about how we break time up 
into minutes, hours, days, months, and years. Astronomy, culture, and history 
have all affected the way we represent time. When we display a time as 
YYYY-MM-DD HH:MM:SS, we are representing a point in time with a label. In the 
rest of this proposal I'm going to refer to this labeling of a point in time as 
a time stamp.

The third part, the synchronization of time stamps with the cycles of the 
planet, is where calendars come into play, and this is where things get ugly. 
Reaching way back in time, there were three basic units for time - the solar 
year, the lunar month, and the solar day. Unfortunately, these three units of 
time are not compatible with each other or with counts of seconds. A solar day 
is not (despite our definitions) an integer number of seconds in length, a 
lunar month is not an integer number of solar days (and we pretty much 
abandoned them in Western culture), and a solar year is not an integer number 
of solar days or lunar months in length. If you attempt to count time by 
incrementing a time stamp like an odometer (incrementing a given element each 
time the element below it 'rolls over'), you find that the time stamps pretty 
quickly fall out of synchronization with the sun and the seasons.

The first attempts to address this asynchrony were leap days. The Julian 
calendar specified that every four years February would wait an extra day to 
roll over to March. The Gregorian calendar addressed a remaining asynchrony by 
specifying that century years, which would all normally be leap years, keep 
their leap day only every fourth century. That was close enough for the technology 
of those days. Clocks weren't accurate enough at counting seconds to worry 
about anything else. But the addition of leap days (as well as months with 
random lengths) means that time stamps aren't metric. You can't do 
straightforward math with them.
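The leap-day rules described above can be stated in a few lines of Python. This is the standard Gregorian rule, shown here only to illustrate that leap days, unlike leap seconds, follow a fixed, computable pattern:

```python
def is_leap_year(year):
    """Gregorian rule: every 4th year is a leap year, except century
    years, which are leap years only when divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# 1900 and 2100 are not leap years; 2000 and 2024 are.
print([y for y in (1900, 2000, 2024, 2100) if is_leap_year(y)])  # [2000, 2024]
```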

In more recent times technology and science have advanced to the point that we 
can count seconds quite accurately, and we found that keeping the time stamp 
hours, minutes, and seconds sufficiently aligned with the rising of the sun 
each day requires the addition (or subtraction) of leap seconds. On an 
irregular basis (at most twice a year in practice, typically at the end of June 
or December), the last minute of a day is allowed to run to second 60 before 
rolling over instead of 59 (or would roll over after 58 for a negative leap 
second, though to date there have been only additions). Coordinated Universal Time (UTC) is the 
standard for time stamps that include both leap days and leap seconds.

UTC time stamps represent the time in a human-readable form that is precise and 
synchronized with the cycles of the earth. But they aren't metric. It's not 
hard to deal with the leap days part because they follow a fixed pattern. But 
the leap seconds don't. If you try to calculate the interval between 2018-01-01 
00:00:00 and 1972-01-01 00:00:00 without consulting a table of leap seconds and 
when they were applied, you will have a difference of 27 seconds between the 
time you get from your calculation and the time that has actually elapsed between 
those two time stamps. This isn't enough of a discrepancy to worry about for 
readings from rain gauges or measurements of daily average temperature, but an 
error of even one second can make a big difference for data from a 
polar-orbiting satellite moving at roughly 7 km/second.
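The 1972-to-2018 example can be reproduced directly. Python's `datetime` arithmetic, like most software, ignores leap seconds, so the naive difference comes out 27 seconds short of the true (TAI) elapsed interval; the count of 27 is taken from the published IERS leap-second table, not computed here.

```python
from datetime import datetime

# Naive difference: datetime arithmetic knows about leap days but
# not leap seconds.
naive = (datetime(2018, 1, 1) - datetime(1972, 1, 1)).total_seconds()

# 27 leap seconds were inserted between these two instants (per the
# IERS table), so the true elapsed (TAI) interval is 27 s longer.
true_elapsed = naive + 27

print(int(true_elapsed - naive))  # 27
```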

The clocks in our computers can add further complexity to measuring time. The 
vast majority of computers don't handle leap seconds. We typically attempt to 
address this by using time servers to keep our computer clocks synchronized, 
but this works by slewing or stepping the metric time count in the computer 
rather than by making the time-stamp conversion aware of leap seconds.

Furthermore, most computer software doesn't have 'leap second aware' libraries. 
When you take a perfectly exact UTC time stamp (perhaps taken from a GPS unit) 
and convert it to a count of seconds since an epoch using a time calculation 
function in your software, you are highly likely to have introduced an error of 
however many leap seconds have been added between your epoch and the time 
represented by the time stamp.
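This is easy to demonstrate in Python, whose standard `datetime` module is typical of leap-second-unaware libraries:

```python
from datetime import datetime

# 2016-12-31 23:59:60 was a real UTC instant (the most recent leap
# second), but a leap-second-unaware library cannot even represent it.
try:
    datetime(2016, 12, 31, 23, 59, 60)
except ValueError as err:
    print("rejected:", err)

# Naive arithmetic across that leap second reports 1 elapsed second
# where 2 actually elapsed.
delta = datetime(2017, 1, 1) - datetime(2016, 12, 31, 23, 59, 59)
print(delta.total_seconds())  # 1.0
```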

As a result of all this, many of the times written in netCDF files are not 
metric times, and there is no good way to know how to produce accurate time 
stamps from them. They may be perfectly metric within a given file or dataset, 
they may include skips or repeats, or they may harbor non-linearities where 
there are one or more leap seconds between two time values.

We have another minor issue for times prior to 1972-01-01, when UTC in its 
current leap-second form began. There is no good way to relate times prior to 
that epoch to times since, at least not to a precision of tens of seconds or 
better. I'd be surprised if this would ever be a significant problem 
in our domain.

To summarize, we have TAI, which is precise metric time. We have UTC, which is 
a precise, non-metric sequence of time stamps that are tied to TAI, and we have 
a whole host of ways that counts of time since an epoch stored in netCDF files can be 
inaccurate to a level as high as 37 seconds (the current leap seconds offset 
between TAI and UTC).

Most uses of time in netCDF aren't concerned with this level of accuracy, but 
for those that are, it can be critical.

https://github.com/cf-convention/cf-conventions/issues/148