Rob Seaman wrote:
>Zefram <[email protected]> wrote:
>> Anyone interested in me working up a full spec?
>
>If you were to lead the writing of it one imagines many would like to read it

OK, have a play with the attached. Comments welcome. Second implementation especially welcome. The binary format is concise enough that it could be usefully applied to encode the entire leap schedule in a DNS TXT record.

-zefram
.TH LEMAITRE 5 2015-02-13 LEAPSECS
.SH NAME
lemaitre \- format of leap second schedules
.SH DESCRIPTION
.B Caution: this document is a draft.
.PP
The Lemaitre file formats are two formats (one binary, one textual)
by which to represent the definition of a time scale such as the
post-1972 form of UTC, based on irregularly-scheduled leap seconds.
The main application of these file formats is for UTC itself, but any
similarly-organised time scale can have its leap schedule disseminated
in the same form.
.PP
The file formats are named for the astronomer Georges Lemaitre
(1894-1966). Lemaitre's work included an improved coordinate system for
the warped spacetime around a black hole as described by the Schwarzschild
metric. He was also a pioneer in the use of electronic computers for
cosmological calculations.
.SS Motivation
The UTC time scale (Coordinated Universal Time) is defined in terms of TAI
(International Atomic Time). At any moment, the difference between UTC
and TAI is an integral number of seconds. The difference always remains
the same over the course of a calendar day of UTC, but occasionally
it changes between UTC days. When a change of offset occurs, it is
achieved by adding a second onto the end of a UTC day or by deleting a
second from the end of a UTC day, an event known as a leap second.
.PP
The TAI-UTC difference is chosen to keep UTC approximately synchronised to
the rotating Earth, which changes speed unpredictably. TAI, by contrast,
is a purely atomic time scale. Because the Earth's rotation cannot be
predicted far ahead, UTC cannot be defined very far in advance. Thus there
is a need to disseminate updated leap second schedules as new scheduling
decisions are made.
.SS Scope
A Lemaitre file represents a partial definition of a UTC-like time scale.
The time scale must be largely defined by piecewise constant offsets
from another time scale (in UTC's case TAI), where the offsets are of
integral seconds and the pieces consist of complete calendar days of the
defined time scale. The extent of a partial definition may encompass
any finite set of calendar days of the defined time scale.
.PP
It follows that, within a single Lemaitre file, each calendar day
is ascribed either a single integral-seconds offset or no offset.
There are multiple reasons why a day may be given no offset. The time
scale may be permanently undefined for that day (such as UTC prior to
1961), or may be defined for that day in a way that can't be expressed as
an integral-seconds offset from the underlying time scale (such as UTC
from 1961 to 1971). The time scale's definition for that day may not
yet have been decided, or the decision may not yet have been known to
the file's creator. Finally, the file's creator may simply have chosen
to exclude that day from the scope of the file, for example in order to
communicate only new scheduling decisions to a reader who already knows
the previous schedule. A Lemaitre file does not represent which reasons
apply to particular omissions.
.PP
A Lemaitre file does not represent the identity of the time scale that
it defines, nor that of the underlying time scale. These are expected
to be known to file users by out-of-band means.
.PP
Any calendar day can be ascribed any integral-seconds offset from
the underlying time scale, independent of the offsets of other days.
A leap second event is implied wherever consecutive calendar days have
different offsets. Leaps of arbitrarily many seconds, in either direction,
can thus be represented on any day. This is more than is required
to represent post-1972 UTC, which is limited to leaps of one second in
either direction, and only allows them to occur at the end of Gregorian
months.
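.PP
As an illustration of the preceding paragraph, the following Python
sketch derives the implied leap events from a list of segments.
The function name and the tuple layout (first day, last day, offset,
with days given as integer day numbers) are assumptions made for this
example only. It also assumes the TAI minus UTC sign convention, so a
positive change means seconds were added at the end of the day.
.PP
.in +4n
.nf
```python
def leap_events(segments):
    """List (day, seconds_added) for each implied leap.

    segments: chronologically ordered (first_day, last_day, offset)
    tuples.  A leap is implied only where two segments abut, because
    only then do consecutive days both have an offset.
    """
    events = []
    for (f0, l0, o0), (f1, l1, o1) in zip(segments, segments[1:]):
        if f1 == l0 + 1 and o1 != o0:
            # The leap happens at the end of the earlier day.
            events.append((l0, o1 - o0))
    return events
```
.fi
.in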
.SH TEXTUAL FILE FORMAT
The syntax of a textual Lemaitre file is defined here mainly using the
ABNF defined by RFC 5234, including the core rules of that document.
As is customary for RFCs, the syntax here shows newlines as "CRLF",
but this does not mean that separate CR and LF characters are actually
expected. In a Unix context newlines are represented by a single LF
character. Likewise, the ABNF uses ASCII codepoint values to describe
some characters, but does not imply that ASCII must be used: the syntax
is concerned with the characters, not the codepoints. In all respects,
local conventions for the representation of text are expected to be
observed, with conversions where necessary. The characters used in the
file format are all ISO 646 invariant characters.
.PP
.in +4n
.nf
textfile = magic *segment tail
.fi
.in
.PP
A textual Lemaitre file consists of a `magic number', zero or
more segments of time scale definition, and then a tail explicitly
finishing the file. The explicit tail means that the file format is
self-delimiting, so a truncated file will always be detected.
.PP
.in +4n
.nf
magic = %x71.5f.4d.3d.2b.64.26.2e.2f.3d CRLF
.fi
.in
.PP
The magic number is a line containing the ten characters
"\fBq_M=+d&./=\fP", and serves to identify or confirm the file format.
.PP
.in +4n
.nf
segment = date-range SP offset CRLF
.fi
.in
.PP
A segment of time scale definition ascribes a specific offset (difference
between the defined and underlying time scales) to a range of consecutive
calendar dates. It is represented as a single line of text, containing
the representations of the date range and the offset separated by a space.
.PP
.in +4n
.nf
offset = "+0" / ("+"/"-") %x31-39 *DIGIT
.fi
.in
.PP
An offset is represented in seconds, in decimal, always with a sign
(\fB+\fP for zero), and with as few digits as possible. A positive offset
value means the defined time scale is behind the underlying time scale,
and a negative offset value means the defined time scale is ahead of
the underlying time scale. This way round may seem perverse, but is
the way that the difference between UTC and TAI is usually described
(TAI minus UTC).
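.PP
A minimal Python sketch of reading and writing the offset field
follows. The names OFFSET_RE, format_offset and parse_offset are
hypothetical and exist only for this example.
.PP
.in +4n
.nf
```python
import re

# Mirrors the ABNF rule: offset = "+0" / ("+"/"-") %x31-39 *DIGIT
OFFSET_RE = re.compile("[+]0|[+-][1-9][0-9]*")


def format_offset(seconds):
    """Render an integer offset with a mandatory sign."""
    return "%+d" % seconds


def parse_offset(text):
    """Parse an offset field, rejecting forms such as 0, -0 or +010."""
    if OFFSET_RE.fullmatch(text) is None:
        raise ValueError("malformed offset: %r" % (text,))
    return int(text)
```
.fi
.in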
.PP
.in +4n
.nf
date-range = date "/" date
date = year "-" month-of-year "-" day-of-month
year = ["-"] 4DIGIT / ("+"/"-") %x31-39 4*DIGIT
month-of-year = "0" %x31-39 / "1" %x30-32
day-of-month = "0" %x31-39 / %x31-32 DIGIT / "3" %x30-31
.fi
.in
.PP
Date ranges are represented by means of the first and last dates in
the range (inclusively). Each date is stated numerically, using the
Gregorian calendar (proleptic where necessary, with astronomical
year numbering). If it is desired to refer to the year 0 CE (1 BCE),
the syntax "\fB0000\fP" must be used, not the "\fB-0000\fP" that is also
permitted by the above ABNF. The \fIday-of-month\fP of a \fIdate\fP must
be in the appropriate range for the \fImonth-of-year\fP and \fIyear\fP.
Overall, the textual format here is a subset of that specified by ISO
8601: extended format, expanded as necessary.
.PP
A date range is not permitted to be empty or to extend backwards in time.
Thus the last date of a date range must not be earlier than the first
date.
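.PP
The date rules above can be sketched in Python as follows. This is
an illustration under stated assumptions, not a reference parser: the
names DATE_RE, days_in_month, parse_date and parse_date_range are
invented for the example, and dates are returned as plain
(year, month, day) tuples so that years outside 1 to 9999 can be handled.
.PP
.in +4n
.nf
```python
import re

# year "-" month-of-year "-" day-of-month, following the ABNF above
DATE_RE = re.compile(
    "(-?[0-9]{4}|[+-][1-9][0-9]{4,})"
    "-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
)


def days_in_month(year, month):
    """Month length in the proleptic Gregorian calendar."""
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def parse_date(text):
    """Return (year, month, day) or raise ValueError."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        raise ValueError("malformed date: %r" % (text,))
    if m.group(1) == "-0000":
        raise ValueError("year zero must be written 0000")
    year, month, day = (int(g) for g in m.groups())
    if day > days_in_month(year, month):
        raise ValueError("day out of range: %r" % (text,))
    return (year, month, day)


def parse_date_range(text):
    """Parse first/last; the last date must not precede the first."""
    first_text, sep, last_text = text.partition("/")
    if sep != "/":
        raise ValueError("missing / in date range: %r" % (text,))
    first, last = parse_date(first_text), parse_date(last_text)
    if last < first:  # tuples compare chronologically
        raise ValueError("backwards date range: %r" % (text,))
    return (first, last)
```
.fi
.in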
.PP
The segments within a Lemaitre file must appear in chronological order,
and must not overlap. Thus the first date in the range of each segment
after the first must be strictly later than the last date in the range
of the preceding segment. It is also not permitted for segments to abut
if their offsets are equal: in such a case they must be merged into a
single segment.
.PP
Dates that are not included in the date range of any segment in the file
are not ascribed any particular offset by the file.
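.PP
As a sketch of how a writer might apply the ordering rules, the
following Python function produces the lines of a textual file with
a no-check tail. The function name and the use of datetime.date
(which limits the example to years 1 to 9999) are assumptions of
this example. It returns a list of lines and leaves the newline
convention to the caller.
.PP
.in +4n
.nf
```python
from datetime import date, timedelta

MAGIC = "q_M=+d&./="


def textfile_lines(segments):
    """segments: iterable of (first_date, last_date, offset_seconds)."""
    lines = [MAGIC]
    prev = None
    for first, last, offset in segments:
        if last < first:
            raise ValueError("empty or backwards date range")
        if prev is not None:
            prev_last, prev_offset = prev
            if first <= prev_last:
                raise ValueError("segments out of order or overlapping")
            abuts = first == prev_last + timedelta(days=1)
            if abuts and offset == prev_offset:
                raise ValueError("abutting equal segments must be merged")
        lines.append(
            "%s/%s %+d" % (first.isoformat(), last.isoformat(), offset)
        )
        prev = (last, offset)
    lines.append(".")  # no-check tail
    return lines
```
.fi
.in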
.PP
.in +4n
.nf
tail = check / no-check
check = ":" 26base64-6bit base64-4bit CRLF
base64-6bit = ALPHA / DIGIT / "+" / "/"
base64-4bit = %x41/%x45/%x49/%x4d/%x51/%x55/%x59/%x63 /
              %x67/%x6b/%x6f/%x73/%x77/%x30/%x34/%x38
no-check = "." CRLF
.fi
.in
.PP
The tail of the file is a line containing either just a dot, or a check
field. The check field is represented as a colon followed by 27 base64
digits, giving a 160-bit check value. The base64 is as defined by RFC
4648, except that padding is not used: that standard would require a
single trailing equals sign to be added. If the check field is present,
the 160-bit value that it represents must be the same as the 160-bit check
value that would be included in a binary Lemaitre file representing the
same time scale information. See below for its definition.
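.PP
The check line can be produced and read with the Python standard
library, as in the following sketch. The function names are
hypothetical. The parser re-encodes the decoded value in order to
reject a final digit that is outside the \fIbase64-4bit\fP set.
.PP
.in +4n
.nf
```python
import base64


def format_check(digest):
    """Render a 20-octet check value as a colon plus 27 base64 digits."""
    if len(digest) != 20:
        raise ValueError("check value must be 160 bits")
    return ":" + base64.b64encode(digest).decode("ascii").rstrip("=")


def parse_check(line):
    """Return the 20-octet check value from a check line (no newline)."""
    if len(line) != 28 or line[0] != ":":
        raise ValueError("malformed check line")
    # Restore the padding that the file format omits.
    digest = base64.b64decode(line[1:] + "=", validate=True)
    if format_check(digest) != line:
        raise ValueError("non-canonical final base64 digit")
    return digest
```
.fi
.in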
.PP
The \fIno-check\fP tail is appropriate for a file that is to be edited by
a human. Once editing is finished, at the earliest opportunity software
should be used to compute the check value and attach it to all downstream
incarnations of the file. Any textual Lemaitre file that is generated
by software or that is exchanged between systems should include the
check field, so that accidental corruption of the file can be detected.
Anywhere that software interprets a textual Lemaitre file that includes
the check field, it should compute the check value for the data it sees
and reject the file if there is a mismatch.
.SH BINARY FILE FORMAT
A binary Lemaitre file consists of a sequence of octets. The file
format is self-delimiting, so a truncated file will always be detected.
The file as a whole consists of three concatenated parts: file magic,
body, and check field.
.PP
The file magic serves to identify or confirm the file format. It consists
of the eight octets
.PP
.in +4n
.nf
0xe9 0x9b 0xfe 0xc0 0x32 0x36 0xe9 0xe5
.fi
.in
.PP
The body is made up of the concatenated representations of a
self-delimiting sequence of unsigned integers \fIU[i]\fP, and it is that
sequence that represents the informational content.
.PP
The unsigned integers are each represented as a self-delimiting sequence
of one or more octets, by means of a simple universal code. An integer
\fIU\fP whose value fits into seven bits (i.e., is less than 0x80) is
represented by an octet of that value. Any greater integer \fIU\fP is
represented by a bit sequence consisting of a 1 bit, the representation
of \fI(U >> 7) - 1\fP, then the lowest seven bits of \fIU\fP, most
significant first. For the purposes of this definition, an octet is
equivalent to the sequence of its eight constituent bits, taken from most
significant to least significant, and a sequence of octets is equivalent
to the bit sequence consisting of the concatenation of these sequences
for its constituent octets.
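.PP
Unrolling the recursion suggests that an integer occupying \fIn\fP
octets begins with \fIn\fP-1 one bits and a zero bit, followed by
\fIn\fP groups of seven payload bits. The Python sketch below is
written on that reading. The names encode_uint and decode_uint are
invented for this example.
.PP
.in +4n
.nf
```python
def encode_uint(u):
    """Encode a non-negative integer with the universal code."""
    if u < 0:
        raise ValueError("unsigned integers only")
    groups = []
    while u >= 0x80:
        groups.append(u & 0x7F)   # lowest seven bits, emitted last
        u = (u >> 7) - 1          # remainder, encoded recursively
    groups.append(u)              # innermost value, below 0x80
    groups.reverse()
    extra = len(groups) - 1
    bits = ((1 << extra) - 1) << 1   # `extra` one bits, then a zero
    for g in groups:
        bits = (bits << 7) | g
    return bits.to_bytes(len(groups), "big")


def decode_uint(data, pos=0):
    """Decode one integer at data[pos]; return (value, next_pos)."""
    extra = 0
    while True:   # count leading one bits, possibly across octets
        index = pos + extra // 8
        if index >= len(data):
            raise ValueError("truncated integer")
        if (data[index] & (0x80 >> (extra % 8))) == 0:
            break
        extra += 1
    length = extra + 1
    if pos + length > len(data):
        raise ValueError("truncated integer")
    bits = int.from_bytes(data[pos:pos + length], "big")
    value = -1
    for shift in range(7 * extra, -1, -7):
        # Undo the "minus one" applied at each level of recursion.
        value = ((value + 1) << 7) | ((bits >> shift) & 0x7F)
    return value, pos + length
```
.fi
.in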
.PP
In some places it is required to represent a signed integer \fIS\fP in
the form of an unsigned integer. This is done via the function \fIz(S)\fP
defined thus:
.PP
.in +4n
.nf
z(S) = S * 2 if S >= 0
z(S) = S * -2 - 1 if S < 0
.fi
.in
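.PP
In Python the mapping and its inverse can be sketched as below.
The inverse, here called unz, is not defined by this document and
is named only for the purposes of the example.
.PP
.in +4n
.nf
```python
def z(s):
    """Map a signed integer to an unsigned one: 0, -1, 1, -2, 2, ..."""
    return s * 2 if s >= 0 else s * -2 - 1


def unz(u):
    """Inverse of z: even values are non-negative, odd are negative."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)
```
.fi
.in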
.PP
The informational content of the file consists of a group of segments
of time scale definition, each ascribing a specific offset (difference
between the defined and underlying time scales) to a range of consecutive
calendar dates. An offset is represented as a signed integer number
of seconds. A positive offset value means the defined time scale is
behind the underlying time scale, and a negative offset value means
the defined time scale is ahead of the underlying time scale. Abutting
segments are not permitted to have the same offset: they must be merged.
These concepts are identical to those for the textual file format.
.PP
If no time scale definition segments are to be represented, then
\fIU[0] = 0\fP and there are no other \fIU[i]\fP.
.PP
Otherwise, \fIU[0]\fP represents the first date of the first segment, by
means of its Modified Julian Day Number (MJDN): \fIU[0] = 1 + z(MJDN)\fP.
\fIU[1]\fP then represents the offset \fIO\fP applying to the first
segment: \fIU[1] = z(O)\fP.
These rules are unique to the first segment: thereafter delta
representations are used to avoid having to fully represent large numbers.
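.PP
A small Python sketch of the first two integers follows. It relies
on datetime.date and so covers only years 1 to 9999. The helper
names mjdn and first_segment_header are assumptions of this example.
MJDN 0 is taken to be 1858-11-17.
.PP
.in +4n
.nf
```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)   # the day with MJDN 0


def z(s):
    """Signed-to-unsigned mapping from the definition above."""
    return s * 2 if s >= 0 else s * -2 - 1


def mjdn(day):
    """Modified Julian Day Number of a Gregorian calendar date."""
    return (day - MJD_EPOCH).days


def first_segment_header(first_day, offset):
    """Return (U[0], U[1]) for the first segment of a file."""
    return (1 + z(mjdn(first_day)), z(offset))
```
.fi
.in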
.PP
\fIU[2]\fP represents the last date of the first segment, in the manner
used for regular segments (see below). Thereafter the file contains
the representations of the other segments in turn, then a final sentinel
signals the end of the body.
.PP
The representation of a segment after the first depends on whether it
abuts the previous segment. If it does not, the representation of the
end of the previous segment is followed by a \fIU[i] = 1 + z(0) = 1\fP,
then a representation of the segment's first date, then a representation
of the segment's offset. The segment's first date is represented by
means of a delta from the end of the previous segment: if the previous
segment's last (included) date was MJDN \fIB\fP and this segment's first
date is MJDN \fIC\fP, then \fIU[i] = C - B - 2\fP. Observe that this
imposes a minimum gap of one clear day between the segments: if there
is no gap between the segments then this representation is not used.
It also requires that segments be represented in chronological order and
not overlap. The segment's offset is then represented as a delta from
the offset of the previous segment: if the previous segment's offset was
\fIO\fP and this segment's offset is \fIP\fP, then \fIU[i] = z(P - O)\fP.
.PP
If a segment does abut the previous segment, its first date does not
need to be represented explicitly. Instead, only its offset need be
represented, which is done as a delta from the offset of the previous
segment. If the previous segment's offset was \fIO\fP and this segment's
offset is \fIP\fP, then \fIU[i] = 1 + z(P - O)\fP.
Recall that in this case it is not permitted for \fIP\fP and \fIO\fP
to be equal; the \fIU[i]\fP value \fI1 + z(0)\fP that that situation
would apparently generate is actually used (above) as a sentinel value
to signal a gap between segments.
.PP
In either case, and even in the case of the first segment, the
representation of a segment's offset is immediately followed by a
representation of its last date. This is done in the form of a delta
from the segment's first date. If the first date of the segment is
MJDN \fIA\fP and the last date is MJDN \fIB\fP, then \fIU[i] = B - A\fP.
Observe that this makes it impossible to represent a date range that is
empty or extends backwards in time.
.PP
After all the segments, a final \fIU[i] = 0\fP signals the end of
the body.
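.PP
Putting the rules for the body together, the following Python sketch
produces the sequence of unsigned integers for a list of segments.
It stops short of the octet encoding. The function name and the
tuple layout (first MJDN, last MJDN, offset) are assumptions made
for the example.
.PP
.in +4n
.nf
```python
def z(s):
    """Signed-to-unsigned mapping from the definition above."""
    return s * 2 if s >= 0 else s * -2 - 1


def body_integers(segments):
    """Return the U[i] list for (first_mjdn, last_mjdn, offset) tuples."""
    if not segments:
        return [0]
    out = []
    prev_last = prev_offset = None
    for first, last, offset in segments:
        if last < first:
            raise ValueError("empty or backwards date range")
        if prev_last is None:
            # First segment: absolute date and absolute offset.
            out += [1 + z(first), z(offset)]
        elif first == prev_last + 1:
            # Abutting segment: only the offset delta is needed.
            if offset == prev_offset:
                raise ValueError("abutting equal segments must be merged")
            out.append(1 + z(offset - prev_offset))
        elif first > prev_last + 1:
            # Gap: sentinel 1, then date delta, then offset delta.
            out += [1, first - prev_last - 2, z(offset - prev_offset)]
        else:
            raise ValueError("segments out of order or overlapping")
        out.append(last - first)   # last date as a delta from first
        prev_last, prev_offset = last, offset
    out.append(0)                  # end-of-body sentinel
    return out
```
.fi
.in
.PP
For instance, on this reading two abutting segments covering MJDN
41317 to 41498 at offset 10 and MJDN 41499 to 41682 at offset 11
would give the sequence 82635, 20, 181, 3, 183, 0.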
.PP
After the body, the file ends with a check field of 20 octets (160 bits),
so that accidental corruption of the file can be detected. Anywhere that
software interprets a binary Lemaitre file, it should compute the check
value for the data it sees and reject the file if there is a mismatch.
.PP
The 160-bit check value is the SHA-1 hash of a message consisting of the
check magic followed by the file body. The check magic consists of the
eight octets
.PP
.in +4n
.nf
0xd4 0x22 0x05 0xfe 0x06 0xa6 0x59 0xb2
.fi
.in
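.PP
A Python sketch of computing and verifying the check field follows.
The function names are hypothetical, and the verifier assumes for
brevity that its input holds exactly one file, so it locates the
check field by length and does not decode the body.
.PP
.in +4n
.nf
```python
import hashlib

FILE_MAGIC = bytes([0xE9, 0x9B, 0xFE, 0xC0, 0x32, 0x36, 0xE9, 0xE5])
CHECK_MAGIC = bytes([0xD4, 0x22, 0x05, 0xFE, 0x06, 0xA6, 0x59, 0xB2])


def check_value(body):
    """SHA-1 of the check magic followed by the body octets."""
    return hashlib.sha1(CHECK_MAGIC + body).digest()


def assemble_binary_file(body):
    """Concatenate file magic, body and check field."""
    return FILE_MAGIC + body + check_value(body)


def verify_binary_file(data):
    """Return the body octets, or raise ValueError on a mismatch."""
    if len(data) < 8 + 1 + 20 or data[:8] != FILE_MAGIC:
        raise ValueError("not a binary Lemaitre file")
    body, check = data[8:-20], data[-20:]
    if check_value(body) != check:
        raise ValueError("check value mismatch")
    return body
```
.fi
.in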
.SH FILENAMES
The recommended file extension for textual Lemaitre files is
"\fB.lmte\fP". The recommended file extension for binary Lemaitre files
is "\fB.lmtr\fP".
.SH MEDIA TYPE NAMES
Media type names identifying the Lemaitre file formats (for use in
MIME, HTTP, and related contexts) have not yet been registered.
For the time being, it is recommended to use unregistered media
type names. The recommended media type name for textual Lemaitre
files is "\fBapplication/x.lemaitre-txt\fP", and it may take a
\fBcharset\fP parameter, defaulting to \fBcharset\fP="\fBUS-ASCII\fP".
The recommended media type name for binary Lemaitre files is
"\fBapplication/x.lemaitre-bin\fP", which takes no parameters.
.SH SEE ALSO
.BR convert_lemaitre (1),
RFC 4648,
RFC 5234
.SH AUTHOR
Andrew Main (Zefram) <[email protected]>
convert_lemaitre
Description: application/perl-source
197201_to_201512.lmte
Description: application/x.lemaitre-txt
197201_to_201512.lmtr
Description: application/x.lemaitre-bin
finch_putc.lmte
Description: application/x.lemaitre-txt
finch_putc.lmtr
Description: application/x.lemaitre-bin
quadrennial.lmte
Description: application/x.lemaitre-txt
quadrennial.lmtr
Description: application/x.lemaitre-bin
