Subject: Bug report: ISO 8601 timezone handling has minor buglet in src/date.c and 
lib/strftime.c and other 8601 issues

Dear sh-utils/date maintainers,
Cc: David MacKenzie

Please excuse me if I have sent this bug report to the wrong e-mail addresses.
They were mentioned in the unpacked rpm source (SuSE Linux 6.4) of the sh-utils
package (v 2.0), lib/strftime.c (no $Id$ or version found) and date.c (ditto).

I have tried out the ISO 8601 date/time support in the `date' command (-I and
--iso-8601) and found a small bug (a deviation from the standard) in the time
zone handling. By the way: great thing you have added this support!
Also I would like to propose some additional features (in conformance with
the standard) and report some possible textual errors here and there.
For clarity, I have numbered the 'issues', in case you wish to reply, have
questions, or find that I'm wrong etc. :)
I use the 8601 format extensively for log files, but until now I have needed
a complicated alias to fix things
(  date '+%Y-%m-%dT%H:%M:%S%z' | sed -e 's/00$/:00/' -e 's/30$/:30/'  )
to create log file names etc. (with nice lexicographical == date ordering).
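
The alias above only handles offsets that end in 00 or 30. A minimal sketch
of a more general filter, assuming a sed that follows POSIX basic regular
expressions; the name fix_tz_colon is made up for this example and is not
part of sh-utils:

```sh
#!/bin/sh
# Hypothetical helper (not part of sh-utils): insert a colon into a trailing
# +hhmm or -hhmm offset, whatever the minutes are; a trailing Z is left alone.
fix_tz_colon() {
    printf '%s\n' "$1" |
        sed 's/\([+-][0-9][0-9]\)\([0-9][0-9]\)$/\1:\2/'
}

fix_tz_colon '2000-04-16T11:26:40+0200'    # prints 2000-04-16T11:26:40+02:00
fix_tz_colon '2000-04-16T11:26:40+0545'    # prints 2000-04-16T11:26:40+05:45
fix_tz_colon '2000-04-16T09:26:40Z'        # printed unchanged
```

With it, a log file name could be built as
fix_tz_colon "$(date '+%Y-%m-%dT%H:%M:%S%z')".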

Please note I have attached a PDF file '8601v04.pdf' which will become the
ISO 8601 2nd edition (if it has not already; I last checked a year ago).
The '$' is the Unix prompt in all examples.

1. Wrong 'time zone' representation: missing colon, +/-hhmm instead of +/-hh:mm
   (8601 parlance: 'difference between local time and Coordinated Universal Time')

It is evident that you have followed the ISO 8601 standard (handy url btw:
    http://www.cl.cam.ac.uk/~mgk25/iso-time.html )
very carefully. However, the time zone (if not `Zulu') is incorrect: the colon
is missing. Please let me give an example (I'm in the Netherlands which is
UTC+1 and now with summertime UTC+2):

    $ date --iso-8601=seconds               # 'Extended format' implicitly
    2000-04-16T11:26:40+0200                <-- wrong, '0200' should be '02:00'

Evidently, you use the so called Extended format (ref. 5.3.3 in the
standard), because you use the hyphen ('-') in CCYY-MM-DD and colon (':')
in hh:mm:ss, whereas the so called Basic format is CCYYMMDD and hhmmss,
respectively. However, if you use the extended format, you should also
use the colon in the time zone part (if not Z). Btw, the standard calls the
'time zone part' the difference between local time and Coordinated Universal
Time (UTC). Strictly speaking, you may omit the mm part in some cases,
because (see the attached document '8601v04.pdf', ISO 8601 2nd ed.)
"... The minutes component of the difference may only be omitted if the
 time difference is exactly an integral number of hours".
(btw, offsets that are not a whole number of hours do exist, e.g. +05:30 in
 India or +10:30 on an island off the Australian coast). But imho, for
uniformity, I'd always include the minutes part, both in basic and extended
format.
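
To illustrate the quoted rule, here is a small sketch (short_offset is a
hypothetical name, not an existing tool) that drops the minutes of the offset
only when they are exactly 00 and leaves any other offset alone:

```sh
#!/bin/sh
# Hypothetical helper: shorten a trailing +hh:00 or -hh:00 offset to +hh or
# -hh, as the standard permits; offsets with non-zero minutes are unchanged.
short_offset() {
    printf '%s\n' "$1" |
        sed 's/\([+-][0-9][0-9]\):00$/\1/'
}

short_offset '2000-04-16T11:26:40+02:00'    # prints 2000-04-16T11:26:40+02
short_offset '2000-04-16T11:26:40+05:30'    # printed unchanged
```

The pattern insists on a sign before the hours, so a local time that merely
ends in :00 (and carries no offset) is not touched.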

I tried to fix the sh-utils source so I could send you a cdiff (context diff)
fix, but I got quite lost, because the problem seems to be due to the
implementation of '%z' in GNU's strftime() (see lib/strftime.c), which
_always_ returns hhmm.
Imho, this is an error and quite strange for a Posix (ISO!) function, because
the ISO standards should be consistent with each other, but I may be wrong
about this. I thought setting the locale (LC_TIME=POSIX) might fix this, but
date '+%c %z' still gave hhmm. Anyway, it would seem an extra %option should
be added to strftime()? Or use a Conversion Specifier? (But I didn't find one
for timezone in the Single Unix spec...). Anyway, the letter F (fix!) seems
to be free ;-)  (date '+%f' -> 02:00)

2. Proposal: add support for the 'Basic Format' 

I could imagine support for the so called 'Basic format' would be handy
(you now only support the 'Extended format').
Please let me give some examples, which will hopefully make this clear.
Note that the examples also fix bug 1, and I use the TIMESPEC (man date)
field to indicate Basic or Extended (the default being Extended in order
not to break old scripts using date --iso-8601):

    $ date --iso-8601=seconds               # 'Extended format' implicitly
    2000-04-16T11:26:40+02:00               <-- fixed date program: '02:00'
                                                instead of erroneous '0200'

    $ date --iso-8601=minutes               # 'Extended format' implicitly
    2000-04-16T11:26+02:00

    $ date --iso-8601=extended,seconds      # 'Extended format' explicitly
    2000-04-16T11:26:40+02:00


    $ date --iso-8601=basic,seconds         # 'Basic format' explicitly
    20000416T112640+0200

    $ date --iso-8601=basic,minutes         # 'Basic format' explicitly
    20000416T1126+0200

    $ date --iso-8601=basic,hours           # 'Basic format' explicitly
    20000416T11+0200                        <-- time zone still fully spec'fd.
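
Until something like this exists in date itself, the Basic format can be
derived from the (fixed) Extended output. A minimal sketch, assuming a POSIX
sed; to_basic is a made-up name and the 'basic,...' TIMESPEC values above
remain a proposal only:

```sh
#!/bin/sh
# Hypothetical helper: turn an extended-format stamp into basic format.
# Only the two hyphens of the date are removed, so the sign of a negative
# offset such as -05:00 survives; every colon is dropped.
to_basic() {
    printf '%s\n' "$1" |
        sed -e 's/^\([0-9]\{4\}\)-\([0-9][0-9]\)-\([0-9][0-9]\)/\1\2\3/' \
            -e 's/://g'
}

to_basic '2000-04-16T11:26:40+02:00'    # prints 20000416T112640+0200
to_basic '2000-04-16T11:26-05:00'       # prints 20000416T1126-0500
to_basic '2000-04-16T11+02:00'          # prints 20000416T11+0200
```

A plain tr -d ':-' would not do here, because it would also delete the minus
sign of a negative offset.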


3. Bug? Unclear option '-I'.

I'm not sure I fully understand '-I' but I may be reading wrong; man date:

  "  -I, --iso-8601[=TIMESPEC] output an ISO-8601 compliant date/time string.

        TIMESPEC=`date' (or missing) for date only,
            `hours', `minutes', or `seconds' for date and time
             to the indicated precision. "

However, if I do:

    $ date -I
    2000-04-16

This does not seem to be a "date/time string" to me, but only a date?!
It also does not seem intuitive: you would expect the time as well, because
if you just do a 'date' you get date and time as well, and imho '-I' should
just be like 'date' but then in 8601 form.
The man page seems to suggest '-I' is equivalent to '--iso-8601' and hence
should support the '=TIMESPEC'; however, it does not:

    $ date -I=seconds
    date: invalid argument `=seconds' for `--iso-8601'
    Valid arguments are:
      - `date'
      - `hours'
      - `minutes'
      - `seconds'
    Try `date --help' for more information.

Also, the error message seems wrong; it should say:
    date: invalid argument `=seconds' for `-I'

The info page on date 'info date' says that you should do 'date -Iseconds'
and indeed that works, but is this form compliant with the Posix standard
for options? It looks like a multi-letter option!
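
For comparison, this is how the POSIX getopts builtin of the shell treats
such arguments. It is a sketch only: show_timespec is a made-up name, the
option-argument is mandatory here (in date it is optional), and I assume
rather than know that date's own parser behaves alike for the attached form:

```sh
#!/bin/sh
# Sketch only: mimic a short option -I that takes an argument, using the
# POSIX getopts builtin. Each recognised -I prints its option-argument.
show_timespec() {
    OPTIND=1                      # restart parsing on every call
    while getopts 'I:' opt "$@"; do
        case $opt in
            I) printf '%s\n' "$OPTARG" ;;
            *) return 1 ;;
        esac
    done
}

show_timespec -Iseconds     # prints seconds  (argument attached to -I)
show_timespec -I seconds    # prints seconds  (argument as a separate word)
show_timespec -I=seconds    # prints =seconds (the '=' is part of the argument)
```

So with getopts, -Iseconds is option -I plus the argument 'seconds', not a
run of single-letter options, and in -I=seconds the '=' ends up inside the
argument, which would explain the complaint about `=seconds'.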

4. Addition to manual page of strftime()

   $ man strftime
      %D     Equivalent  to  %m/%d/%y.  (Yecch  -  for Americans
             only.  Americans should note that  in  other  coun-
             tries %d/%m/%y is rather common. This means that in
             international context this format is ambiguous  and
             should not be used.) (SU)

Indeed, most European countries use %d-%m-%y (day-month instead of
month-day, and a hyphen instead of a slash). To Europeans, American dates
are always confusing, and extra so if the day is <= 12 ;-)
That's why I like ISO 8601!
Also note that 8601 is quite specific about separators (4.5):

    -  for year, month, day etc.
    :  for hour, minute, second etc.
    /  to separate two components in the representation of periods of time

It is unfortunate that programs like CVS (the $Date$ field) and Samba (log
files), which sort of try to be 8601 compliant, use the wrong separator and
give dates like 2000/04/16 instead of 2000-04-16.
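
Such dates are easy to repair after the fact. A minimal sketch, assuming a
POSIX sed and a date at the start of the line; slash_to_hyphen is a
hypothetical name and the sample input is only an illustration:

```sh
#!/bin/sh
# Hypothetical helper: rewrite a leading CCYY/MM/DD date with the hyphen
# separator that ISO 8601 prescribes; the rest of the line is left untouched.
slash_to_hyphen() {
    printf '%s\n' "$1" |
        sed 's|^\([0-9]\{4\}\)/\([0-9][0-9]\)/\([0-9][0-9]\)|\1-\2-\3|'
}

slash_to_hyphen '2000/04/16 11:26:40'    # prints 2000-04-16 11:26:40
slash_to_hyphen '2000-04-16'             # printed unchanged
```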

In the Netherlands, now that we're in the strange year '00', it has become
more common to write %d-%m-%Y, e.g. 16-4-2000 instead of 16-4-00.
I would suggest the following text for the strftime() page:

      %D     Equivalent  to  %m/%d/%y.  (Mainly for Americans.
             Americans should note that  in  other  countries
             %d-%m-%y  or  %d-%m-%Y  is  rather  common.  This
             means that in international context this format is
             ambiguous and should not be used.) (SU)

Btw: what does "SU" mean? It is not explained on the man page.

5. References to Greenwich Mean Time (GMT) should be phased out.

A grep on the unpacked sh-utils source gives a lot of references to GMT.
It is better to not mention GMT and speak of UTC (Coordinated Universal Time),
a typical acronym ;) or at most say something like (ref ISO 8601 standard,
note 2): "UTC is often (incorrectly) referred to as Greenwich Mean Time".

Markus Kuhn summarizes it nicely as:

"  [The Z stands for the "zero meridian", which goes through Greenwich in
   London, and it is also commonly used in radio communication where it
   is pronounced "Zulu" (the word for Z in the international radio
   alphabet). Universal Time (sometimes also called "Zulu Time") was
   called Greenwich Mean Time (GMT) before 1972, however this term should
   no longer be used. Since the introduction of an international atomic
   time scale, almost all existing civil time zones are now related to
   UTC, which is slightly different from the old and now unused GMT.]  "


Well, I hope these (nitpicking) remarks are worthwhile.
Thanx for this fine tool set which is (a/o) why free OSs like Linux could
become such a complete and good OS!

Bye-bye,

Eric Maryniak
-- 
Eric Maryniak <[EMAIL PROTECTED]>
Home page: http://pobox.com/~e.maryniak/
University of Amsterdam, Department of Psychology.
Tel/Fax: +31 20 5256853/6391656. Internet: http://www.neuromod.org/

C++: you deserve better than a C+.
