[JB]
Tomoyuki,
If sorting by the Date: field is causing big problems for fukuzawa, we
will switch back to sorting by received time and find another way to
handle archive imports. (Let me coordinate this with the folks who are
currently importing archives; this may take a day or two.)
[AL]
Thanks. We should have no problem with a switch a day or two from now, as
our primary use is likely to be retrospective access by thread rather than
following current messages by date (we are not using the archive as a
"shared Mail User Agent", as appears to be the case with Tomoyuki for
fukuzawa). My comments below are motivated by concern for a long-term
solution rather than by a conflict of interest (my message
http://www.mail-archive.com/[email protected]/msg00043.html was what
sparked the change which is apparently causing problems for Tomoyuki).
In the long term, messages need to be "correctly" sequenced both within
threads and by date. In my view, in both cases the "correct" sequence is
the date on which the message was SENT, not the date it was received at
the archive. It is just as important that this "correct" sequence be
maintained within threads as in the date index, so that replies to
previous messages are always shown after the original message.
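The requirement can be stated as a simple invariant: under the chosen ordering, no reply sorts before the message it answers. A minimal sketch of a check for that invariant, assuming each message has already been reduced to a hypothetical tuple of its Message-ID, its In-Reply-To parent and its sent time converted to UTC (this is an illustration, not MHonArc's data model):

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical reduced form of a message: (message_id, in_reply_to, sent_utc)
Msg = Tuple[str, Optional[str], datetime]

def replies_before_parent(msgs: List[Msg]) -> List[Tuple[str, str]]:
    """Return (reply_id, parent_id) pairs where a reply's sent time is
    earlier than its parent's, so a sort by date would show it first."""
    sent: Dict[str, datetime] = {mid: ts for mid, _, ts in msgs}
    return [
        (mid, parent)
        for mid, parent, ts in msgs
        if parent is not None and parent in sent and ts < sent[parent]
    ]
```

An empty result means the date index and the thread structure agree; any pair it reports is a reply that would be displayed ahead of its original.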
The current problem is that messages are being "wrongly" sequenced:
replies are being shown as coming before the originals despite obviously
having been SENT later. This was triggered by switching the system to
(supposedly) use the "date" (sent) instead of "received", which may cause
confusion as to what the problem is, and therefore what the solution may
be. Since the problem is triggered by using the "date" (sent), it may seem
that the "correct" sequence is not really the (sent) "date" after all, but
some other sequence such as the (previously used) "received". In dealing
with the problem, let us be clear that what we want is for replies to be
shown as coming after originals, and therefore the "correct" sequence
really is the (actual) date sent. The question is how to make the system
stop doing something else.
A separate, possibly useful function of the archive could be to record
the dates messages are received, especially in conjunction with digital
signatures by the archive as a formal "notary public" proof that a
message was publicly available as of a certain date, independent of
whatever claim is made by the sender or others. This, however, is a
secondary function, which may not need a separate index and could even
be recorded only in comments within the HTML that are not normally
displayed. (The sequence in which messages are received is in any case
clearly visible from the message numbers included in the message URLs,
which can easily be examined by anyone especially interested in that
sequence.)
[JB]
Let me ask this: is the cause of the problem
[1] - time zone information is being ignored
[AL]
If that is the cause, then the software that is ignoring the time zone
information (MHonArc) needs to be fixed. A check should be made on the
mailing list for that software to see whether anyone else using the
recently introduced DATEFIELDS parameter has encountered, and perhaps
fixed, the problem.
As far as I can see from a quick glance at the code in mhamain.pl, the
same code is used for parsing the dates whether it is based on the
"date" (SENT) or on "received". However there is a special case
concerning field separators when "received" is used and also a slightly
different treatment of messages which have no date field at all, so it
might be a good idea to take a close look.
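The difference between the two sources of date can be sketched roughly as follows. This is not a transcription of mhamain.pl; it is a hypothetical illustration assuming header names have been lowercased into a dict and that, as RFC 822 specifies, the timestamp in a Received: header follows its final semicolon (the "field separator" special case). It tries each configured field in turn, loosely analogous to a DATEFIELDS-style list, and returns None when no field yields a parsable date, which is where a fallback such as the archive's own clock would apply. Parsing uses the standard library's email.utils.parsedate_tz.

```python
from email.utils import parsedate_tz
from typing import Dict, List, Optional, Tuple

def date_text(field: str, value: str) -> str:
    """Extract the date-time text from a header value."""
    # A Received: header carries its timestamp after the final ';'.
    if field == "received":
        return value.rsplit(";", 1)[-1].strip()
    return value.strip()

def pick_date(headers: Dict[str, str], fields: List[str]) -> Optional[Tuple]:
    """Try each field in order; return the first parsable 10-tuple
    from parsedate_tz (last element = offset in seconds east of UTC)."""
    for field in fields:
        value = headers.get(field)
        if value is None:
            continue  # message lacks this field: try the next one
        parsed = parsedate_tz(date_text(field, value))
        if parsed is not None:
            return parsed
    return None  # no usable date: caller must fall back (e.g. arrival time)
```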
The code for &parse_date is in mhtime.pl. As far as I can see, it parses
time zones specified as +-HHMM but not time zones specified as PST, EST
etc. This presumably is what the RFCs for mail headers require, since
non-numeric time zones are bound to be ambiguous and inconsistent to
some extent.
If that code is faulty, the problem would show up only when DATEFIELDS
is not set to the default, since timestamps added on receipt at the
archive always use the same local time zone when the default of
"received" is used. But there is no other reason to assume that it is
faulty.
[JB]
[2] - lots of computers in Japan/US have misset clocks
[AL]
This seems a plausible explanation, if one includes incorrect time zones
as misset clocks.
Abandoning the information supplied by those computers as to what
date/time they sent messages, and relying on the date/time received, is an
obvious workaround (widely used, and recommended in the default settings
for MHonArc). But with computers becoming universally connected to the
global internet, they really do need to have correctly set clocks,
including time zones, for numerous reasons. Inconvenience caused to others
by not doing so ultimately needs to be fixed rather than worked around by
others. Ultimately it will be. Short-term workarounds merely slow down the
inevitable fixing of misset clocks and time zones, which is prompted by
email from affected users drawing attention to the incorrect settings.
[JB]
[3] - there is a big erratic delay in email transmission times, meaning
messages arrive out of order.
[AL]
This does not seem likely to be the cause of the current complaint, as
it has been described as a consistent 17-hour misordering between
messages posted from Japan and the US. If erratic delays were the cause,
switching to the date sent would have been noticed as an improvement
rather than prompting a complaint.
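The figure of 17 hours is exactly the gap between Japan Standard Time (+0900) and US Pacific Standard Time (-0800), which is what one would expect from zone information being dropped or misapplied between those two regions, not from random transmission delays. A small worked example with hypothetical timestamps, using only Python's standard datetime module:

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))   # Japan, +0900
PST = timezone(timedelta(hours=-8))  # US Pacific in November, -0800

# Hypothetical pair: a post from Japan and a reply from California
# sent one hour later in absolute (UTC) terms.
original = datetime(1998, 11, 17, 10, 0, tzinfo=JST)  # 01:00 UTC, 17 Nov
reply = datetime(1998, 11, 16, 18, 0, tzinfo=PST)     # 02:00 UTC, 17 Nov

# Aware arithmetic honours the offsets.
true_gap = reply - original
# Comparing wall-clock times while ignoring the zones.
naive_gap = reply.replace(tzinfo=None) - original.replace(tzinfo=None)
# Difference between the two offsets.
skew = JST.utcoffset(None) - PST.utcoffset(None)

print(true_gap)   # 1:00:00, so the reply correctly sorts after the original
print(naive_gap)  # -1 day, 8:00:00 (i.e. -16 hours), so the reply sorts first
print(skew)       # 17:00:00, the constant error between the two regions
```

Whatever the actual send times, ignoring the zones shifts every Japan/US pair by the same 17 hours, which matches the "consistent" character of the complaint.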
[JB]
[4] - something else?
[AL]
Faulty Mail User Agents which do not set the "date" (SENT) field
according to the clock and the RFCs concerning (numeric) time zones in
date headers. See for example the list of URLs and dates for the 5
previous messages in this thread below:
5th http://www.mail-archive.com/[email protected]/msg00062.html 16 Nov 1998 11:06:34 -0500
4th http://www.mail-archive.com/[email protected]/msg00060.html Mon, 16 Nov 1998 10:30:49 -0500
3rd http://www.mail-archive.com/[email protected]/msg00059.html Sun, 15 Nov 1998 22:35:17 PST
2nd http://www.mail-archive.com/[email protected]/msg00057.html Sun, 15 Nov 1998 22:14:43 PST
1st http://www.mail-archive.com/[email protected]/msg00056.html Sun, 15 Nov 1998 23:55:21 -0500
Two of these messages use PST instead of a numeric time zone (but are
still correctly sequenced in this list).
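To make the sequencing point concrete, the five Date: headers above can be converted to UTC and sorted. A minimal sketch using Python's email.utils.parsedate_tz, which does recognise PST; the numeric_only switch is a hypothetical imitation of a parser that understands only +-HHMM offsets, not MHonArc's actual code. With the named zone honoured, the order is 1st to 5th; with PST ignored (treated as +0000), the two PST messages jump ahead of the 1st message, which is the kind of misordering being reported.

```python
import re
from datetime import datetime, timedelta
from email.utils import parsedate_tz
from typing import List

# Date: headers of the five messages listed above.
DATES = {
    "1st": "Sun, 15 Nov 1998 23:55:21 -0500",
    "2nd": "Sun, 15 Nov 1998 22:14:43 PST",
    "3rd": "Sun, 15 Nov 1998 22:35:17 PST",
    "4th": "Mon, 16 Nov 1998 10:30:49 -0500",
    "5th": "16 Nov 1998 11:06:34 -0500",
}

NUMERIC_TZ = re.compile(r"[+-]\d{4}$")

def to_utc(header: str, numeric_only: bool = False) -> datetime:
    """Return the header's time as a naive UTC datetime.

    numeric_only=True is a hypothetical mode that ignores named zones
    such as PST (treating them as +0000), imitating a parser that only
    understands +-HHMM offsets.
    """
    parts = parsedate_tz(header)
    year, month, day, hour, minute, second = parts[:6]
    offset = parts[9]  # seconds east of UTC; PST becomes -28800
    if numeric_only and not NUMERIC_TZ.search(header):
        offset = 0
    local = datetime(year, month, day, hour, minute, second)
    return local - timedelta(seconds=offset)

def order(numeric_only: bool = False) -> List[str]:
    return sorted(DATES, key=lambda k: to_utc(DATES[k], numeric_only))

print(order())                   # ['1st', '2nd', '3rd', '4th', '5th']
print(order(numeric_only=True))  # ['2nd', '3rd', '1st', '4th', '5th']
```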
An easy check would be to examine the out-of-sequence messages and see
whether they use non-numeric (or empty) time zones. Only after that check
has confirmed or refuted this as the cause should other work be done,
such as reviewing the MHonArc code or disabling the use of DATEFIELDS.
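The check suggested above could be automated along these lines. This is a sketch only: it assumes the Date: headers of the out-of-sequence messages have already been collected as strings, and classify_zone and survey are hypothetical helper names. A trailing +-HHMM counts as numeric, a short alphabetic token such as PST counts as named, and anything else counts as missing, after a trailing parenthesized comment such as "(EST)" is dropped.

```python
import re
from collections import Counter
from typing import Iterable

NUMERIC_ZONE = re.compile(r"[+-]\d{4}")
NAMED_ZONE = re.compile(r"[A-Za-z]{1,5}")
TRAILING_COMMENT = re.compile(r"\s*\([^)]*\)\s*$")

def classify_zone(date_header: str) -> str:
    """Return 'numeric', 'named' or 'missing' for a Date: header's zone."""
    text = TRAILING_COMMENT.sub("", date_header).strip()
    tokens = text.split()
    last = tokens[-1] if tokens else ""
    if NUMERIC_ZONE.fullmatch(last):
        return "numeric"
    if NAMED_ZONE.fullmatch(last):
        return "named"
    return "missing"  # e.g. header ends with the time itself

def survey(date_headers: Iterable[str]) -> Counter:
    """Tally zone styles across a set of (out-of-sequence) headers."""
    return Counter(classify_zone(h) for h in date_headers)
```

If the tally for the misordered messages is dominated by "named" or "missing" while correctly ordered ones are "numeric", that would point to [4] rather than [1].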
A possible workaround could be to add such time zones to &parse_date, but
the long-term solution is for users to use MUAs that comply with global
internet standards.
My understanding is that there was a long period in which widely used
MUAs did not provide numeric time zones and the workaround of relying
only on dates received or attempting to process potentially ambiguous
and inconsistent non-numeric time zones resulted from that situation.
The argument for a workaround is stronger when the problem requires
changes to user software rather than simply re-setting their clocks/time
zones, but I suggest the same policy approach should be adopted as for
possibility [2] above. Ultimately people interacting through a global
internet have to exchange time information accurately and nothing should
be done to slow down the recognition of that reality.
Any temporary workaround, if there is still widespread use of faulty MUAs,
should be based on additional indexes for received date as an option,
rather than abandoning the exchange of accurate date information by
replacing the "correct" date sequence in standard indexes.
[JB]
If it is the third problem, maybe it is not so serious (the ordering
better reflects reality, and compensates for transmission hiccups).
[AL]
I disagree with describing a "better reflection of reality" as a "not
too serious problem" ;-)
[JB]
However, if it is the first or second problem (which it sounds like it
is), there are 17-hour sorting mistakes all over the place, and that is
unacceptable.
[AL]
Agreed, and the same applies to [4], which I believe is the most likely
cause, although I have not actually done the check necessary to verify or
refute that.
If the problem is [1] the unacceptable result should be fixed by fixing
the broken routine in MHonArc rather than by disabling the DATEFIELDS
feature implemented in 1.3.1.
If the problem is [2] or [4] the situation is still unacceptable but the
policy issue that arises for the long term is whether to:
A) Abandon the "feature" (not problem) of [3] and fail to compensate for
varying transmission delays by abandoning the information supplied in
the "date" sent field and using only the time "received" at the archive.
In the long term, this would be a reduction in quality of service, since
incorrect sequencing as a result of the vagaries of email will have to
be dealt with long after misset clocks/time zones and faulty MUAs become
unimportant. The date a message was sent is important information
supplied by MUAs on behalf of users to other users which should not be
abandoned as a result of failure of some users and their software to
supply it correctly. It is particularly important information BECAUSE
email does have varying and unpredictable delays between transmission
and reception.
This option would also complicate the addition of old messages for newly
subscribed mailing lists which is an essential feature of an "archive"
as opposed to a shared MUA for following mailing lists. In my view
priority should be given to the "archiving" function but the shared MUA
function is also important and long term the complication can be dealt
with as discussed in other messages.
B) Keep feature [3] and treat the issue reported as a problem for the
users of the mailing lists rather than a problem for the archive by
retaining the current setup. In the long term, this will help speed up the
transition to correctly set clocks and replacement of faulty MUAs by
providing a visible record ("archive" ;-) of the problems caused by
misset clocks and faulty MUAs (less noticeable when just following a
current list rather than accessing an archive, but still present
whenever one receives a message with a misleading date sent). However it
will be only a small part of the long term pressures to fix these
problems and will cause short term problems for users of the archive as
a shared MUA.
C) Provide both forms of indexing by providing an additional set of
indexes in "received" order. This would seem to be the optimum medium
term solution in providing a short term workaround without reducing long
term quality of service or slowing down ultimately necessary transitions
to correctly set clocks/timezones and correct MUAs.
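Option C amounts to keeping two sort keys over the same set of messages. A minimal sketch with a hypothetical Message record (not MHonArc's data model): the date index sorts on the sent time converted to UTC, falling back to the received time when no usable Date: header exists, and the received index sorts on arrival at the archive. Ties break on the archive message number, so both orders are deterministic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Message:
    number: int                      # archive message number, as in msgNNNNN.html
    received: datetime               # arrival time at the archive, UTC
    sent: Optional[datetime] = None  # Date: header converted to UTC, if usable

def date_index(msgs: List[Message]) -> List[int]:
    """Message numbers in date-sent order (falls back to received time)."""
    ordered = sorted(msgs, key=lambda m: (m.sent or m.received, m.number))
    return [m.number for m in ordered]

def received_index(msgs: List[Message]) -> List[int]:
    """Message numbers in order of arrival at the archive."""
    ordered = sorted(msgs, key=lambda m: (m.received, m.number))
    return [m.number for m in ordered]

# Hypothetical data: message 3 is an old post imported from a back archive.
msgs = [
    Message(1, datetime(1998, 11, 16, 5, 0), datetime(1998, 11, 16, 4, 55)),
    Message(2, datetime(1998, 11, 16, 6, 20), datetime(1998, 11, 16, 6, 14)),
    Message(3, datetime(1998, 11, 17, 9, 0), datetime(1997, 3, 1, 12, 0)),
]
print(date_index(msgs))      # [3, 1, 2]
print(received_index(msgs))  # [1, 2, 3]
```

The imported message lands first in the date index but last in the received index, which is how the two views could serve archive imports and "shared MUA" readers at the same time.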
If the problem is [4] and is mainly caused by non-numeric rather than
missing time zones there is also the possible workaround of:
D) Attempting to add recognition of common non-numeric time zones to
&parse_date
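For reference, RFC 822 (section 5) itself permits UT, GMT and the North American zones EST/EDT, CST/CDT, MST/MDT and PST/PDT alongside numeric offsets, so a table covering at least those would follow that standard; later mail RFCs treat the named forms as obsolete. A sketch of option D under those assumptions: zone_offset_seconds is a hypothetical helper, RFC 822's single-letter military zones are deliberately left out, and JST is included only as an example of a non-standard zone an archive might choose to accept.

```python
import re
from typing import Optional

# Named zones permitted by RFC 822 section 5, in hours east of UTC.
RFC822_ZONES = {
    "UT": 0, "GMT": 0,
    "EST": -5, "EDT": -4, "CST": -6, "CDT": -5,
    "MST": -7, "MDT": -6, "PST": -8, "PDT": -7,
}
# Assumption: non-standard zones an archive might choose to accept.
EXTRA_ZONES = {"JST": 9}

NUMERIC_ZONE = re.compile(r"([+-])(\d{2})(\d{2})")

def zone_offset_seconds(zone: str, allow_extra: bool = True) -> Optional[int]:
    """Offset of a Date: header zone in seconds east of UTC, or None."""
    m = NUMERIC_ZONE.fullmatch(zone)
    if m:
        sign = -1 if m.group(1) == "-" else 1
        return sign * (int(m.group(2)) * 3600 + int(m.group(3)) * 60)
    name = zone.upper()
    if name in RFC822_ZONES:
        return RFC822_ZONES[name] * 3600
    if allow_extra and name in EXTRA_ZONES:
        return EXTRA_ZONES[name] * 3600
    return None  # unknown or missing: caller falls back, e.g. to received time
```

Returning None for anything unrecognised, rather than silently assuming +0000, keeps the workaround from producing the very misordering it is meant to prevent.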
BTW, users of different archived lists are likely to take different views
of the policy choices involved. As well as favoring option C, this also
points to the desirability of being able to customize the handling of
individual lists where required, e.g. fukuzawa could be set to "received"
and public-list to "date", rather than having to choose one or the other
for both in the short term.