The id generator is used to create a key for the message database, and
also to create a Permalink.

Therefore, an id generator needs to fulfil the following design goals
as a minimum:
A) different messages have different ids
B) the same id is generated if the same message is re-processed
C) equivalent messages have the same id

Goal A is needed to ensure that the database can contain every distinct
message.
Goal B is needed to ensure that the database can be reloaded from the
original source if necessary.
Goal C is needed to ensure that the database can be reloaded from an
equivalent source, and to ensure that Permalinks are stable.

None of the current id generator algorithms meet all of the above goals.

The original and medium generators fail to meet goal A.
The full generator fails to meet goal B (and therefore C).

A sender can easily generate two messages with identical content; it
is important to distinguish these.

The Message-Id should help here.

Message-Id is supposed to be unique, but in practice it may not be, so
some additional fields need to be used to create the database id.

For mailing lists the Return-Path will normally contain a unique id
which is used to identify bounces.
In theory this might be sufficient on its own. Indeed, the path might
be usable without hashing. However, the early ASF mailing list software
did not use unique Return-Paths.

The existing mod_mbox solution uses a combination of Message-Id plus
YYYYMM plus a list identifier. Have there ever been any collisions?

How about using the following:

Message-Id
Date
Return-Path
List-Id
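
As a rough illustration of the proposal above, here is a minimal Python
sketch that derives an id from just those four headers. The names
candidate_id and ID_FIELDS are hypothetical, and the choice of SHA-256,
the whitespace normalisation and the newline separator are assumptions
of mine, not anything the existing generators do.

```python
import hashlib
from email import message_from_string

# Headers proposed above as the input to the id (order is fixed so the
# result is reproducible).
ID_FIELDS = ("Message-Id", "Date", "Return-Path", "List-Id")


def candidate_id(raw_message: str) -> str:
    """Return a hex digest built only from the ID_FIELDS headers."""
    msg = message_from_string(raw_message)
    parts = []
    for name in ID_FIELDS:
        # Header lookup is case-insensitive; a missing header becomes "".
        value = str(msg.get(name, ""))
        # Assumption: collapse folding and runs of whitespace so that a
        # re-folded copy of the same header yields the same id.
        parts.append(" ".join(value.split()))
    # Newline cannot occur inside a normalised value, so it is a safe
    # separator between fields.
    joined = "\n".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Because only headers are hashed, re-processing the same message gives
the same id (goal B), and two messages with identical content but
different Message-Ids get different ids. Whether this also satisfies
goals A and C depends on how reliably the four headers survive in an
equivalent source, which the sketch does not address.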

Whatever new algorithm is chosen, I think it's important that the
format looks different from the existing ones. For example, one could
drop the <> around the list id.
