The id generator is used to create a key for the message database, and also to create a Permalink.
Therefore, an id generator needs to fulfil the following design goals as a minimum:

A) different messages have different IDs
B) the same ID is generated if the same message is re-processed
C) equivalent messages have the same ID

Goal A is needed to ensure that the database can contain every different message.
Goal B is needed to ensure that the database can be reloaded from the original source if necessary.
Goal C is needed to ensure that the database can be reloaded from an equivalent source, and to ensure that Permalinks are stable.

None of the current id generator algorithms meet all of the above goals. The original and medium generators fail to meet goal A. The full generator fails to meet goal B (and therefore C).

A sender can easily generate two messages with identical content; it is important to distinguish these. The Message-ID should help here. Message-ID is supposed to be unique, but in practice it may not be, so some additional fields need to be used to create the database id.

For mailing lists the Return-Path will normally contain a unique id which is used to identify bounces. In theory this might be sufficient on its own. Indeed, the path might be usable without hashing. However, the early ASF mailing list software did not use unique Return-Paths.

The existing mod_mbox solution uses a combination of Message-ID plus YYYYMM plus a list identifier. Have there ever been any collisions?

How about using the following:

- Message-ID
- Date
- Return-Path
- List-Id

Whatever new algorithm is chosen, I think it's important that the format looks different from the existing ones, e.g. one could drop the <> around the list id.
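
As a rough illustration of the four-field suggestion, here is a minimal Python sketch. It is not one of the existing generators: the function name `sketch_generate_id`, the field order, the NUL separator, the whitespace normalisation and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from email import message_from_string

# Header fields suggested in the discussion; the order is an assumption.
ID_FIELDS = ("Message-ID", "Date", "Return-Path", "List-Id")


def sketch_generate_id(raw_message: str) -> str:
    """Hypothetical generator: hash only the four suggested headers."""
    msg = message_from_string(raw_message)
    parts = []
    for name in ID_FIELDS:
        # Header lookup is case-insensitive; a missing header becomes "".
        value = msg.get(name, "")
        # Collapse whitespace so folded and unfolded headers hash the same.
        parts.append(" ".join(str(value).split()))
    # A NUL separator keeps the field boundaries unambiguous.
    joined = "\0".join(parts)
    # SHA-256 is an assumption; any stable hash would serve the sketch.
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    sample = (
        "Message-ID: <abc@example.org>\n"
        "Date: Wed, 1 Jan 2020 00:00:00 +0000\n"
        "Return-Path: <dev-return-1-user=example.org@lists.example.org>\n"
        "List-Id: <dev.lists.example.org>\n"
        "\n"
        "body text\n"
    )
    print(sketch_generate_id(sample))
```

Because only header values are hashed, re-processing the same message gives the same id (goal B), and a copy whose headers differ only in line folding also maps to the same id, which is one possible reading of "equivalent" in goal C. Whether these four fields are enough for goal A would still need checking against real archives.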