On 13 October 2016 at 21:28, sebb <seb...@gmail.com> wrote:
> On 7 October 2016 at 00:44, sebb <seb...@gmail.com> wrote:
>> The id generator is used to create a key for the message database, and
>> also to create a Permalink.
>>
>> Therefore, an id generator needs to fulfil the following design goals
>> as a minimum:
>> A) different messages have different IDs
>> B) the same id is generated if the same message is re-processed
>> C) equivalent messages have the same ID
>>
>> Goal A is needed to ensure that the database can contain every
>> different message
>> Goal B is needed to ensure that the database can be reloaded from the
>> original source if necessary
>> Goal C is needed to ensure that the database can be reloaded from an
>> equivalent source, and to ensure that Permalinks are stable.
>>
>> None of the current id generator algorithms meet all of the above goals.
>>
>> The original and medium generators fail to meet goal A.
>> The full generator fails to meet goal B (and therefore C).
>>
>> A sender can easily generate two messages with identical content; it
>> is important to distinguish these.
>>
>> The Message-Id should help here.
>>
>> Message-ID is supposed to be unique, but in practice it may not be, so
>> some additional fields need to be used to create the database id.
>>
>> For mailing lists the Return-Path will normally contain a unique id
>> which is used to identify bounces.
>> In theory this might be sufficient on its own. Indeed the path might
>> be usable without hashing. However the early ASF mailing list software
>> did not use unique Return-Paths.
>>
>> The existing mod_mbox solution uses a combination of Message-Id plus
>> YYYYMM plus a list identifier. Have there ever been any collisions?
>
> It's certainly possible for mod_mbox to contain multiple messages with
> the same id.
>
> For example:
>
> From dev-return-15575-apmail-airavata-dev-archive=airavata.apache....@airavata.apache.org Tue Jun  7 20:08:04 2016
> and
> From dev-return-15577-apmail-airavata-dev-archive=airavata.apache....@airavata.apache.org Tue Jun  7 20:24:24 2016
>
> both have the same id:
>
> Message-Id: <bcd87d1a-bfb6-4f2a-92cc-af8f1d16b...@apache.org>
>
> It's exactly the same message which arrived twice in the same mailing
> list a few seconds apart.
> Maybe it was sent using Bcc as well as To: ?
>
> It's not exactly a collision, but at present mod_mbox is able to store
> both, whereas Pony Mail cannot (except when using the full generator,
> which has other problems).
>
> I think it's important to store the full message history in the database.
> For example, if one of the messages bounces, it would be odd if the
> source of the bounce were not in the database.
> Also the message sequences will be incomplete.
> This is already the case for lists.a.o: the mbox at
>
> https://lists.apache.org/api/mbox.lua?list=d...@airavata.apache.org&date=2016-6
>
> does not contain the message with sequence number 15575.
>
>> How about using the following:
>>
>> Message-Id
>> Date
>> Return-Path
>
> In this case the return-path can be used to distinguish the messages.

Note that the path depends on where the message is stored:

Return-Path: <dev-return-15577-archive-asf-public=cust-asf.ponee...@airavata.apache.org>
as against
Return-Path: <dev-return-15577-apmail-airavata-dev-archive=airavata.apache....@airavata.apache.org>

So it will need adjusting to extract the parts that are the same for
all message sources, whether that is mod_mbox or Pony Mail or sent to
another subscriber.

The format will presumably depend on the mailing list software that is used.
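
As a rough sketch of what that adjustment might look like for the two
paths above: the function name and the regular expression are my own
assumptions, and they only cover the "<list>-return-<seq>-<recipient>@<host>"
shape seen in these examples. Other mailing list software would need its
own pattern.

```python
import re

# Assumed shape: <list>-return-<seq>-<encoded recipient>@<host>
# (taken from the two example paths only; not a general parser).
_RETURN_PATH = re.compile(
    r"^<?(?P<list>.+?)-return-(?P<seq>\d+)-[^@]*@(?P<host>[^>\s]+)>?$"
)

def normalize_return_path(value):
    """Keep only the parts that do not depend on the recipient.

    Returns None if the path does not have the assumed shape.
    """
    match = _RETURN_PATH.match(value.strip())
    if match is None:
        return None
    return "{list}-return-{seq}@{host}".format(**match.groupdict())

# The two paths quoted above, as they appear in the archives.
pony = "<dev-return-15577-archive-asf-public=cust-asf.ponee...@airavata.apache.org>"
mbox = "<dev-return-15577-apmail-airavata-dev-archive=airavata.apache....@airavata.apache.org>"

print(normalize_return_path(pony))
print(normalize_return_path(mbox))
```

Both copies reduce to the same list name, sequence number and host, so
the recipient-specific part no longer affects the result.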

>> List-Id
>
> I think this must be the original List-Id, not any override.
> Otherwise there may be problems with permalinks if a list name is
> updated - the old permalink will no longer work.
>
>> Whatever new algorithm is chosen, I think it's important that the
>> format looks different from the existing ones. e.g. one could drop the
>> <> around the list id.
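
To make the proposal concrete, here is a minimal sketch of an id built
from the four fields. The function name, the use of SHA-256, the 40
character truncation and the "hash@list" layout are all assumptions for
illustration, not an agreed format. The Date and List-Id values below are
placeholders, and the Return-Path is assumed to have already been reduced
to its recipient-independent parts.

```python
import hashlib

def make_id(message_id, date, return_path, list_id):
    """Hash the four proposed fields into a single database id (sketch)."""
    fields = [message_id.strip(), date.strip(),
              return_path.strip(), list_id.strip()]
    # Join with NUL so that field boundaries cannot be confused.
    digest = hashlib.sha256("\x00".join(fields).encode("utf-8")).hexdigest()
    # Drop the <> around the List-Id so the result looks different
    # from ids that keep the brackets.
    return digest[:40] + "@" + list_id.strip().strip("<>")

# Message-Id as quoted earlier in the thread; other values are placeholders.
MSGID = "<bcd87d1a-bfb6-4f2a-92cc-af8f1d16b...@apache.org>"
DATE = "Tue, 7 Jun 2016 20:08:04 +0000"    # hypothetical Date header
LIST = "<dev.airavata.apache.org>"         # assumed original List-Id

first = make_id(MSGID, DATE, "dev-return-15575@airavata.apache.org", LIST)
second = make_id(MSGID, DATE, "dev-return-15577@airavata.apache.org", LIST)

print(first)
print(second)
```

With this scheme the two airavata copies get different ids because their
sequence numbers differ (goal A), while re-processing either one gives the
same id again (goal B).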
