FWIW, I'd be interested in hearing what others have to say.

We broke backwards compatibility in 0.9 WRT IDs, so it's not set in
stone that every version of the archiver must be fully compatible with
the previous one.

What we could possibly do is have a vote/discussion on whether or not
breaking that sort of compat is permissible in cases of an error (of
sufficient magnitude) in the generator itself.

Alternately, we set out some clear definitions of each generator and set
a policy that those must remain fixed, then create a new generator
whenever a bug is found and leave the old one unpatched. This, however,
has the potential of creating a myriad of new generators.

We have to weigh future operability against the potential risk of having
N% of emails have their permalinks change on a reimport (in this case, I
believe it to be less than 0.1% of emails affected).

In the real-world cases I deal with, the medium generator was busted.
One has to change away from it to avoid duplicates. This means the
archive cannot be fully imported using neither the old nor the new
generator, even if medium was kept as is and a new generator was chosen
for future email. A decision has to be made to break the mold and live
with it.

What do people think?

With regards,
Daniel.

On 08/19/2017 11:25 AM, Daniel Gruno wrote:
> On 08/19/2017 10:46 AM, sebb wrote:
>> -1
>>
>> Sorry, but changing an ID generator will break some Permalinks if the
>> messages are ever reparsed.
> 
> I'm sorry, but I don't accept this as a valid reason for a veto.
> 
> The present first-chance generator is horribly broken as is, as it only
> relies on 'archived-at' (which may not exist and thus will default to
> $now which will always change!) and the list ID, meaning that
> reimporting is *already broken* for these messages if archived-at is
> either missing or not the same. If the DB gets lost or corrupted, and
> you re-import, you will already end up with new permalinks should they
> fall back to this generator due to a missing body.
> 
> IMHO, the first generator also needs to change or it will not be
> reliable at all. but I have not looked at that yet.
> 
> We start off on line 642 with:
>             if not msg.get('archived-at'):
>                 msg.add_header('archived-at', email.utils.formatdate())
> 
> then we generate a buggy ID on line 267 with:
> mid = hashlib.sha224(str("%s-%s" % (lid,
> msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid
> if lid else "none")
> 
> This means, that if either the archived-at header differs, or if it's
> not present, you'll never get the same ID unless you reimport from the
> fixed-up mbox source from ES...which you probably don't have if you're
> trying to reimport an email. Reimporting from an external mbox file or
> other source will change the ID on every single import, as archived-at
> (if it's not present) will change to whatever time it is when you import.
> 
> TL;DR: reimporting is already broken for messages without a main body, I
> don't believe there's technical merit in sticking with a broken format
> for the sake of maintaining status quo. status quo is broken, and no
> matter how we fix it, IDs may change as a result on a re-import from
> older installations. That is a consequence we should accept.
> 
> With regards,
> Daniel.
> 
>>
>> It will also cause duplicate messages if the messages are ever
>> exported and re-imported without deleting the originals.
>>
>> A possible work-round is to create a new version of the medium
>> generator that fixes this issue.
>> New installations can then use that if they wish.
>>
>> But existing installations cannot just change ID generator and hope
>> that things will be OK
>>
>>
>> On 19 August 2017 at 07:11,  <[email protected]> wrote:
>>> Repository: incubator-ponymail
>>> Updated Branches:
>>>   refs/heads/master be430c666 -> 58ce843d7
>>>
>>>
>>> Make sure we have a body, or ID generation will fail.
>>>
>>> If body is None, PM tries to encode it and fails, falling back
>>> on the legacy ID generator, which is not guaranteed to be
>>> the same across the board. Fix this by setting the body
>>> to an empty string if not found.
>>>
>>>
>>> Project: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/repo
>>> Commit: 
>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/commit/58ce843d
>>> Tree: 
>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/tree/58ce843d
>>> Diff: 
>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/diff/58ce843d
>>>
>>> Branch: refs/heads/master
>>> Commit: 58ce843d79d413f3d3b49647d26f33b41ec9ff2d
>>> Parents: be430c6
>>> Author: Daniel Gruno <[email protected]>
>>> Authored: Sat Aug 19 08:11:00 2017 +0200
>>> Committer: Daniel Gruno <[email protected]>
>>> Committed: Sat Aug 19 08:11:00 2017 +0200
>>>
>>> ----------------------------------------------------------------------
>>>  CHANGELOG.md        | 1 +
>>>  tools/generators.py | 4 ++++
>>>  2 files changed, 5 insertions(+)
>>> ----------------------------------------------------------------------
>>>
>>>
>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/CHANGELOG.md
>>> ----------------------------------------------------------------------
>>> diff --git a/CHANGELOG.md b/CHANGELOG.md
>>> index 05fab5d..e180827 100644
>>> --- a/CHANGELOG.md
>>> +++ b/CHANGELOG.md
>>> @@ -1,4 +1,5 @@
>>>  ## CHANGES in 0.10:
>>> +- Fixed an issue with ID generation where an email body does not exist
>>>  - Fixed an issue where favorites could contain null entries due to missing 
>>> GC (#392)
>>>  - Added sample configs for Pony Mail (#374)
>>>  - Renamed ll.py to list-lists.py (#378)
>>>
>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/58ce843d/tools/generators.py
>>> ----------------------------------------------------------------------
>>> diff --git a/tools/generators.py b/tools/generators.py
>>> index 2ac2ca8..d8c373d 100644
>>> --- a/tools/generators.py
>>> +++ b/tools/generators.py
>>> @@ -55,6 +55,8 @@ def medium(msg, body, lid, attachments):
>>>      attachments - list of attachments (not used)
>>>      """
>>>      # Use text body
>>> +    if not body: # Make sure body is not None, which will fail.
>>> +        body = ""
>>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>      # Use List ID
>>>      xbody += bytes(lid, encoding='ascii')
>>> @@ -94,6 +96,8 @@ def redundant(msg, body, lid, attachments):
>>>      attachments - list of attachments (uses the hashes)
>>>      """
>>>      # Use text body
>>> +    if not body: # Make sure body is not None, which will fail.
>>> +        body = ""
>>>      xbody = body if type(body) is bytes else body.encode('ascii', 'ignore')
>>>      # Use List ID
>>>      xbody += bytes(lid, encoding='ascii')
>>>
> 

Reply via email to