On 09/07/2017 12:24 AM, sebb wrote:
> On 6 September 2017 at 07:32, Daniel Gruno <[email protected]> wrote:
>> On 09/06/2017 12:09 AM, sebb wrote:
>>> On 2 September 2017 at 09:02,  <[email protected]> wrote:
>>>> Repository: incubator-ponymail
>>>> Updated Branches:
>>>>   refs/heads/master c8f4d3b7d -> df0b7ee1c
>>>>
>>>>
>>>> crop out trailing whitespace for redundant archiver
>>>>
>>>> This deals with spurious whitespace that can exist on
>>>> clustered setups due to corrections inside the MTAs.
>>>> This only deals with trailing whitespace, everything else
>>>> is preserved.
>>>
>>> -1
>>>
>>> I don't think this is a good idea.
>>
>> Your -1 is noted, but I don't consider the reasoning valid for a veto,
>> so I'll interpret this as just a plain -1.
> 
> AIUI, that's not your call.

It's not my call to determine whether technical merit is sound (that
would be for the PPMC in such cases), but there has to be technical
merit in -1 in the first place. Saying "this is not a good idea" does
not convey technical reason. You've since elaborated on that in your
reply, and _that_ I believe constitutes a technical reason.

> 
>> I think it's a good idea, I think it solves some real problems that have
>> been spotted in clustered setup. It could also solve problems where one
>> archives as mbox with an extra newline by mistake. It's also an optional
>> generator, not the default. Could you elaborate on why trailing
>> whitespace would matter?
> 
> I already wrote that ignoring whitespace causes a problem because it
> means two different inputs end up with the same database id.
> There's no way of knowing which one was correct; the wrong one may end
> up being stored.

But they would both have the same sender, date, list, message,
attachments etc filed under the same ID - is that not what we want? What
we _don't_ want is for trailing whitespace to cause duplicates. Put in
other words: Why would we at all care whether one has the added newline
or two and the other one doesn't? We're dealing with showing people
emails, but bit-perfect of what was sent (including duplicates as a
result of bit-diversion), but rather of what was intended. If we wanted
a perfect copy, we'd use the full digest and skip clustered setups all
together, hoping machines don't die on us. This is for those rare
occasions where something _does_ go wrong, and as seen, sometimes
postfix will add some extra newlines - I still don't know why it does
that in every case, I only know that it does, and likely other MTAs do
as well.

> 
> If messages are being treated differently by different parts of the
> cluster then that needs to be addressed.
> 
>>>
>>> Please raise an issue for significant changes such as this.
>>
>> I don't see it as significant changes - it's an optional generator that
>> we haven't released, it's not gonna break any API/ABI we currently have.
> 
> Once a generator has been released, it effectively becomes part of the
> public API.
> It's then not possible to fix it without breaking the API.
> So any problems with the code must be fixed before release.
> 
>> If I am to raise a ticket, it would be for the full generator to be
>> admitted as an option.
> 
> That is definitely a separate ticket.
> 
>>>
>>>> Also add some spacing in the code.
>>>>
>>>>
>>>> Project: http://git-wip-us.apache.org/repos/asf/incubator-ponymail/repo
>>>> Commit: 
>>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/commit/df0b7ee1
>>>> Tree: 
>>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/tree/df0b7ee1
>>>> Diff: 
>>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/diff/df0b7ee1
>>>>
>>>> Branch: refs/heads/master
>>>> Commit: df0b7ee1c5676b3286e4b30c135f410b1e0a92da
>>>> Parents: c8f4d3b
>>>> Author: Daniel Gruno <[email protected]>
>>>> Authored: Sat Sep 2 10:02:10 2017 +0200
>>>> Committer: Daniel Gruno <[email protected]>
>>>> Committed: Sat Sep 2 10:02:10 2017 +0200
>>>>
>>>> ----------------------------------------------------------------------
>>>>  tools/generators.py | 9 +++++++++
>>>>  1 file changed, 9 insertions(+)
>>>> ----------------------------------------------------------------------
>>>>
>>>>
>>>> http://git-wip-us.apache.org/repos/asf/incubator-ponymail/blob/df0b7ee1/tools/generators.py
>>>> ----------------------------------------------------------------------
>>>> diff --git a/tools/generators.py b/tools/generators.py
>>>> index e320a06..b37fa8b 100644
>>>> --- a/tools/generators.py
>>>> +++ b/tools/generators.py
>>>> @@ -22,6 +22,7 @@ This file contains the various ID generators for Pony 
>>>> Mail's archivers.
>>>>  import hashlib
>>>>  import email.utils
>>>>  import time
>>>> +import re
>>>>
>>>>  # Full generator: uses the entire email (including server-dependent data)
>>>>  # This is the recommended generator for single-node setups.
>>>> @@ -95,8 +96,13 @@ def redundant(msg, body, lid, attachments):
>>>>      if not body: # Make sure body is not None, which will fail.
>>>>          body = ""
>>>>      xbody = body if type(body) is bytes else body.encode('ascii', 
>>>> 'ignore')
>>>> +
>>>> +    # Crop out any trailing whitespace in body
>>>> +    xbody = re.sub(b"\s+$", b"", xbody)
>>>> +
>>>>      # Use List ID
>>>>      xbody += bytes(lid, encoding='ascii')
>>>> +
>>>>      # Use Date header. Don't use archived-at, as the archiver sets this 
>>>> if not present.
>>>>      mdate = None
>>>>      mdatestring = "(null)" # Default to null, ONLY changed if replicable 
>>>> across imports
>>>> @@ -106,14 +112,17 @@ def redundant(msg, body, lid, attachments):
>>>>      except:
>>>>          pass
>>>>      xbody += bytes(mdatestring, encoding='ascii')
>>>> +
>>>>      # Use sender
>>>>      sender = msg.get('from', None)
>>>>      if sender:
>>>>          xbody += bytes(sender, encoding = 'ascii')
>>>> +
>>>>      # Use subject
>>>>      subject = msg.get('subject', None)
>>>>      if subject:
>>>>          xbody += bytes(subject, encoding = 'ascii')
>>>> +
>>>>      # Use attachment hashes if present
>>>>      if attachments:
>>>>          for a in attachments:
>>>>
>>

Reply via email to