On 7 September 2017 at 11:08, Daniel Gruno <[email protected]> wrote:
> On 09/07/2017 12:01 PM, sebb wrote:
>> On 7 September 2017 at 06:06, Daniel Gruno <[email protected]> wrote:
>>> On 09/07/2017 12:24 AM, sebb wrote:
>>>> On 6 September 2017 at 07:32, Daniel Gruno <[email protected]> wrote:
>>>>> On 09/06/2017 12:09 AM, sebb wrote:
>>>>>> On 2 September 2017 at 09:02,  <[email protected]> wrote:
>>>>>>> Repository: incubator-ponymail
>>>>>>> Updated Branches:
>>>>>>>   refs/heads/master c8f4d3b7d -> df0b7ee1c
>>>>>>>
>>>>>>>
>>>>>>> crop out trailing whitespace for redundant archiver
>>>>>>>
>>>>>>> This deals with spurious whitespace that can exist on
>>>>>>> clustered setups due to corrections inside the MTAs.
>>>>>>> This only deals with trailing whitespace, everything else
>>>>>>> is preserved.
>>>>>>
>>>>>> -1
>>>>>>
>>>>>> I don't think this is a good idea.
>>>>>
>>>>> Your -1 is noted, but I don't consider the reasoning valid for a veto,
>>>>> so I'll interpret this as just a plain -1.
>>>>
>>>> AIUI, that's not your call.
>>>
>>> It's not my call to determine whether technical merit is sound (that
>>> would be for the PPMC in such cases), but there has to be technical
>>> merit in -1 in the first place. Saying "this is not a good idea" does
>>> not convey technical reason. You've since elaborated on that in your
>>> reply, and _that_ I believe constitutes a technical reason.
>>
>> Ah ok.
>>
>> I guess I was too terse, I should have linked to the previous mails I
>> sent about the same issue.
>>
>>>>
>>>>> I think it's a good idea, I think it solves some real problems that have
>>>>> been spotted in clustered setup. It could also solve problems where one
>>>>> archives as mbox with an extra newline by mistake. It's also an optional
>>>>> generator, not the default. Could you elaborate on why trailing
>>>>> whitespace would matter?
>>>>
>>>> I already wrote that ignoring whitespace causes a problem because it
>>>> means two different inputs end up with the same database id.
>>>> There's no way of knowing which one was correct; the wrong one may end
>>>> up being stored.
>>>
>>> But they would both have the same sender, date, list, message,
>>> attachments etc filed under the same ID - is that not what we want? What
>>> we _don't_ want is for trailing whitespace to cause duplicates. Put in
>>> other words: Why would we at all care whether one has the added newline
>>> or two and the other one doesn't? We're dealing with showing people
>>> emails, not bit-perfect copies of what was sent (including duplicates
>>> as a result of bit-diversion), but rather what was intended.
>>
>> I disagree; I think it's important to show the input email as exactly
>> as possible.
>> Whitespace trimming could damage some emails.
>>
>>> If we wanted
>>> a perfect copy, we'd use the full digest and skip clustered setups
>>> altogether, hoping machines don't die on us.
>>
>> Not so, it must be possible to have perfect copies in clustered setups.
>> Otherwise clustered backup systems would be impossible.
>> It's just that the current design may make this tricky.
>>
>>> This is for those rare
>>> occasions where something _does_ go wrong, and as seen, sometimes
>>> postfix will add some extra newlines - I still don't know why it does
>>> that in every case, I only know that it does, and likely other MTAs do
>>> as well.
>>
>> That's largely my point.
>> The cause needs to be determined otherwise the generator is being used
>> to ignore what may be a bug.
>>
>> Besides, in the cases I have seen (and noted on this list), it is not
>> only a difference in trailing whitespace.
>> The archived-at header is missing in one of the copies.
>> As I have written already, that points to non-identical treatment by
>> the different cluster members.
>>
>
> The archived-at, and possibly the extra whitespace, likely stems from a
> postfix oddity (that I really can't fix :p), in that mail delivered
> locally will be handled internally, even if it's supposed to be rerouted
> to a different address than the original.
>
> The case is as follows, I think:
> - 3 nodes act as MTAs
> - Each node will receive an email (whichever node has highest priority
> and is awake) and duplicate it into 3 copies, that each go to all the
> boxes in the MX setup. So, one of these copies goes to the box itself, and
> the two other copies go to the other boxes if/when they are online (this
> is to not bounce an email if a box should be down or erroring out). Now,
> since one of the copies goes to the box itself (even though it goes to a
> new email address), it is somehow rerouted differently, and that causes
> the header (and possibly the whitespace fixes) to either be there or not.

I thought that the archived-at header was only added by the archiver?
Also the archiver always adds it if it is missing.

How can the postfix setup affect this?
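
To make the expectation above concrete, here is a minimal sketch of
"add the header only if it is missing". It is not the actual archiver
code: the function name ensure_archived_at and the example permalink
are hypothetical, and only the standard-library email module is used.

```python
from email import message_from_string

def ensure_archived_at(msg, permalink):
    """Add an Archived-At header only when the message has none.

    'permalink' is a hypothetical URL supplied by the caller.
    """
    # Header lookup in email.message.Message is case-insensitive.
    if msg.get("Archived-At") is None:
        msg["Archived-At"] = "<%s>" % permalink
    return msg

raw = "From: a@example.org\nSubject: test\n\nbody\n"
msg = ensure_archived_at(
    message_from_string(raw),
    "https://lists.example.org/thread/abc123",  # hypothetical permalink
)
```

If every cluster member ran this same step on the same input, the
header should be present in every stored copy.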

> I'll gladly admit that we're working on figuring this out differently,
> and possibly publishing this as a "what to do and not do" guideline
> later on, when the issue is fixed. I can also accept that if we have
> this guideline, we can omit the whitespace trimming.

We are getting there.

However the trimming is unnecessary for non-clustered setups.
And it's not yet clear that it's necessary for all clustered setups.

Thus the 'fix' should not be in the generator code.
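
The id collision I am concerned about can be illustrated with a minimal
sketch. This is not the Pony Mail generator code: full_id and
trimmed_id are hypothetical names, and SHA-256 over the raw bytes is
only an assumption chosen for illustration.

```python
import hashlib

def full_id(raw: bytes) -> str:
    """Hypothetical id: digest of the message exactly as received."""
    return hashlib.sha256(raw).hexdigest()

def trimmed_id(raw: bytes) -> str:
    """Hypothetical id: digest after dropping trailing whitespace."""
    return hashlib.sha256(raw.rstrip()).hexdigest()

# Two inputs that differ only by one extra trailing newline.
a = b"Subject: hi\r\n\r\nbody\r\n"
b = a + b"\r\n"

same_when_trimmed = trimmed_id(a) == trimmed_id(b)
same_when_full = full_id(a) == full_id(b)
```

With trimming, both inputs map to one id, so whichever copy arrives
first is the one that is stored; with the full digest they stay
distinct.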
