On 7 September 2017 at 11:08, Daniel Gruno <[email protected]> wrote:
> On 09/07/2017 12:01 PM, sebb wrote:
>> On 7 September 2017 at 06:06, Daniel Gruno <[email protected]> wrote:
>>> On 09/07/2017 12:24 AM, sebb wrote:
>>>> On 6 September 2017 at 07:32, Daniel Gruno <[email protected]> wrote:
>>>>> On 09/06/2017 12:09 AM, sebb wrote:
>>>>>> On 2 September 2017 at 09:02, <[email protected]> wrote:
>>>>>>> Repository: incubator-ponymail
>>>>>>> Updated Branches:
>>>>>>>   refs/heads/master c8f4d3b7d -> df0b7ee1c
>>>>>>>
>>>>>>>
>>>>>>> crop out trailing whitespace for redundant archiver
>>>>>>>
>>>>>>> This deals with spurious whitespace that can exist on
>>>>>>> clustered setups due to corrections inside the MTAs.
>>>>>>> This only deals with trailing whitespace, everything else
>>>>>>> is preserved.
>>>>>>
>>>>>> -1
>>>>>>
>>>>>> I don't think this is a good idea.
>>>>>
>>>>> Your -1 is noted, but I don't consider the reasoning valid for a veto,
>>>>> so I'll interpret this as just a plain -1.
>>>>
>>>> AIUI, that's not your call.
>>>
>>> It's not my call to determine whether technical merit is sound (that
>>> would be for the PPMC in such cases), but there has to be technical
>>> merit in -1 in the first place. Saying "this is not a good idea" does
>>> not convey technical reason. You've since elaborated on that in your
>>> reply, and _that_ I believe constitutes a technical reason.
>>
>> Ah ok.
>>
>> I guess I was too terse, I should have linked to the previous mails I
>> sent about the same issue.
>>
>>>>
>>>>> I think it's a good idea, I think it solves some real problems that have
>>>>> been spotted in clustered setup. It could also solve problems where one
>>>>> archives as mbox with an extra newline by mistake. It's also an optional
>>>>> generator, not the default. Could you elaborate on why trailing
>>>>> whitespace would matter?
>>>>
>>>> I already wrote that ignoring whitespace causes a problem because it
>>>> means two different inputs end up with the same database id.
>>>> There's no way of knowing which one was correct; the wrong one may end
>>>> up being stored.
>>>
>>> But they would both have the same sender, date, list, message,
>>> attachments etc filed under the same ID - is that not what we want? What
>>> we _don't_ want is for trailing whitespace to cause duplicates. Put in
>>> other words: Why would we at all care whether one has the added newline
>>> or two and the other one doesn't? We're dealing with showing people
>>> emails, not bit-perfect copies of what was sent (including duplicates as
>>> a result of bit-diversion), but rather of what was intended.
>>
>> I disagree; I think it's important to show the input email as exactly
>> as possible.
>> Whitespace trimming could damage some emails.
>>
>>> If we wanted
>>> a perfect copy, we'd use the full digest and skip clustered setups
>>> altogether, hoping machines don't die on us.
>>
>> Not so, it must be possible to have perfect copies in clustered setups.
>> Otherwise clustered backup systems would be impossible.
>> It's just that the current design may make this tricky.
>>
>>> This is for those rare
>>> occasions where something _does_ go wrong, and as seen, sometimes
>>> postfix will add some extra newlines - I still don't know why it does
>>> that in every case, I only know that it does, and likely other MTAs do
>>> as well.
>>
>> That's largely my point.
>> The cause needs to be determined, otherwise the generator is being used
>> to ignore what may be a bug.
>>
>> Besides, in the cases I have seen (and noted on this list), it is not
>> only a difference in trailing whitespace.
>> The archived-at header is missing in one of the copies.
>> As I have written already, that points to non-identical treatment by
>> the different cluster members.
>>
>
> The archived-at, and possibly the extra whitespace, likely stems from a
> postfix oddity (that I really can't fix :p), in that mail delivered
> locally will be handled internally, even if it's supposed to be rerouted
> to a different address than the original.
>
> The case is as follows, I think:
> - 3 nodes act as MTAs
> - Each node will receive an email (whichever node has highest priority
> and is awake) and duplicate it into 3 copies, one for each of the
> boxes in the MX setup. So, one of these copies goes to the box itself, and
> the two other copies go to the other boxes if/when they are online (this
> is to not bounce an email if a box should be down or erroring out). Now,
> since one of the copies goes to the box itself (even though it goes to a
> new email address), it is somehow rerouted differently, and that causes
> the header (and possibly the whitespace fixes) to either be there or not.

I thought that the archived-at header was only added by the archiver?
Also the archiver always adds it if it is missing.
How can the postfix setup affect this?

> I'll gladly admit that we're working on figuring this out differently,
> and possibly publishing this as a "what to do and not do" guideline
> later on, when the issue is fixed. I can also accept that if we have
> this guideline, we can omit the whitespace trimming.

We are getting there.
However the trimming is unnecessary for non-clustered setups.
And it's not yet clear that it's necessary for all clustered setups.
Thus the 'fix' should not be in the generator code.
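
To illustrate the ID collision discussed in this thread, here is a minimal
sketch. It is not the Pony Mail generator code: the function names are
hypothetical, and SHA-256 over the raw body is an assumption made only for
the example. It shows that two inputs differing only in trailing whitespace
get distinct IDs when hashed as received, and the same ID once the trailing
whitespace is cropped.

```python
import hashlib


def strict_id(body: bytes) -> str:
    """Hypothetical ID: digest of the body exactly as received."""
    return hashlib.sha256(body).hexdigest()


def trimmed_id(body: bytes) -> str:
    """Hypothetical ID: digest after cropping trailing whitespace only."""
    return hashlib.sha256(body.rstrip()).hexdigest()


# Two copies of the same message; the second has extra trailing newlines,
# as might be added somewhere in transit on a clustered setup.
copy_a = b"Hello list,\n\nsee patch below.\n"
copy_b = b"Hello list,\n\nsee patch below.\n\n\n"

print(strict_id(copy_a) == strict_id(copy_b))    # False
print(trimmed_id(copy_a) == trimmed_id(copy_b))  # True
```

With the trimmed variant, whichever copy is archived first determines what is
stored under the shared ID, which is the concern raised above; with the strict
variant, the two copies are stored as separate entries, which is the duplicate
problem the commit tries to avoid.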
