I've created a PR https://github.com/mikel/mail/pull/1319
On Tue, 12 Feb 2019 at 11:30, sebb <[email protected]> wrote: > > This is where the body parsing fails: > > https://github.com/mikel/mail/blob/7c43c84c16f017e0ff5e5c9962f6a1d842301ee3/lib/mail/body.rb#L298 > > If the \r\n is changed to \r?\n then it works for me. > > Note that the body is converted to CRLF earlier: > > https://github.com/mikel/mail/blob/7c43c84c16f017e0ff5e5c9962f6a1d842301ee3/lib/mail/body.rb#L44 > > However this does not do anything unless the original mail is ascii-only. > > On Tue, 12 Feb 2019 at 08:20, sebb <[email protected]> wrote: > > > > On Tue, 12 Feb 2019 at 01:34, Sam Ruby <[email protected]> wrote: > > > > > > On Mon, Feb 11, 2019 at 5:12 PM sebb <[email protected]> wrote: > > > > > > > > On Mon, 11 Feb 2019 at 21:44, Sam Ruby <[email protected]> wrote: > > > > > > > > > > That is indeed a big in the mail gem. I contributed a patch that is a > > > > > partial fix. > > > > > > > > > > What is essential is that you only convert headers not the binary > > > > > attachment as that can corrupt images. > > > > > > > > AFAICT headers are already handled correctly even if they are > > > > LF-terminated. > > > > I assume this is because parsing looks for CR and/or LF - there's > > > > certainly some code that does this. > > > > > > > > This does not seem to be the case for body parsing. > > > > From what little I have seen, it looks as though the code tries to > > > > convert to CRLF before continuing to parse. > > > > > > > > If the body is not ascii_only then the method Mail::Utilities.to_crlf > > > > does not convert anything. > > > > I'm pretty sure that is what is happening with this email. > > > > > > > > Surely binary images have to be base64-encoded or similar before they > > > > can be sent in an email? > > > > > > > > AFAIK, CR and LF are not allowed in mail data - or have I got that > > > > wrong? > > > > > > For starters, if everybody correctly created emails according to the > > > spec, there would not be a need to provide any fixups. This also > > > explains why Craig resending the mail (with a proper email client that > > > respects the relevant RFCs) works around the problem. > > > > > > Older versions of the mail gem used to do a global conversion of LF > > > which was not preceded by a CR to CRLF (CRLF is required by the spec, > > > but again, that part of the spec is routinely ignored).. In response > > > to a bug report where that code was corrupting attachments, they > > > "fixed" the code to no longer do that except for the headers, but only > > > if the headers were pure ASCII. I provided a patch which expanded it > > > to headers that are correctly encoded, and that patch was accepted. > > > > > > Clearly this is not enough. > > > > > > What probably should be done is to split the file at the first > > > \r?\n\r?\n (a blank line separates the headers from the body). Then > > > do a global change of LFs which are not preceded by a CR to a CRLF - > > > but only in the headers. Then reattach the header and body and parse > > > that. > > > > That is exactly what was previously done, but it did not work, because > > the problem is in parsing the body, not the headers. > > > > The headers parse fine even if LF-only terminated. > > > > Also, whilst CRLF is required by the spec, AFAICT all the mails that > > are processed by deliver.rb have LF-only. > > Just have a look at some of the members/board mail archives on Whimsy > > - these don't have any LF-processing added by tools/deliver.rb > > Ditto for mbox files on minotaur. > > Perhaps this is because CRLF is mandated for SMTP, but once the mail > > has been received, it's no longer a requirement? > > > > What I think needs to happen is for the mail gem to treat both CRLF > > and LF as EOL (possibly also CR only) when parsing the body. > > It already does this for the headers (as far as I can tell). > > There should be no need to 'fix' the input at all if it uses the > > appropriate EOL markers. > > However on output, of course it should always generate CRLF. > > > > > > S. > > > > > > - Sam Ruby > > > > > > > > On Mon, Feb 11, 2019, 4:11 PM sebb <[email protected] wrote: > > > > > > > > > > > It looks as though the mail gem does try to convert the input to > > > > > > CRLF > > > > > > before parsing. > > > > > > However, it only does this if the input is ASCII-only (*). > > > > > > > > > > > > Whilst this is the best approach, I think we can ignore that check > > > > > > and > > > > > > just convert to CRLF. > > > > > > [It may cause a very occasional glitch, but any such should be > > > > > > obvious.] > > > > > > > > > > > > It explains why the original email failed to parse: there are > > > > > > several > > > > > > non-ASCII chars in it. > > > > > > These have been converted to quoted-printable by the version you > > > > > > forwarded, thus allowing the parse to complete OK. > > > > > > > > > > > > I'll commit a fix shortly. > > > > > > > > > > > > S. > > > > > > (*) This applies to binary data, which is all that we know about the > > > > > > input at this point. > > > > > > > > > > > > On Mon, 11 Feb 2019 at 18:55, sebb <[email protected]> wrote: > > > > > > > > > > > > > > Seems to be an issue with the parser not handling certain messages > > > > > > > which have LF-only line terminators. > > > > > > > > > > > > > > I tried converting the mail file to CRLF and reparsing, and the > > > > > > > attachment appeared. > > > > > > > > > > > > > > AFAICT all messages are being stored as LF-only, and I don't know > > > > > > > why > > > > > > > this only affects some mails. > > > > > > > The one in question is quite complicated with various quoted mails > > > > > > > before the attachment; that might have affected parsing. > > > > > > > > > > > > > > The copy of the mail sent to my GMail account appears to have CRLF > > > > > > > line terminators, yet the copies on minotaur and mbox-vm are > > > > > > > LF-only. > > > > > > > Perhaps GMail auto-converts the mail? > > > > > > > Not sure what is happening here. > > > > > > > > > > > > > > On Mon, 11 Feb 2019 at 17:20, sebb <[email protected]> wrote: > > > > > > > > > > > > > > > > The immediate reason is that the parsed message does not have > > > > > > > > any > > > > > > > > attachments listed in the summary yml file, so does not show up > > > > > > > > in the > > > > > > > > workbench. > > > > > > > > > > > > > > > > I will look further into why the parsing did not see the > > > > > > > > attachment. > > > > > > > > > > > > > > > > S. > > > > > > > > On Mon, 11 Feb 2019 at 17:11, sebb <[email protected]> wrote: > > > > > > > > > > > > > > > > > > I'll take a look > > > > > > > > > > > > > > > > > > On Mon, 11 Feb 2019 at 16:27, Craig Russell > > > > > > > > > <[email protected]> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > This seems to be happening more frequently: an ICLA shows > > > > > > > > > > up in > > > > > > secretary mail inbox but does not appear in workbench. > > > > > > > > > > > > > > > > > > > > If I forward the message to secretary, it then appears in > > > > > > workbench. > > > > > > > > > > > > > > > > > > > > Without violating PII, I cannot forward the message > > > > > > > > > > directly. But > > > > > > this is the message that was received this AM and never made it to > > > > > > workbench: > > > > > > > > > > > > > > > > > > > > From: Olivier Coutu > > > > > > > > > > Date: 6:10 AM PST > > > > > > > > > > Subject: Fwd: Re: Patch to reduce lint time > > > > > > > > > > > > > > > > > > > > Any ideas? > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Craig > > > > > > > > > > > > > > > > > > > > Craig L Russell > > > > > > > > > > Secretary, Apache Software Foundation > > > > > > > > > > [email protected] http://db.apache.org/jdo > > > > > > > > > > > > > > > >
