[ https://issues.apache.org/jira/browse/MIME4J-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884479#comment-17884479 ]
René Cordier commented on MIME4J-330:
-------------------------------------

From Markus Wiederkehr:

A quick bisect seems to indicate that the regression was introduced in commit 7e23d5e1cc2bd77f3e5129622472b995cd3f034e:

MIME4J-316 Parts missing in case of a specific combination of boundaries

--- a/core/src/main/java/org/apache/james/mime4j/io/MimeBoundaryInputStream.java
+++ b/core/src/main/java/org/apache/james/mime4j/io/MimeBoundaryInputStream.java
@@ -244,11 +244,14 @@ public class MimeBoundaryInputStream extends LineReaderInputStream {
                     // Make sure the boundary is terminated with EOS
                     break;
                 } else {
-                    // or with a whitespace or '-' char
+                    // or with a whitespace or '--' char
                     ch = (char)(buffer.byteAt(pos));
-                    if (CharsetUtil.isWhitespace(ch) || ch == '-') {
+                    if (CharsetUtil.isWhitespace(ch)) {
                         break;
                     }
+                    if (ch == '-' && remaining > 1 && (char)(buffer.byteAt(pos+1)) == '-') {
+                        break;
+                    }

> Regression in MimeStreamParser: part body stream ends with CR
> -------------------------------------------------------------
>
>                 Key: MIME4J-330
>                 URL: https://issues.apache.org/jira/browse/MIME4J-330
>             Project: James Mime4j
>          Issue Type: Bug
>            Reporter: René Cordier
>            Priority: Major
>
> A member of the community, Madis Loitmaa (madis.loit...@gmail.com), reported
> a regression in the MimeStreamParser on the mailing list: the part body
> stream ends with CR.
> Specifically, when processing messages with certain body lengths, the
> CR character (from the CRLF sequence preceding the boundary marker) is
> incorrectly included as the last character of the part body.
>
> Environment:
> This issue was identified after upgrading Mime4j in our project from
> version 0.8.7 to 0.8.11. It appears to affect all versions starting
> from 0.8.8.
>
> Reproduction:
> - Madis has attached a unit test, which demonstrates the problem.
>   The test fails for a part body length of 4051 bytes on the current master
>   branch (commit 85995590ad6700cc8bf7a3b8462ce87843dab5bd), but passes when
>   tested with version 0.8.7 (commit ed5a50c8071080b4eaedd6ab13baf25843d691a3).
> - The bug appears when CRLF is used as the line separator. The issue
>   does not occur when LF is used.
>
> Here is the unit test:
> {code:java}
> // file src/test/java/org/apache/james/mime4j/parser/PartLengthTest.java
> package org.apache.james.mime4j.parser;
>
> import org.apache.commons.io.IOUtils;
> import org.apache.james.mime4j.MimeException;
> import org.apache.james.mime4j.stream.BodyDescriptor;
> import org.junit.Assert;
> import org.junit.Test;
>
> import java.io.ByteArrayInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.nio.charset.StandardCharsets;
>
> public class PartLengthTest {
>     @Test
>     public void testExtractPartWithDifferentLengths() throws Exception {
>         StringBuilder partBuilder = new StringBuilder();
>         for (int i = 1; i <= 5000; i++) {
>             partBuilder.append(i % 80 == 0 ? "\n" : "A");
>             String part = partBuilder.toString();
>             String mimeMessage = createMimeMultipart(part);
>             String extracted = extractPart(mimeMessage);
>             if (!part.equals(extracted)) {
>                 System.out.println("Extracted part comparison failed for part length " + i);
>             }
>             Assert.assertEquals(part, extracted);
>         }
>     }
>
>     private String createMimeMultipart(String part) {
>         return "Content-type: multipart/mixed; boundary=QvEgqhjEnYxz\r\n"
>                 + "\r\n"
>                 + "--QvEgqhjEnYxz\r\n"
>                 + "Content-Type: text/plain\r\n"
>                 + "\r\n"
>                 + part
>                 + "\r\n"
>                 + "--QvEgqhjEnYxz--\r\n";
>     }
>
>     private String extractPart(String mimeMessage) throws MimeException, IOException {
>         String[] resultWrapper = new String[1];
>         MimeStreamParser parser = new MimeStreamParser();
>         parser.setContentHandler(new AbstractContentHandler() {
>             @Override
>             public void body(BodyDescriptor bd, InputStream is) throws MimeException, IOException {
>                 resultWrapper[0] = new String(IOUtils.toString(is, StandardCharsets.UTF_8).getBytes());
>             }
>         });
>         parser.parse(new ByteArrayInputStream(mimeMessage.getBytes()));
>         return resultWrapper[0];
>     }
> }
> //end-of-file
> {code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)