Hi,

A quick bisect seems to indicate that the regression was introduced in commit
7e23d5e1cc2bd77f3e5129622472b995cd3f034e:
MIME4J-316 Parts missing in case of a specific combination of boundaries

--- a/core/src/main/java/org/apache/james/mime4j/io/MimeBoundaryInputStream.java
+++ b/core/src/main/java/org/apache/james/mime4j/io/MimeBoundaryInputStream.java
@@ -244,11 +244,14 @@ public class MimeBoundaryInputStream extends LineReaderInputStream {
                     // Make sure the boundary is terminated with EOS
                     break;
                 } else {
-                    // or with a whitespace or '-' char
+                    // or with a whitespace or '--' char
                     ch = (char)(buffer.byteAt(pos));
-                    if (CharsetUtil.isWhitespace(ch) || ch == '-') {
+                    if (CharsetUtil.isWhitespace(ch)) {
                         break;
                     }
+                    if (ch == '-' && remaining > 1 && (char)(buffer.byteAt(pos+1)) == '-') {
+                        break;
+                    }

Regards,
Markus

On Tue, Sep 24, 2024 at 10:32 AM Rene Cordier <rcord...@apache.org> wrote:
> Hi Madis,
>
> Sorry for the delay on the answer.
>
> I created a task on the JIRA regarding your report:
> https://issues.apache.org/jira/browse/MIME4J-330
>
> Do you feel like perhaps trying contributing a fix for this by the way?
> If not we might take a look, I just can't guarantee when though.
>
> Thanks again for the report!
>
> Rene.
>
> On 9/17/24 2:28 PM, Madis Loitmaa wrote:
> > Hello,
> > Looks like the attached unit test got lost in transit, it's present in
> > my sent folder. I'll try to inline it at the end of the email this
> > time.
> >
> > Best regards,
> > Madis Loitmaa
> >
> > // file src/test/java/org/apache/james/mime4j/parser/PartLengthTest.java
> > package org.apache.james.mime4j.parser;
> >
> > import org.apache.commons.io.IOUtils;
> > import org.apache.james.mime4j.MimeException;
> > import org.apache.james.mime4j.stream.BodyDescriptor;
> > import org.junit.Assert;
> > import org.junit.Test;
> >
> > import java.io.ByteArrayInputStream;
> > import java.io.IOException;
> > import java.io.InputStream;
> > import java.nio.charset.StandardCharsets;
> >
> > public class PartLengthTest {
> >     @Test
> >     public void testExtractPartWithDifferentLengths() throws Exception {
> >         StringBuilder partBuilder = new StringBuilder();
> >         for (int i = 1; i <= 5000; i++) {
> >             partBuilder.append(i % 80 == 0 ? "\n" : "A");
> >             String part = partBuilder.toString();
> >             String mimeMessage = createMimeMultipart(part);
> >
> >             String extracted = extractPart(mimeMessage);
> >
> >             if (!part.equals(extracted)) {
> >                 System.out.println("Extracted part comparison failed for part length " + i);
> >             }
> >             Assert.assertEquals(part, extracted);
> >         }
> >     }
> >
> >     private String createMimeMultipart(String part) {
> >         return "Content-type: multipart/mixed; boundary=QvEgqhjEnYxz\r\n"
> >             + "\r\n"
> >             + "--QvEgqhjEnYxz\r\n"
> >             + "Content-Type: text/plain\r\n"
> >             + "\r\n"
> >             + part
> >             + "\r\n"
> >             + "--QvEgqhjEnYxz--\r\n";
> >     }
> >
> >     private String extractPart(String mimeMessage) throws MimeException, IOException {
> >         String[] resultWrapper = new String[1];
> >
> >         MimeStreamParser parser = new MimeStreamParser();
> >         parser.setContentHandler(new AbstractContentHandler() {
> >             @Override
> >             public void body(BodyDescriptor bd, InputStream is) throws MimeException, IOException {
> >                 resultWrapper[0] = new String(IOUtils.toString(is, StandardCharsets.UTF_8).getBytes());
> >             }
> >         });
> >         parser.parse(new ByteArrayInputStream(mimeMessage.getBytes()));
> >         return resultWrapper[0];
> >     }
> > }
> > //end-of-file
> >
> >
> > Rene Cordier (<rcord...@apache.org>) wrote on Tue, 17 September 2024 at 05:55:
> >> Hello Madis,
> >>
> >> First of all thank you for your report. If there is indeed a regression
> >> we should take a look and fix it.
> >>
> >> However, I don't see any file attached to your email? Maybe you forgot.
> >> Could you please try to resend your unit test attached to your mail
> >> response please? That would help the community to check, understand and
> >> take proper actions for a fix.
> >>
> >> Thank you and best regards,
> >>
> >> Rene.
> >>
> >> On 9/11/24 1:32 PM, Madis Loitmaa wrote:
> >>> Hello Mime4j Developers,
> >>>
> >>> I am reporting a regression in the MimeStreamParser when extracting
> >>> parts of a multipart message.
> >>>
> >>> Specifically, when processing messages with certain body lengths, the
> >>> CR character (from the CRLF sequence preceding the boundary marker) is
> >>> incorrectly included as the last character of the part body.
> >>>
> >>> Environment:
> >>> This issue was identified after upgrading Mime4j in our project from
> >>> version 0.8.7 to 0.8.11. It appears to affect all versions starting
> >>> from 0.8.8.
> >>>
> >>> Reproduction:
> >>> - I have attached a unit test to this email, which demonstrates the
> >>> problem. The test fails for a part body length of 4051 bytes on the
> >>> current master branch (commit
> >>> 85995590ad6700cc8bf7a3b8462ce87843dab5bd), but passes when tested with
> >>> version 0.8.7 (commit ed5a50c8071080b4eaedd6ab13baf25843d691a3).
> >>>
> >>> - The bug appears when CRLF is used as the line separator. The issue
> >>> does not occur when LF is used.
> >>>
> >>> Attachments:
> >>> A unit test demonstrating the issue.
> >>> src/test/java/org/apache/james/mime4j/parser/PartLengthTest.java
> >>>
> >>> Please let me know if you need any further information or clarification.
> >>>
> >>> Best regards,
> >>> Madis Loitmaa
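
For anyone following the thread, the terminator check touched by the diff at the top of this mail can be modelled in isolation. This is only a sketch under stated assumptions, not the mime4j code: the class BoundaryTerminatorSketch, its methods, and the plain byte[] standing in for the parser's buffer are all hypothetical, isWhitespace is an assumed stand-in for CharsetUtil.isWhitespace, and "remaining" is assumed to be the number of buffered bytes from pos onward. Only the else branch from the diff is modelled. It shows one input on which the two versions of the check disagree: a single '-' is buffered after the boundary. Whether this is what leaves the CR in the part body at length 4051 has not been verified here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone model of the boundary terminator check; not mime4j code.
final class BoundaryTerminatorSketch {

    // Assumed stand-in for CharsetUtil.isWhitespace: space, tab, CR, LF.
    static boolean isWhitespace(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    // Check as it read before the commit: whitespace or a single '-' is accepted.
    static boolean acceptedBefore(byte[] buf, int pos) {
        char ch = (char) buf[pos];
        return isWhitespace(ch) || ch == '-';
    }

    // Check as it reads after the commit: whitespace, or "--" if both bytes are buffered.
    static boolean acceptedAfter(byte[] buf, int pos) {
        int remaining = buf.length - pos; // assumption: bytes buffered from pos onward
        char ch = (char) buf[pos];
        if (isWhitespace(ch)) {
            return true;
        }
        return ch == '-' && remaining > 1 && (char) buf[pos + 1] == '-';
    }

    public static void main(String[] args) {
        String boundary = "--QvEgqhjEnYxz";
        String[] buffered = {
            "\r\n--QvEgqhjEnYxz\r\n",   // part delimiter
            "\r\n--QvEgqhjEnYxz--\r\n", // closing delimiter, fully buffered
            "\r\n--QvEgqhjEnYxz-"       // closing delimiter, only one '-' buffered so far
        };
        for (String s : buffered) {
            byte[] buf = s.getBytes(StandardCharsets.US_ASCII);
            int pos = s.indexOf(boundary) + boundary.length(); // first byte after the boundary
            System.out.println("remaining=" + (buf.length - pos)
                    + " before=" + acceptedBefore(buf, pos)
                    + " after=" + acceptedAfter(buf, pos));
        }
    }
}
```

In this model the first two inputs are accepted by both versions, while the third is accepted by the old check and rejected by the new one. If that situation does arise in the real parser, it would at least be consistent with the failure depending on body length, since the position of the boundary relative to the end of the buffered data shifts with the size of the part.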