MimeBodyPartInputStream illegally returns "0" from a read call with chunked InputStream
---------------------------------------------------------------------------------------
Key: CXF-3068
URL: https://issues.apache.org/jira/browse/CXF-3068
Project: CXF
Issue Type: Bug
Components: Core
Affects Versions: 2.2.10, 2.2.9, 2.2.8, 2.2.7, 2.2.6, 2.2.5, 2.2.4, 2.2.3
Environment: Windows
Reporter: aaron pieper
I'm having a problem with some MTOM attachments that started when I upgraded
from CXF 2.2.2 to CXF 2.2.3. After I call a service that returns an MTOM
attachment, parsing the attachment sometimes fails with this error:
java.io.IOException: Underlying input stream returned zero bytes
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:268)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.Reader.read(Reader.java:123)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1128)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1050)
at org.apache.commons.io.IOUtils.toString(IOUtils.java:359)
at com.pragmatics.AsyncUtils.messageToString(AsyncUtils.java:18)
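The top frames of this trace are the JDK's InputStreamReader, which rejects any
read(byte[], int, int) that returns 0. Below is a minimal sketch that triggers
the same IOException without CXF. It assumes a hypothetical wrapper,
ZeroOnceInputStream, which is not part of CXF or the JDK and returns 0 from its
first bulk read.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical stream that breaks the InputStream contract exactly once:
// its first read(byte[], int, int) returns 0 even though len > 0.
class ZeroOnceInputStream extends FilterInputStream {
    private boolean returnedZero = false;

    ZeroOnceInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (!returnedZero && len > 0) {
            returnedZero = true;
            return 0; // illegal: nothing was read, yet this is not end of stream
        }
        return super.read(b, off, len);
    }
}

class ZeroReadDemo {
    // Drains the stream through an InputStreamReader; returns the IOException
    // message if reading fails, or null if it succeeds.
    static String readThroughReader(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[64];
            while (reader.read(buf) != -1) {
                // discard the characters; only failure matters here
            }
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        InputStream data = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readThroughReader(new ZeroOnceInputStream(data)));
    }
}
```

The same text read from a plain ByteArrayInputStream decodes without error, so
the failure comes from the 0 return value and not from the data.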
The error happens for only some attachments, about 25% of them. Which 25% is
seemingly arbitrary: they are not the largest attachments, nor the ones that
contain special characters. I was able to track the problem down to
MimeBodyPartInputStream, whose processBuffer method reads the boundary with
this logic:
while ((boundaryIndex < boundary.length) && (value == boundary[boundaryIndex])) {
    if (!hasData(buffer, initialI, i + 1, off, len)) {
        return initialI - off;
    }
    value = buffer[++i];
    boundaryIndex++;
}
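The loop can be studied in isolation. The following is a deliberately
simplified, hypothetical model and not CXF's actual processBuffer. It ignores
CR/LF handling and does not model what the real code does with held-back bytes.
The made-up method deliverableBytes counts how many bytes of a chunk could be
handed to the caller. It stops before a complete boundary, or before a boundary
prefix that runs into the end of the chunk.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of the boundary scan (not CXF code).
class BoundaryScanModel {
    // Returns how many bytes of buffer[off, off + len) could be handed to the
    // caller: everything before a complete boundary, or before a boundary
    // prefix that is cut off by the end of the chunk.
    static int deliverableBytes(byte[] buffer, int off, int len, byte[] boundary) {
        int end = off + len;
        for (int start = off; start < end; start++) {
            int i = start;
            int boundaryIndex = 0;
            while (boundaryIndex < boundary.length && buffer[i] == boundary[boundaryIndex]) {
                boundaryIndex++;
                i++;
                if (boundaryIndex == boundary.length) {
                    return start - off; // complete boundary: the part ends here
                }
                if (i == end) {
                    // The chunk ended mid-boundary. Like the early return
                    // "initialI - off", only bytes before the candidate count,
                    // so a candidate starting at off yields 0.
                    return start - off;
                }
            }
        }
        return len; // no boundary or boundary prefix in this chunk
    }

    public static void main(String[] args) {
        byte[] boundary = "--MIME".getBytes(StandardCharsets.US_ASCII);
        byte[] chunk1 = "abc--MI".getBytes(StandardCharsets.US_ASCII);
        byte[] chunk2 = "--MI".getBytes(StandardCharsets.US_ASCII);
        System.out.println(deliverableBytes(chunk1, 0, chunk1.length, boundary)); // 3
        System.out.println(deliverableBytes(chunk2, 0, chunk2.length, boundary)); // 0
    }
}
```

Suppose a read(byte[], int, int) passed this count straight back to its caller.
For a chunk that holds only "--MI", it would return 0 even though the stream has
not ended.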
When MimeBodyPartInputStream finds the start of a boundary, the processBuffer
loop keeps scanning until either it runs out of data or it has matched the
entire boundary. The problem with this logic is that it assumes the entire
boundary will be delivered by a single read from the underlying InputStream.
This assumption isn't always true. When I fetch an attachment in my
application, this MimeBodyPartInputStream is backed by an
HttpURLConnection.HttpInputStream, which sometimes returns as few as 24 bytes
from a single read; that appears to be just how HttpInputStream works. If such
a short read ends partway through a MIME boundary, it can cause problems.
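Short reads like this can be reproduced deterministically in a unit test by
capping every read. Below is a minimal sketch built on a hypothetical helper,
ShortReadInputStream, which is not part of CXF or the JDK. It limits each bulk
read to maxChunk bytes, for example the 24 bytes observed from HttpInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper: caps every bulk read at maxChunk bytes to imitate a
// network stream that delivers data in small pieces. As long as the wrapped
// stream honors the contract, this one does too (it never returns 0 for len > 0).
class ShortReadInputStream extends FilterInputStream {
    private final int maxChunk;

    ShortReadInputStream(InputStream in, int maxChunk) {
        super(in);
        if (maxChunk <= 0) {
            throw new IllegalArgumentException("maxChunk must be positive");
        }
        this.maxChunk = maxChunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Never ask the wrapped stream for more than maxChunk bytes at once.
        return super.read(b, off, Math.min(len, maxChunk));
    }
}

class ShortReadDemo {
    // Drains the stream and records how many bytes each bulk read returned.
    static List<Integer> readSizes(InputStream in) {
        List<Integer> sizes = new ArrayList<>();
        byte[] buf = new byte[1024];
        try {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                sizes.add(n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        InputStream body = new ByteArrayInputStream(new byte[60]);
        System.out.println(readSizes(new ShortReadInputStream(body, 24))); // [24, 24, 12]
    }
}
```

A multipart body wrapped this way, with a small maxChunk, makes it much more
likely that a boundary straddles two reads. Without the wrapper, this depends
on how the network happens to split the response.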
One problem, which is the one I'm running into here, is that
MimeBodyPartInputStream's read(byte[], int, int) method returns 0 when the only
bytes it received were part of a MIME boundary. If the partial boundary starts
at the first byte of the buffer, initialI equals off, so the early return in
processBuffer yields initialI - off, which is 0. Returning 0 breaks the
InputStream contract: read(byte[], int, int) must return a positive count if
at least one byte was read, or -1 if the end of the stream has been reached,
and may return 0 only when len is 0. There may be other problems as well; for
example, MimeBodyPartInputStream might misjudge whether it has hit a boundary
in some cases, although I haven't run into that.
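Until MimeBodyPartInputStream itself is fixed, a client could shield code such
as IOUtils.toString from the illegal 0 with a retrying wrapper. Below is a
sketch of that workaround, not the CXF fix. NonZeroReadInputStream and
EveryOtherZero are hypothetical names, and the retry bound of 1000 is an
arbitrary assumption meant to keep a misbehaving stream from spinning forever.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical client-side workaround (not part of CXF): retries when the wrapped
// stream returns 0 for a non-empty request, so callers see only a positive count or -1.
class NonZeroReadInputStream extends FilterInputStream {
    private static final int MAX_ZERO_READS = 1000; // assumed bound, not from CXF

    NonZeroReadInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0; // the contract allows 0 only when len == 0
        }
        for (int attempt = 0; attempt < MAX_ZERO_READS; attempt++) {
            int n = super.read(b, off, len);
            if (n != 0) {
                return n; // a positive count, or -1 at end of stream
            }
        }
        throw new IOException("wrapped stream kept returning 0 bytes");
    }
}

class NonZeroReadDemo {
    // Hypothetical stand-in for the buggy stream: every other bulk read returns 0.
    static class EveryOtherZero extends FilterInputStream {
        private boolean zeroNext = true;

        EveryOtherZero(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (zeroNext) {
                zeroNext = false;
                return 0;
            }
            zeroNext = true;
            return super.read(b, off, len);
        }
    }

    // Drains the stream and returns the smallest count any single bulk read returned.
    static int smallestReadCount(InputStream in) {
        byte[] buf = new byte[4];
        int smallest = Integer.MAX_VALUE;
        try {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                smallest = Math.min(smallest, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return smallest;
    }

    public static void main(String[] args) {
        System.out.println(smallestReadCount(new EveryOtherZero(new ByteArrayInputStream(new byte[10]))));
        System.out.println(smallestReadCount(
                new NonZeroReadInputStream(new EveryOtherZero(new ByteArrayInputStream(new byte[10])))));
    }
}
```

Retrying only hides the symptom. The proper fix belongs inside
MimeBodyPartInputStream, which should keep reading until it can hand back at
least one byte of payload or report the end of the part.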
--
This message is automatically generated by JIRA.