[
https://issues.apache.org/jira/browse/XERCESJ-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765481#action_12765481
]
Karl Wright commented on XERCESJ-1398:
--------------------------------------
Further research shows that the problem is not specific to the Debian
distribution. The mechanism that is apparently supposed to prevent
RewindableInputStream from allocating memory indefinitely is the
"mayReadChunks" flag. However, this flag is consulted only in the bulk read
method, read(byte[], int, int), and not in the single-byte read() method. The
Xerces code therefore *presumes* that whatever Reader class has been
instantiated never calls read(), which is of course a very fragile assumption.
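
To make the two code paths concrete, here is a minimal sketch of the pattern
described above. It is not the Xerces source: the class and method names
(ReadPathSketch, SketchStream, bufferedAfterReading) are hypothetical, rewind
support is left out, and the flag is placed directly on the stream purely for
illustration. It only shows how a flag that is checked in the bulk read method
but not in read() lets single-byte callers grow the buffer without bound.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ReadPathSketch {

    /** Simplified stand-in for a buffering stream; NOT the Xerces class. */
    static class SketchStream extends InputStream {
        private final InputStream in;
        private byte[] data = new byte[64];
        private int length = 0;          // bytes retained so far
        boolean mayReadChunks = false;   // assumed to be set once the encoding is known

        SketchStream(InputStream in) {
            this.in = in;
        }

        int bufferedBytes() {
            return length;
        }

        @Override
        public int read() throws IOException {
            // The flag is never consulted here, so every byte is retained.
            int b = in.read();
            if (b == -1) {
                return -1;
            }
            if (length == data.length) {
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[length++] = (byte) b;
            return b;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (mayReadChunks) {
                // Bulk path: bypasses the buffer entirely.
                return in.read(b, off, len);
            }
            // Flag not set yet: fall back to one buffered byte.
            int c = read();
            if (c == -1) {
                return -1;
            }
            b[off] = (byte) c;
            return 1;
        }
    }

    /** Reads 'size' bytes with the flag set and reports how many were retained. */
    static int bufferedAfterReading(int size, boolean singleByteReads) {
        SketchStream s = new SketchStream(new ByteArrayInputStream(new byte[size]));
        s.mayReadChunks = true; // pretend the encoding has already been decided
        try {
            if (singleByteReads) {
                while (s.read() != -1) {
                    // drain one byte at a time
                }
            } else {
                byte[] chunk = new byte[256];
                while (s.read(chunk) != -1) {
                    // drain in chunks
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s.bufferedBytes();
    }

    public static void main(String[] args) {
        System.out.println("single-byte reads retain: " + bufferedAfterReading(100_000, true));
        System.out.println("bulk reads retain:        " + bufferedAfterReading(100_000, false));
    }
}
```

In this sketch, draining the stream through read() retains every byte, while
draining it through read(byte[], int, int) with the flag set retains none.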
We supply a custom Reader implementation that permits badly encoded UTF-8 XML
to be parsed, and this is what triggers the problem in our case. Multi-byte
reads are forced through the single-byte pathway in order to catch and bypass
encoding errors. My test case above does not capture this, because on the
system in question this lax reader is the default UTF-8 reader.
I would still strongly urge the Xerces team to look carefully at fixing this
issue, as it may have ramifications for other systems as well.
> Supplying document without content-type headers causes entire stream to be
> buffered in memory, even when using SAX API
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: XERCESJ-1398
> URL: https://issues.apache.org/jira/browse/XERCESJ-1398
> Project: Xerces2-J
> Issue Type: Bug
> Components: SAX
> Affects Versions: 2.9.1
> Environment: Debian Linux, Sun JDK 1.5.0_20
> Reporter: Karl Wright
>
> If the parser needs to autodetect the encoding of the input stream, it wraps
> the input stream using the RewindableInputStream class within
> XMLEntityManager. But this class buffers everything that is read from the
> stream, even after the autodetection is complete (when a rewind can no
> longer occur). It is therefore trivial to submit XML to Xerces2-J that
> causes an "OutOfMemoryError" to be thrown, which could lead to a denial of
> service under appropriate conditions.
> The fix I created for this involved adding a method "stopBuffering()" to the
> RewindableInputStream class, which shuts off further buffering by that class.
> I call this method when the encoding has been decided upon (i.e. right
> before createReader is called, everywhere).
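
The stopBuffering() idea described in the issue can be sketched roughly as
follows. This is an illustration only, not the actual patch: apart from the
method name stopBuffering(), every name here (StopBufferingSketch,
RewindableSketch, rewind, parse) is hypothetical, and the four-byte
"detection" step is just an assumed stand-in for encoding autodetection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class StopBufferingSketch {

    /** Hypothetical rewindable stream whose buffering can be switched off. */
    static class RewindableSketch extends InputStream {
        private final InputStream in;
        private byte[] data = new byte[64];
        private int length = 0;            // bytes retained for rewinding
        private int offset = 0;            // read position within retained bytes
        private boolean buffering = true;

        RewindableSketch(InputStream in) {
            this.in = in;
        }

        int retainedBytes() {
            return length;
        }

        /** Go back to the first retained byte; only valid while buffering. */
        void rewind() {
            if (!buffering) {
                throw new IllegalStateException("buffering already stopped");
            }
            offset = 0;
        }

        /** Call once the encoding has been decided and no rewind can follow. */
        void stopBuffering() {
            buffering = false;
        }

        @Override
        public int read() throws IOException {
            if (offset < length) {
                return data[offset++] & 0xff;   // replay retained bytes first
            }
            int b = in.read();
            if (b != -1 && buffering) {
                if (length == data.length) {
                    data = Arrays.copyOf(data, data.length * 2);
                }
                data[length++] = (byte) b;
                offset++;
            }
            return b;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (offset < length) {
                int n = Math.min(len, length - offset);
                System.arraycopy(data, offset, b, off, n);
                offset += n;
                return n;
            }
            if (!buffering) {
                return in.read(b, off, len);    // pass through, retain nothing
            }
            int c = read();
            if (c == -1) {
                return -1;
            }
            b[off] = (byte) c;
            return 1;
        }
    }

    /** Returns {bytes read after the rewind, bytes retained at the end}. */
    static int[] parse(int size, boolean stop) {
        RewindableSketch s = new RewindableSketch(new ByteArrayInputStream(new byte[size]));
        try {
            for (int i = 0; i < 4; i++) {
                s.read();                       // assumed "autodetection" peek
            }
            s.rewind();
            if (stop) {
                s.stopBuffering();
            }
            int total = 0;
            byte[] chunk = new byte[256];
            for (int n = s.read(chunk); n != -1; n = s.read(chunk)) {
                total += n;
            }
            return new int[] { total, s.retainedBytes() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with stopBuffering:    " + Arrays.toString(parse(100_000, true)));
        System.out.println("without stopBuffering: " + Arrays.toString(parse(100_000, false)));
    }
}
```

With stopBuffering() called, the sketch retains only the bytes read during
detection; without it, the retained buffer grows to the full input size.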
--
This message is automatically generated by JIRA.