[
https://issues.apache.org/jira/browse/XERCESJ-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Glavassevich resolved XERCESJ-1398.
-------------------------------------------
Resolution: Cannot Reproduce
Sorry, I cannot reproduce what you're seeing.
Xerces has no problem reading a 1.8 GB document:
java sax.Counter file:///D:/xmldocs/bigFile.xml
file:///D:/xmldocs/bigFile.xml: 94640 ms (100010001 elems, 0 attrs, 0 spaces, 877800000 chars)
and also has no issues with a 7.8 GB document:
java sax.Counter file:///D:/xmldocs/bigFile2.xml
file:///D:/xmldocs/bigFile2.xml: 1041968 ms (400020001 elems, 0 attrs, 0 spaces, 3955600000 chars)
This last one is far larger than a normal heap and I'm sure that other users
have successfully read documents this big (e.g. an XML dump from Wikipedia).
RewindableInputStream stops buffering very early in the document.
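For illustration only, here is a minimal sketch of the kind of streaming count shown in the runs above. It is not the source of the sax.Counter sample; it uses only the standard JAXP SAX API, and the class name ElementCounter and its fields are hypothetical. The handler keeps three counters and nothing else, so memory use should not grow with document size.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts events as they stream past, retains no content.
class ElementCounter extends DefaultHandler {
    long elements;
    long attributes;
    long characters;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elements++;
        attributes += attrs.getLength();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node; only the total matters here.
        characters += length;
    }

    // Parses the stream with the default JAXP SAX parser and returns the counts.
    static ElementCounter count(InputStream in) throws Exception {
        ElementCounter counter = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(in, counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ElementCounter <uri>");
            return;
        }
        ElementCounter counter = new ElementCounter();
        // SAXParser.parse(String, DefaultHandler) accepts a URI such as file:///...
        SAXParserFactory.newInstance().newSAXParser().parse(args[0], counter);
        System.out.println(counter.elements + " elems, " + counter.attributes
                + " attrs, " + counter.characters + " chars");
    }
}
```

Running it against a large file while watching heap usage is one way to check whether the whole stream is being retained.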
I suspect that the code you were using, and from which you produced the patch,
isn't the Apache codebase. "revision 101962" doesn't correspond to any
version of XMLEntityManager in Apache SVN. In fact, the first SVN revision was
317483, and as of today the latest is 822684.
> Supplying document without content-type headers causes entire stream to be
> buffered in memory, even when using SAX API
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: XERCESJ-1398
> URL: https://issues.apache.org/jira/browse/XERCESJ-1398
> Project: Xerces2-J
> Issue Type: Bug
> Components: SAX
> Affects Versions: 2.9.1
> Environment: Debian Linux, Sun JDK 1.5.0_20
> Reporter: Karl Wright
>
> If the parser needs to autodetect the encoding of the input stream, it wraps
> the input stream using the RewindableInputStream class within
> XMLEntityManager. But this class buffers everything that is read from the
> stream, even after the autodetection is complete (and no possibility of
> rewind being used exists anymore). It is therefore trivial to submit XML to
> xerces2-j which causes an "OutOfMemoryError" exception to be thrown, which
> could lead to a denial of service under appropriate conditions.
> The fix I created for this involved adding a method "stopBuffering()" to the
> RewindableInputStream class, which shuts off further buffering by that class.
> I call this method when the encoding has been decided upon (i.e. right
> before createReader is called, everywhere).
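To make the idea in the description concrete, here is a minimal sketch of a stream wrapper that buffers only until it is told to stop. It is not the Xerces RewindableInputStream and not the reporter's patch; the class name BufferUntilDetected and the helper bufferedBytes() are hypothetical, and only stopBuffering() takes its name from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper: remembers bytes so they can be re-read during
// encoding detection, then passes reads straight through once told to stop.
class BufferUntilDetected extends InputStream {
    private final InputStream in;
    private byte[] buf = new byte[64];
    private int length = 0;          // number of bytes held in buf
    private int offset = 0;          // next position to replay from buf
    private boolean buffering = true;

    BufferUntilDetected(InputStream in) {
        this.in = in;
    }

    // Go back to the first byte; only legal while still buffering.
    void rewind() {
        if (!buffering) {
            throw new IllegalStateException("buffering already stopped");
        }
        offset = 0;
    }

    // After this call, newly read bytes are no longer retained.
    void stopBuffering() {
        buffering = false;
    }

    int bufferedBytes() {
        return length;
    }

    @Override
    public int read() throws IOException {
        if (offset < length) {
            return buf[offset++] & 0xFF;   // replay a remembered byte
        }
        int b = in.read();
        if (b != -1 && buffering) {
            if (length == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2);
            }
            buf[length++] = (byte) b;
            offset++;
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<?xml version='1.0'?><root/>".getBytes(StandardCharsets.US_ASCII);
        BufferUntilDetected s = new BufferUntilDetected(new ByteArrayInputStream(data));
        for (int i = 0; i < 4; i++) {
            s.read();                      // peek at the first bytes
        }
        s.rewind();
        s.stopBuffering();
        while (s.read() != -1) {
            // drain the rest of the stream
        }
        System.out.println("bytes retained: " + s.bufferedBytes());
    }
}
```

In this sketch the retained buffer stays at whatever was read before stopBuffering(), regardless of how long the rest of the stream is.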