[ https://issues.apache.org/jira/browse/XERCESJ-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliotte Rusty Harold resolved XERCESJ-724.
-------------------------------------------
Resolution: Fixed
Looks like the implementation of this method changed massively long ago.
> StringBuffer idiom in DeferredDocumentImpl causes large memory usage
> --------------------------------------------------------------------
>
> Key: XERCESJ-724
> URL: https://issues.apache.org/jira/browse/XERCESJ-724
> Project: Xerces2-J
> Issue Type: Bug
> Components: DOM (Level 3 Core)
> Affects Versions: 2.4.0
> Environment: Operating System: Windows NT/2K
> Platform: PC
> Reporter: Scott Nygren
>
> We have a 3 MB document that uses over 1.5 GB of memory to parse and causes
> an OutOfMemoryError on our web server. I traced it down to the fact that the
> document has a text node at the beginning that is 16K, followed by 93,000
> more text nodes of much shorter length. Each of the text nodes is allocated
> 16K of memory to store it, even though it may only be a few characters. The
> document that causes this is too big to include here, but the code below
> shows the problem in the abstract.
>
> This problem is due to the way the Sun Windows JDK 1.4.1 shares memory
> between Strings and StringBuffers (likely other versions too, but I haven't
> tested them). When StringBuffer.toString is called, a String is created with
> access to the StringBuffer's internal char array, which in my problem case
> is 16K. Then, when the next StringBuffer method that changes the object is
> called (like setLength), a new char array is created for the StringBuffer
> with the full capacity (another 16K).
>
> public class TestStringBuffer {
>     // run with java -Xms30m -Xmx30m TestStringBuffer
>
>     /** Main program entry point. */
>     public static void main(String[] argv) {
>         StringBuffer buf = new StringBuffer(10000);
>         String[] ans1 = new String[1000];
>         String[] ans2 = new String[1000];
>         Runtime rt = Runtime.getRuntime();
>         rt.gc();
>         long free1 = rt.freeMemory();
>
>         // all strings are allocated 10000
>         // uses over 10 Meg to store array
>         for (int i = 0; i < ans1.length; i++) {
>             buf.setLength(0);
>             buf.append("a");
>             buf.append("b");
>             ans1[i] = buf.toString();
>         }
>         rt.gc();
>         long free2 = rt.freeMemory();
>
>         // uses about 60 K to store array
>         for (int i = 0; i < ans2.length; i++) {
>             buf.setLength(0);
>             buf.append("a");
>             buf.append("b");
>             ans2[i] = buf.substring(0);
>         }
>         rt.gc();
>         long free3 = rt.freeMemory();
>
>         System.out.println("Loop 1 used (toString) " + (free1 - free2));
>         System.out.println("Loop 2 used (substring) " + (free2 - free3));
>     }
> }
>
> I was able to fix my problem by changing
> org/apache/xerces/dom/DeferredDocumentImpl.getNodeValueString to use
>     value = fBufferStr.substring(0);
> instead of
>     value = fBufferStr.toString();
> wherever it is referenced.
>
> Also, org/apache/xerces/parsers/AbstractDOMParser uses the same idiom, which
> may be a problem, but I did not take the time to test it.
>
> I am also going to submit a bug to Sun to recommend at least saying something
> in the StringBuffer doc that Strings from toString could be very large.
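
The workaround described in the report (pulling values out of one large,
reused buffer with substring(0) rather than toString()) could be sketched
roughly as below. This is only an illustration of the idiom, not the Xerces
code: the class name ReusedBufferSketch and the method buildValues are
hypothetical, and the "node" values are made up. As far as I know, newer JDKs
also copy the characters in StringBuffer.toString(), so the sketch makes no
claim about memory savings on a current JVM.

```java
// Hypothetical illustration; none of these names come from Xerces.
class ReusedBufferSketch {

    /** Builds n short strings through a single large, reused buffer. */
    static String[] buildValues(int n, int capacity) {
        StringBuffer buf = new StringBuffer(capacity);
        String[] values = new String[n];
        for (int i = 0; i < n; i++) {
            // Reset the length so the same buffer can be reused.
            buf.setLength(0);
            buf.append("node").append(i);
            // substring(0) returns a new String containing the characters
            // currently in the buffer (the idiom suggested in the report).
            values[i] = buf.substring(0);
        }
        return values;
    }

    public static void main(String[] args) {
        // 16K initial capacity, mirroring the large first text node.
        String[] values = buildValues(3, 16 * 1024);
        for (String v : values) {
            System.out.println(v);
        }
    }
}
```

The resulting strings are the same whichever of the two calls is used; what
the report measured on JDK 1.4.1 was only how much backing storage each
String kept alive.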