[jira] [Resolved] (XERCESJ-724) StringBuffer idiom in DeferredDocumentImpl causes large memory usage

Elliotte Rusty Harold (Jira) Fri, 27 Jun 2025 04:17:04 -0700


     [ 
https://issues.apache.org/jira/browse/XERCESJ-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Elliotte Rusty Harold resolved XERCESJ-724.
-------------------------------------------
    Resolution: Fixed

Looks like the implementation of this method changed substantially long ago.

> StringBuffer idiom in DeferredDocumentImpl causes large memory usage
> --------------------------------------------------------------------
>
>                 Key: XERCESJ-724
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-724
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: DOM (Level 3 Core)
>    Affects Versions: 2.4.0
>         Environment: Operating System: Windows NT/2K
> Platform: PC
>            Reporter: Scott Nygren
>
> We have a 3 MB document that uses over 1.5 GB of memory to parse and causes
> an OutOfMemoryError on our webserver.  I traced it down to the fact that the
> document has a text node at the beginning that is 16K and then has 93,000
> more text nodes of much shorter length after it.  Each of the text nodes is
> allocated 16K of memory to store it, even though it may only be a few
> characters.  The document that causes this is too big to include here, but
> the code below shows the problem in abstract.
>
> This problem is due to the way the Sun Windows JDK 1.4.1 shares memory
> between Strings and StringBuffers (likely on other versions too, but I
> haven't tested them).  When StringBuffer.toString is called, a String is
> created with access to the StringBuffer's internal char array, which in my
> problem case is 16K.  Then, when the next StringBuffer method that changes
> the object (like setLength) is called, a new char array is created for the
> StringBuffer with the full capacity (another 16K).
> public class TestStringBuffer {
>     // run with java -Xms30m -Xmx30m TestStringBuffer
>     /** Main program entry point. */
>     public static void main(String argv[]) {
>         StringBuffer buf = new StringBuffer(10000);
>         String [] ans1 = new String[1000];
>         String [] ans2 = new String[1000];
>         Runtime rt = Runtime.getRuntime();
>         rt.gc();
>         long free1 = rt.freeMemory();
>         // all strings are allocated 10000
>         // uses over 10 Meg to store array
>         for (int i=0; i < ans1.length; i++) {
>             buf.setLength(0);
>             buf.append("a");
>             buf.append("b");
>             ans1[i] = buf.toString();
>         }
>         rt.gc();
>         long free2 = rt.freeMemory();
>         // uses about 60 K to store array
>         for (int i=0; i < ans2.length; i++) {
>             buf.setLength(0);
>             buf.append("a");
>             buf.append("b");
>             ans2[i] = buf.substring(0);
>         }
>         rt.gc();
>         long free3 = rt.freeMemory();
>         System.out.println("Loop 1 used (toString) "+(free1 - free2));
>         System.out.println("Loop 2 used (substring) "+(free2 - free3));
>     }
> }
> I was able to fix my problem by changing
> org/apache/xerces/dom/DeferredDocumentImpl.getNodeValueString to use
>            value = fBufferStr.substring(0);
> instead of
>            value = fBufferStr.toString();
> wherever it's referenced.
> Also, org/apache/xerces/parsers/AbstractDOMParser uses the same idiom, which
> may be a problem, but I did not take the time to test it.
> I am also going to submit a bug to Sun to recommend at least saying something
> in the StringBuffer doc that Strings from toString could be very large.
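
The workaround in the report amounts to copying only the characters in use, so the resulting String does not hold on to the buffer's full-capacity array. A minimal sketch of that idea follows. The class name RightSizedCopy and the helper copyOf are hypothetical and are not part of Xerces or the JDK. Later JDKs are understood to copy in StringBuffer.toString() already, so this would likely be unnecessary there.

```java
class RightSizedCopy {

    // Hypothetical helper: builds a String backed by an array that is
    // exactly buf.length() chars long, independent of buf.capacity().
    static String copyOf(StringBuffer buf) {
        char[] chars = new char[buf.length()];
        buf.getChars(0, buf.length(), chars, 0);
        // new String(char[]) copies the array, so the result never
        // shares storage with the StringBuffer.
        return new String(chars);
    }

    public static void main(String[] args) {
        // One large reusable buffer, as in the reported scenario.
        StringBuffer buf = new StringBuffer(16 * 1024);
        String[] values = new String[5];

        for (int i = 0; i < values.length; i++) {
            buf.setLength(0);            // reuse the buffer; capacity is kept
            buf.append("node").append(i);
            values[i] = copyOf(buf);     // short value, right-sized copy
        }

        System.out.println("buffer capacity: " + buf.capacity());
        for (int i = 0; i < values.length; i++) {
            System.out.println(values[i] + " (length " + values[i].length() + ")");
        }
    }
}
```

On a JDK that behaves as the report describes, buf.substring(0) should give the same effect as copyOf without the extra helper.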



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
