Hi Ken,

I've glanced over the code and it doesn't seem necessary to have a 
reference to a StringBuffer on every ChildNode since only one should be 
used at a time. Fortunately the StringBuffer field is marked transient so 
it's easy to remove. If there's benefit to reusing StringBuffers we could 
probably cut the memory usage down by moving this field to 
CoreDocumentImpl and wrapping it in a SoftReference. Methods which 
concatenate text would get the StringBuffer from the Document node.
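
A minimal sketch of that idea follows. It is not the actual Xerces code:
SharedBufferDocument, TextHolderNode, acquireBuffer() and
concatenatedText() are hypothetical names standing in for
CoreDocumentImpl, a ChildNode subclass and their methods. It also assumes
single-threaded, non-reentrant use, since the buffer is reset each time
it is handed out.

```java
import java.lang.ref.SoftReference;

// Hypothetical stand-in for CoreDocumentImpl: it owns the one shared buffer.
class SharedBufferDocument {
    // A soft reference lets the GC reclaim the buffer under memory pressure.
    private transient SoftReference<StringBuffer> bufferRef;

    // Returns an emptied buffer, reusing the cached one if it is still alive.
    StringBuffer acquireBuffer() {
        StringBuffer buf = (bufferRef != null) ? bufferRef.get() : null;
        if (buf == null) {
            buf = new StringBuffer();
            bufferRef = new SoftReference<>(buf);
        }
        buf.setLength(0); // clears the contents but keeps the capacity
        return buf;
    }
}

// Hypothetical stand-in for a node that concatenates text from its parts.
class TextHolderNode {
    private final SharedBufferDocument owner;
    private final String[] parts;

    TextHolderNode(SharedBufferDocument owner, String... parts) {
        this.owner = owner;
        this.parts = parts;
    }

    // The node has no buffer field of its own; it borrows the document's.
    String concatenatedText() {
        StringBuffer buf = owner.acquireBuffer();
        for (String part : parts) {
            buf.append(part);
        }
        return buf.toString();
    }
}
```

With this shape each node saves one reference field, and at most one
buffer per document stays reachable, and only softly.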

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

Ken Geis <[EMAIL PROTECTED]> wrote on 11/11/2005 03:49:06 AM:

> Earlier this year, I was at a company where we were working 
> with some large XML documents.  Parsing and transforming a 40M XML 
> document was using up all of the memory we had.  I thought that it 
> would be good to look into how Xerces' footprint could be improved.
> 
> Just the other day, I started writing a memory profiling tool that I 
> had envisioned.  I looked at what is in the DOM objects, and I found 
> that one thing I couldn't justify was
> 
>    StringBuffer fBufferStr;
> 
> defined in org.apache.xerces.dom.ChildNode.  It is documented simply 
> here:
> 
> http://svn.apache.org/viewcvs.cgi?rev=319759&view=rev
> 
> The reference takes up 4 bytes (in a 32-bit JVM) which ends up being 
> about 7% of the footprint of a class like ElementNSImpl or 13% of the 
> footprint of CDATASectionImpl.
> 
> I've found this attribute used only in two places to implement DOM 
> Level 3 functionality, so it seems to me that it punishes everyone who 
> doesn't use that.  I've done a little benchmarking using XMLBench 
> (http://www.sosnoski.com/opensrc/xmlbench/) and found that if I revert 
> the patch, it saves somewhere between 1.7% and 3.4% on memory, mostly 
> around 2.5%.  Not a lot, but a few percent here and there helps.
> 
> It gets more interesting though.  Hanging on to a StringBuffer like 
> this leads to problems that can be illustrated by a pathological case. 
> Imagine an XML file with a 1M text node that's 1000 nodes deep in the 
> tree.  Though this file may only be a little bigger than 1M, the 
> referenced StringBuffers would use a gigabyte of memory if you were to 
> traverse the tree and call getTextContent() at each node.
> 
> I recommend that this change be reverted.  If someone wants to send me 
> some cases that illustrate the performance improvement from reusing the 
> StringBuffer, I would try to implement some compromise between memory 
> and CPU usage.  At the least, these StringBuffers should be held by 
> soft references to keep them from using up all of the memory.
> 
> I found it quite amusing that in running XMLBench, it required 211M of 
> heap in order to benchmark a 3M log file without getting an 
> OutOfMemoryError.  So there are clearly some inefficiencies not only in 
> DOM representation but in parsing.  So I have some other memory issues 
> to deal with, but let's start here.
> 
> 
> Ken Geis
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 


