At the same time, I remember that the serialization code stayed broken
for months without anybody noticing, so I wouldn't assume that this
serialization compatibility has to be given a higher priority than
memory footprint, which benefits *everybody*.
Michael Glavassevich wrote:
There are applications which serialize Xerces' DOM using Java's object
serialization services which rely on these classes being compatible from
release to release. Aside from moving around and removing transient
fields, it will be difficult to trim the size of the DOM implementation
without breaking serialization compatibility. It probably seemed like a
good idea at the time, but making all the classes implement
java.io.Serializable has significantly reduced our ability to make
structural changes.
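
To illustrate why transient fields are the ones that can be moved or removed safely, here is a minimal sketch. SketchNode and TransientFieldDemo are hypothetical names made up for this example, not Xerces classes, and the sketch assumes the class declares an explicit serialVersionUID. It only shows that default Java serialization does not write a transient field to the stream, so the field is not part of the serialized form that other releases would have to read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a DOM node class; not the Xerces source.
class SketchNode implements Serializable {
    private static final long serialVersionUID = 1L;

    String value;                    // part of the serialized form
    transient StringBuffer scratch;  // skipped by default serialization

    SketchNode(String value) {
        this.value = value;
        this.scratch = new StringBuffer("temporary");
    }
}

class TransientFieldDemo {
    // Serializes the node to memory and reads it back.
    static SketchNode roundTrip(SketchNode node) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchNode) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchNode copy = roundTrip(new SketchNode("text"));
        System.out.println(copy.value);    // the non-transient field survives
        System.out.println(copy.scratch);  // the transient field is not restored
    }
}
```

Non-transient fields, by contrast, are written to the stream, which is why changing them is where compatibility with older releases gets difficult.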
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
Arnaud Le Hors <[EMAIL PROTECTED]> wrote on 11/11/2005 05:35:46 PM:
Hi Ken,
I agree with you. I'm not sure what the motivation was to want to "reuse
StringBuffer" but it was a bad call. I suppose it was done to gain speed
but with today's JVMs it is not even clear that this is the right thing
to do for that matter either. In any case, ChildNode being one of the
core classes of this DOM implementation, anything affecting its size has
a major impact on the footprint. I have spent a lot of time in the past
trimming down the size of those classes to improve the memory footprint.
I'm glad someone else is looking into it. Maybe with better tools and
different eyes you can squeeze some more!
Have fun.
--
Arnaud Le Hors - Program Director, Corporate Standards, IBM
Ken Geis wrote:
Earlier this year, I was working at a company where we were working
with some large XML documents. Parsing and transforming a 40M XML
document was using up all of the memory we had. I thought that it
would be good to look into how Xerces' footprint could be improved.
Just the other day, I started writing a memory profiling tool that I
had envisioned. I looked at what is in the DOM objects, and I found
that one thing I couldn't justify was
StringBuffer fBufferStr;
defined in org.apache.xerces.dom.ChildNode. It is documented simply
here:
http://svn.apache.org/viewcvs.cgi?rev=319759&view=rev
The reference takes up 4 bytes (in a 32-bit JVM) which ends up being
about 7% of the footprint of a class like ElementNSImpl or 13% of the
footprint of CDATASectionImpl.
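
As a rough cross-check of those percentages, the arithmetic can be sketched as below. The per-object sizes it prints (roughly 57 and 31 bytes) are only what the quoted shares imply for a 4-byte reference; they are not measurements, and FootprintEstimate is a hypothetical helper written for this note.

```java
class FootprintEstimate {
    // Object size implied when a field of `fieldBytes` makes up `share` (0..1) of it.
    static double impliedObjectBytes(int fieldBytes, double share) {
        return fieldBytes / share;
    }

    public static void main(String[] args) {
        // Shares taken from the figures quoted above; results are estimates only.
        System.out.printf("ElementNSImpl: ~%.0f bytes%n", impliedObjectBytes(4, 0.07));
        System.out.printf("CDATASectionImpl: ~%.0f bytes%n", impliedObjectBytes(4, 0.13));
    }
}
```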
I've found this attribute used only in two places to implement DOM
Level 3 functionality, so it seems to me that it punishes everyone who
doesn't use that. I've done a little benchmarking using XMLBench
(http://www.sosnoski.com/opensrc/xmlbench/) and found that if I revert
the patch, it saves somewhere between 1.7% and 3.4% on memory, mostly
around 2.5%. Not a lot, but a few percent here and there helps.
It gets more interesting though. Hanging on to a StringBuffer like
this leads to problems that can be illustrated by a pathological
case. Imagine an XML file with a 1M text node that's 1000 nodes deep
in the tree. Though this file may only be a little bigger than 1M,
the referenced StringBuffers would use a gigabyte of memory if you
were to traverse the tree and call getTextContent() at each node.
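
The pathological case can be reproduced in miniature with the sketch below. BufferingNode is a hypothetical, simplified node that keeps a per-node StringBuffer after getTextContent() returns. It is a stand-in for the pattern, not the actual Xerces code, and the sizes are scaled down (100 levels, 10,000 characters) so it runs in a small heap.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical simplified node; illustrates the retention pattern only.
class BufferingNode {
    final String text;                                   // empty for container nodes
    final List<BufferingNode> children = new ArrayList<>();
    StringBuffer cached;                                 // kept after use, one per node

    BufferingNode(String text) { this.text = text; }

    // Gathers all descendant text into this node's own retained buffer.
    String getTextContent() {
        if (cached == null) {
            cached = new StringBuffer();
        } else {
            cached.setLength(0);                         // resets length, keeps capacity
        }
        collect(cached);
        return cached.toString();
    }

    private void collect(StringBuffer out) {
        out.append(text);
        for (BufferingNode child : children) {
            child.collect(out);
        }
    }
}

class RetentionDemo {
    // Builds `depth` nested container nodes with one text leaf at the bottom.
    static BufferingNode buildChain(int depth, int textLength) {
        char[] chars = new char[textLength];
        Arrays.fill(chars, 'x');
        BufferingNode root = new BufferingNode("");
        BufferingNode current = root;
        for (int i = 1; i < depth; i++) {
            BufferingNode child = new BufferingNode("");
            current.children.add(child);
            current = child;
        }
        current.children.add(new BufferingNode(new String(chars)));
        return root;
    }

    // Calls getTextContent() on every node, then sums the capacity still held.
    static long touchAllAndMeasure(BufferingNode root) {
        long total = 0;
        Deque<BufferingNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            BufferingNode node = stack.pop();
            node.getTextContent();
            total += node.cached.capacity();
            for (BufferingNode child : node.children) {
                stack.push(child);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        BufferingNode root = buildChain(100, 10_000);
        System.out.println("chars retained: " + touchAllAndMeasure(root));
    }
}
```

Each ancestor ends up holding its own copy of the leaf's text, so the retained total grows with depth times text size even though the document contains the text only once.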
I recommend that this change be reverted. If someone wants to send me
some cases that illustrate the performance improvement from reusing
the StringBuffer, I would try to implement some compromise between
memory and CPU usage. At the least, these StringBuffers should be
held by soft references to keep them from using up all of the memory.
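
A minimal sketch of the soft-reference idea follows. SoftBufferHolder is a hypothetical name, and this is one possible shape for the compromise rather than a proposed patch: the buffer is reused while it is still reachable, and the garbage collector is free to reclaim it when memory runs short.

```java
import java.lang.ref.SoftReference;

// Hypothetical holder; illustrative only, not part of Xerces.
class SoftBufferHolder {
    // The GC may clear this reference under memory pressure.
    private SoftReference<StringBuffer> ref;

    // Returns an empty buffer, reusing the previous one if it was not collected.
    StringBuffer acquire() {
        StringBuffer buffer = (ref == null) ? null : ref.get();
        if (buffer == null) {
            buffer = new StringBuffer();
            ref = new SoftReference<>(buffer);
        } else {
            buffer.setLength(0);   // reuse: reset length, keep capacity
        }
        return buffer;
    }

    public static void main(String[] args) {
        SoftBufferHolder holder = new SoftBufferHolder();
        StringBuffer first = holder.acquire();
        first.append("some text");
        StringBuffer second = holder.acquire();
        System.out.println(first == second);  // same buffer while strongly reachable
    }
}
```

Note that this only bounds what the buffers can retain. The node still carries a reference field, plus a SoftReference object once the buffer has been used, so it does not recover the per-node footprint discussed earlier.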
I found it quite amusing that in running XMLBench, it required 211M of
heap in order to benchmark a 3M log file without getting an
OutOfMemoryError. So there are clearly some inefficiencies not only
in DOM representation but in parsing. So I have some other memory
issues to deal with, but let's start here.
Ken Geis
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Arnaud