Wow!!! That puts my problem on another level.

First, as far as I can tell, recycled nodes are NOT reused for imported
nodes. In my code I DO NOT need that many nodes at all, and I do release
and rebuild nodes all the time. There is no real reason for my document
to bloat, unless it does not reuse released nodes.

The real reason for all of this is that I use a code-generator by Altova
(makers of XMLSpy) that created a full C++ API from my complicated XML
schema. It creates named C++ objects around important XML tags and
schema types, and maintains the XML layer (using xerces or MSXML)
underneath, without the client ever needing to deal with XML directly.
We changed that code-generator a lot, but basically, it still works this
way:

There is just ONE GLOBAL DOMDocument per generated library which holds
all the created nodes and sub-trees. I was not, of course, involved in
the design and development of that code-generator, but I guess they
thought that in-document manipulations were lighter than inter-document
manipulations. So instead of keeping many documents and maintaining
their cached grammars and schemas, they simply maintain lots of
sub-trees that are, most of the time, not connected to any parent: the
document is just a container in which sub-trees float.

They only make a sub-tree node a root-element of the global DOMDocument
when they want to serialize it, or validate it against schema.

When does it all break??? When you need to parse a new file/memory
buffer. xerces parses into its own new document, and they need to add it
to the special global document. They actually import all of the parsed
doc into the global doc as a new subtree, then call
builder->resetDocumentPool() to remove the original.

My problem is that routinely releasing a subtree, parsing a new one, and
importing it into the global document makes my program bloat.

My alternative is to completely rewrite the code-generator to use
independent documents for each object, which is a lot of work.
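
To show what I mean by "independent documents for each object", here is
a minimal sketch. It is a toy model and does not use xerces at all:
ToyDocument, GeneratedObject and runCycles are hypothetical names made
up for this mail, and giving each node its own page is a simplification.
The only point is that when every object owns its own document, the
memory goes back as soon as the object dies.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for a document that owns all of its node memory.
// Hypothetical illustration only; this is not a xerces class.
class ToyDocument {
public:
    ToyDocument() = default;
    ToyDocument(const ToyDocument&) = delete;
    ToyDocument& operator=(const ToyDocument&) = delete;

    // Destroying the document hands back every page it ever allocated.
    ~ToyDocument() {
        assert(livePages >= pages_.size());
        livePages -= pages_.size();
    }

    // Simplification: each node gets a fresh fixed-size page.
    void addNode() {
        const std::size_t pageSize = 1024;
        pages_.push_back(std::make_unique<char[]>(pageSize));
        ++livePages;
    }

    static std::size_t livePages;  // pages currently held by all documents

private:
    std::vector<std::unique_ptr<char[]>> pages_;
};

std::size_t ToyDocument::livePages = 0;

// One generated object owns exactly one document.
class GeneratedObject {
public:
    explicit GeneratedObject(std::size_t nodeCount)
        : doc_(std::make_unique<ToyDocument>()) {
        for (std::size_t i = 0; i < nodeCount; ++i) {
            doc_->addNode();
        }
    }

private:
    std::unique_ptr<ToyDocument> doc_;
};

// Repeatedly build and drop an object, and report the peak page count.
std::size_t runCycles(std::size_t cycles, std::size_t nodesPerObject) {
    std::size_t peak = 0;
    for (std::size_t i = 0; i < cycles; ++i) {
        GeneratedObject obj(nodesPerObject);  // "parse" a new object
        if (ToyDocument::livePages > peak) {
            peak = ToyDocument::livePages;
        }
    }  // obj and its document are destroyed at the end of each iteration
    return peak;
}
```

In this model the peak never exceeds the pages of a single object, no
matter how many cycles run, which is the behaviour I would hope to get
from a rewrite.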

Any ideas? 


Motti Shneor
Software Engineer
Orbograph Ltd.
P.O.Box 215, Yavne 81102, Israel
Tel: 972-8-9322257 ext. 230
Fax: 972-8-9328857
[EMAIL PROTECTED]
http://www.orbograph.com 
 

-----Original Message-----
From: Alberto Massari [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 12, 2006 1:13 PM
To: [email protected]
Cc: Ziv Tsoref
Subject: RE: DOMDocument memory bloating problem

Hi Motti,

At 12.11 12/12/2006 +0200, Motti Shneor wrote:
>Hello Alberto, and thanks a lot for the enlightening answer. However, I
>need a few more clarifications.
>
>1. Debugging through the DOMNode->remove()->release() process, I have
>seen these strings being pushed into a "recycled" container. Why does
>the code bother to do that, if the "pages" as you call them are never
>actually cleaned?

Because they are recycled the next time a new node is created.


>2. I have noticed, too, that on some occasions xerces DOES reuse
>released nodes (I got the same pointers again and again when creating
>elements, attributes etc.) What is the rule here? I suspect that
>repeated doc->importNode() is the call that bloats my DOMDocument. But
>I have no proof...

Calling importNode is the way you can copy DOMNodes from one 
DOMDocument to another (all the DOMNodes in a DOM tree must come from 
the same memory pool owned by the DOMDocument at the root); it will 
end up creating copies of the source nodes, recycling released nodes 
if they are available.


>3. If DOMDocument does not keep track of the cleaned "pages" (Are they
>the "buckets" in the code?) can I add a cleanup function to
>DOMDocumentImpl.cpp/hpp to EXPLICITLY scan and release such "pages"?
>Can you hint at the implications? I don't need to do it very often, so
>such a function can be (for my purpose) inefficient, but I absolutely
>need to do this at times.

The implication is that you would have to track all the 
allocations/deallocations made by DOMDocument from each page, and 
that would slow down the entire program, not just the cleanup phase. 
The alternative approach (scanning the entire tree to check where the 
nodes are pointing to) is both inefficient and prone to errors, as 
some pointers could be held by arrays or maps, many levels down.
So, you are left with two choices:
1) redesign your code to avoid allocating/deallocating many nodes 
(why do you need to call importNode so many times? would a brand new 
DOMDocument that is deleted at the end of the processing do the same 
work?)
2) change the code of the DOMDocument memory manager to track all the 
memory pieces (e.g. on Windows, you could use a private heap with 
HeapCreate/HeapAlloc/HeapFree/HeapDestroy)
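
To make the page and recycling behaviour discussed above concrete, here 
is a toy model. It is only a sketch and not the actual DOMDocument 
code: PagePool and its members are hypothetical names, and real nodes 
are replaced by plain int slots. It shows that releasing a slot only 
queues it for reuse, so the number of pages never shrinks until the 
whole pool is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model of a page-based pool with a recycle list.
// Hypothetical illustration only; this is not the Xerces implementation.
class PagePool {
public:
    // Assumption: slotsPerPage must be greater than zero.
    explicit PagePool(std::size_t slotsPerPage) : slotsPerPage_(slotsPerPage) {
        assert(slotsPerPage_ > 0);
    }

    // Hand out a slot: prefer a recycled one, else carve from the last page,
    // allocating a new page only when the last one is full.
    int* allocate() {
        if (!recycled_.empty()) {
            int* slot = recycled_.back();
            recycled_.pop_back();
            return slot;
        }
        if (pages_.empty() || usedInLastPage_ == slotsPerPage_) {
            pages_.push_back(std::make_unique<int[]>(slotsPerPage_));
            usedInLastPage_ = 0;
        }
        return &pages_.back()[usedInLastPage_++];
    }

    // "Release" only remembers the slot for later reuse.
    // No page is freed here, because nothing tracks per-page usage.
    void release(int* slot) { recycled_.push_back(slot); }

    std::size_t pageCount() const { return pages_.size(); }
    std::size_t recycledCount() const { return recycled_.size(); }

private:
    std::size_t slotsPerPage_;
    std::size_t usedInLastPage_ = 0;
    std::vector<std::unique_ptr<int[]>> pages_;  // freed only with the pool
    std::vector<int*> recycled_;                 // released slots awaiting reuse
};
```

With 4 slots per page, allocating 8 slots creates 2 pages; releasing 
all 8 leaves both pages in place, and the next allocate() returns one 
of the released slots instead of growing the pool. Freeing a page early 
would need the per-page bookkeeping mentioned above.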

Hope this helps,
Alberto


>Thanks a lot -
>Motti Shneor
>
>-----Original Message-----
>From: Alberto Massari [mailto:[EMAIL PROTECTED]
>Sent: Tuesday, December 12, 2006 10:15 AM
>To: [email protected]
>Subject: Re: DOMDocument memory bloating problem
>
>Hi Motti,
>unfortunately there is no such control on the memory allocated by
>DOMDocument: all of the nodes and strings used in the DOM tree come
>from a memory pool allocated by the DOMDocument, and they can be
>freed only by deleting the entire page (and DOMDocument doesn't keep
>statistics to check whether an entire page contains only released
>nodes). So the only way to release the memory is by releasing the
>entire DOMDocument.
>
>Sorry if this is not the answer you would have liked,
>Alberto
>
>At 09.36 12/12/2006 +0200, Motti Shneor wrote:
> >Hello everyone. Happy to join the list.
> >
 > >I use a system that reuses the same xerces::DomDocument for a long
 > >period, adding and releasing DomNodes (elements, attributes etc.)
 > >continuously.
> >
 > >Although I DomNode->remove()->release() every unneeded node, the
 > >memory taken up by DomDocument seems to keep increasing, to the point
 > >where the program becomes unusable.
> >
 > >In the docs, it is recommended that I release unused nodes, but it
 > >is only assured that they are actually released when the document is
 > >released. This is not good enough in my situation.
> >
 > >I see that xerces memory manager's "deallocate()" is never called on
 > >my nodes until I explicitly DomDocument *myDoc->release();
> >
> >
 > >I am seeking a way to instruct a DomDocument to actually clear and
 > >free its RELEASED nodes. Something like a partial
 > >DomDocument->release() that will only clean up its heap from
 > >released stuff.
> >
 > >Is it possible? Is there a simple way to do this? What are the
 > >costs?
> >
> >Any ideas?
> >
> >
> >Motti Shneor
> >Senior Software Engineer
> >Orbograph Ltd.
> >[EMAIL PROTECTED]
> >http://www.orbograph.com
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
>
>

