Wow!!! That puts my problem on another level.

First, as far as I know, recycled nodes are NOT reused for imported nodes. In my code
I DO NOT need that many nodes at all, and I do release and rebuild nodes all the time. There is no real reason for my document to bloat, unless it does not reuse released nodes.

The real reason for all of this is that I use a code-generator by Altova (makers of XMLSpy) that created a full C++ API from my complicated XML schema. It creates named C++ objects around important XML tags and schema types, and maintains the XML layer (using xerces or MSXML) underneath, without the client ever needing to deal with XML directly.

We changed that code-generator a lot, but basically it still works this way: there is just ONE GLOBAL DOMDocument per generated library, which holds all the created nodes and sub-trees.

I was not, of course, involved in the design and development of that code-generator, but I guess they thought that in-document manipulations were lighter than inter-document manipulations. So instead of keeping many documents and maintaining their cached grammars and schemas, they simply maintain lots of sub-trees that are, most of the time, not connected to any parent. The document is just a container in which sub-trees float. They only make a sub-tree node the root element of the global DOMDocument when they want to serialize it or validate it against the schema.

When does it all break??? When you need to parse a new file or memory buffer. xerces parses into its own new document, and they need to add it to the special global document. They actually import all of the parsed doc into the global doc as a new subtree, then call builder->resetDocumentPool to remove the original.

My problem is that routinely releasing a subtree, parsing a new one, and importing it into the global document makes my program bloat. My alternative is to completely rewrite the code-generator to use independent documents for each object, which is a lot of work.

Any ideas?

Motti Shneor
Software Engineer
Orbograph Ltd.
P.O.Box 215, Yavne 81102, Israel
Tel: 972-8-9322257 ext. 230
Fax: 972-8-9328857
[EMAIL PROTECTED]
http://www.orbograph.com

-----Original Message-----
From: Alberto Massari [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 12, 2006 1:13 PM
To: [email protected]
Cc: Ziv Tsoref
Subject: RE: DOMDocument memory bloating problem

Hi Motti,

At 12.11 12/12/2006 +0200, Motti Shneor wrote:
>Hello Alberto, and thanks a lot for the enlightening answer. However, I
>need a few more clarifications.
>
>1. Debugging through the DOMNode->remove()->release() process, I have
>seen these strings being pushed into a "recycled" container. Why does
>the code bother to do that, if the "pages" as you call them are never
>actually cleaned?

Because they are recycled the next time a new node is created.

>2. I have noticed, too, that on some occasions xerces DOES reuse
>released nodes (I got the same pointers again and again when creating
>elements, attributes etc.) What is the rule here? I suspect that
>repeated doc->importNode() is the call that bloats my DOMDocument. But I
>have no proof...

Calling importNode is the way you can copy DOMNodes from one DOMDocument to another (all the DOMNodes in a DOM tree must come from the same memory pool owned by the DOMDocument at the root); it will end up creating copies of the source nodes, recycling released nodes if they are available.

>3. If DOMDocument does not keep track of the cleaned "pages" (Are they
>the "buckets" in the code?) can I add a cleanup function to
>DOMDocumentImpl.cpp/hpp to EXPLICITLY scan and release such "pages"?
>Can you hint on the implications? I don't need to do it very often so
>such function can be (for my purpose) inefficient, but I absolutely need
>to do this at times.

The implication is that you should track all the allocations/deallocations made by DOMDocument from each page, and that would slow down the entire program, not just the cleanup phase.
The alternative approach (scanning the entire tree to check where the nodes are pointing) is both inefficient and prone to errors, as some pointers could be held by arrays or maps, many levels down.

So, you are left with two choices:
1) redesign your code to avoid allocating/deallocating many nodes (why do you need to call importNode so many times? would a brand new DOMDocument that is deleted at the end of the processing do the same work?)
2) change the code of the DOMDocument memory manager to track all the memory pieces (e.g. on Windows, you could use a private heap with HeapCreate/HeapAlloc/HeapFree/HeapDestroy)

Hope this helps,
Alberto

>Thanks a lot -
>Motti Shneor
>
>-----Original Message-----
>From: Alberto Massari [mailto:[EMAIL PROTECTED]
>Sent: Tuesday, December 12, 2006 10:15 AM
>To: [email protected]
>Subject: Re: DOMDocument memory bloating problem
>
>Hi Motti,
>unfortunately there is no such control on the memory allocated by
>DOMDocument: all of the nodes and strings used in the DOM tree come
>from a memory pool allocated by the DOMDocument, and they can be
>freed only by deleting the entire page (and DOMDocument doesn't keep
>statistics to check whether an entire page contains only released
>nodes). So the only way to release the memory is by releasing the
>entire DOMDocument.
>
>Sorry if this is not the answer you would have liked,
>Alberto
>
>At 09.36 12/12/2006 +0200, Motti Shneor wrote:
> >Hello everyone. Happy to join the list.
> >
> >I use a system that reuses the same xerces DOMDocument for a long
> >period, adding and releasing DOMNodes (elements, attributes etc.)
> >continuously.
> >
> >Although I DOMNode->remove()->release() every unneeded node, the memory
> >taken up by the DOMDocument seems to increase forever, to the point
> >where the program becomes unusable.
> >
> >In the docs, it is recommended that I release unused nodes, but it is
> >only assured that they are actually released when the document is
> >released.
> >This is not good enough in my situation.
> >
> >I see that the xerces memory manager's "deallocate()" is never called on
> >my nodes until I explicitly call myDoc->release() on the DOMDocument.
> >
> >I am seeking a way to instruct a DOMDocument to actually clear and free
> >its RELEASED nodes. Something like a partial DOMDocument->release() that
> >will only clean up its heap from released stuff.
> >
> >Is it possible? Is there a simple way to do this? What are the costs?
> >
> >Any ideas?
> >
> >Motti Shneor
> >Senior Software Engineer
> >Orbograph Ltd.
> >[EMAIL PROTECTED]
> >http://www.orbograph.com
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
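
For readers following the thread, the pooling behaviour Alberto describes (released nodes go into a "recycled" container, pages are only freed with the whole document) can be illustrated with a toy model. This is a minimal sketch and NOT Xerces code: ToyDocumentPool, kNodesPerPage and pagesAfterChurn are hypothetical names invented for the illustration, and the model ignores strings and importNode, so by itself it cannot explain the continuing growth reported above. It only shows why the footprint of such a pool stays at its high-water mark.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model, NOT Xerces code: a document-owned pool that recycles released
// node slots but gives pages back only when the whole pool is destroyed.
class ToyDocumentPool {
public:
    struct Node { int payload = 0; };
    static constexpr std::size_t kNodesPerPage = 4;  // hypothetical page size

    // Reuse a released slot if one exists; otherwise carve a slot from the
    // last page, allocating a new page when the last one is full.
    Node* createNode() {
        if (!recycled_.empty()) {
            Node* n = recycled_.back();
            recycled_.pop_back();
            return n;
        }
        if (pages_.empty() || usedInLastPage_ == kNodesPerPage) {
            pages_.push_back(std::make_unique<Node[]>(kNodesPerPage));
            usedInLastPage_ = 0;
        }
        return &pages_.back()[usedInLastPage_++];
    }

    // Remember the slot for reuse; no memory goes back to the system.
    void releaseNode(Node* n) { recycled_.push_back(n); }

    std::size_t pageCount() const { return pages_.size(); }
    std::size_t recycledCount() const { return recycled_.size(); }

private:
    std::vector<std::unique_ptr<Node[]>> pages_;  // freed only with the pool
    std::vector<Node*> recycled_;                 // released, reusable slots
    std::size_t usedInLastPage_ = 0;
};

// Create and then release `liveNodes` nodes, `rounds` times over, and report
// how many pages the pool holds afterwards.
std::size_t pagesAfterChurn(ToyDocumentPool& pool, int rounds, int liveNodes) {
    std::vector<ToyDocumentPool::Node*> nodes;
    for (int r = 0; r < rounds; ++r) {
        for (int i = 0; i < liveNodes; ++i) nodes.push_back(pool.createNode());
        for (ToyDocumentPool::Node* n : nodes) pool.releaseNode(n);
        nodes.clear();
        // Everything created in this round is now waiting in the recycle list.
        assert(pool.recycledCount() >= static_cast<std::size_t>(liveNodes));
    }
    return pool.pageCount();
}
```

In this model, calling pagesAfterChurn(pool, 100, 10) on a fresh pool leaves pageCount() at 3: the first round allocates three pages of four slots, and every later round is served from the recycle list. The page count never drops while the pool is alive, which mirrors Alberto's point that only deleting the entire DOMDocument returns the memory.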
