Hi, Jason
I have a few other ideas that I am testing. Can you send me the (zipped)
XML and a bit of test code, so that I can check whether my ideas work?

Mike Skells

> -----Original Message-----
> From: Jason Horman [mailto:[EMAIL PROTECTED] 
> Sent: Thursday 20 February 2003 00:04
> To: 'James Strachan'; [EMAIL PROTECTED]
> Subject: RE: [dom4j-dev] huge dom
> 
> 
> Thanks, that trimmed about 150 MB off the memory footprint. It still 
> seems large to me, but I suppose the tree is quite large.
> 
> I cannot use the "row by row" technique, since I need a 
> DOM available for the large number of XPath queries 
> and sorts that I run across the entire document. The 
> document is essentially a database dump. In the future I may 
> look into the new BDB XML database instead of an in-memory tree.
> 
> -jason
> 
> -----Original Message-----
> From: James Strachan [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 19, 2003 12:26 AM
> To: Jason Horman; [EMAIL PROTECTED]
> Subject: Re: [dom4j-dev] huge dom
> 
> 
> 
> First off, there's an FAQ entry
> 
> http://dom4j.org/faq.html
> 
> on "How does dom4j handle very large XML documents?"
> 
> http://dom4j.org/faq.html#How%20does%20dom4j%20handle%20very%20large%20XML%20documents?
> 
> which essentially means you can process the document in a 
> 'row by row' kinda way rather than waiting to load the whole 
> thing in one go.
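As far as I know, dom4j's own mechanism for this is an ElementHandler registered with SAXReader.addHandler(), which lets you process each row and detach it once done. To keep the example free of the dom4j dependency, the minimal sketch below swaps in the JDK's plain SAX parser for the same row-by-row idea. The ROW element, the amount attribute and the RowByRow class are hypothetical. Only running totals are kept, so memory use stays flat however large the document is.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RowByRow {

    // Sees one ROW element at a time; nothing from earlier rows is retained
    // except the running totals below.
    static class RowCounter extends DefaultHandler {
        long rows = 0;
        long totalAmount = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("ROW".equals(qName)) {                    // hypothetical row element name
                rows++;
                String amount = attrs.getValue("amount"); // hypothetical numeric attribute
                if (amount != null) {
                    totalAmount += Long.parseLong(amount);
                }
            }
        }
    }

    // Streams the document through SAX without ever building a tree.
    static RowCounter process(byte[] xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            RowCounter handler = new RowCounter();
            parser.parse(new ByteArrayInputStream(xml), handler);
            return handler;
        } catch (Exception e) {
            throw new RuntimeException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ROWSET><ROW id='1' amount='10'/><ROW id='2' amount='32'/></ROWSET>";
        RowCounter result = process(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(result.rows + " rows, total amount " + result.totalAmount);
    }
}
```

For a file on disk, parser.parse(new java.io.File(path), handler) works the same way. The trade-off is the one Jason raises further up the thread: nothing is left in memory to run XPath queries or sorts against afterwards.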
> 
> 
> Other flags that might help reduce the overall memory 
> footprint are these, which avoid storing unnecessary String 
> or whitespace-only text objects:
> 
>     import org.dom4j.io.SAXReader;
> 
>     SAXReader reader = new SAXReader();
>     reader.setMergeAdjacentText(true);    // merge adjacent text into one node
>     reader.setStringInternEnabled(true);  // enable string interning
>     reader.setStripWhitespaceText(true);  // drop whitespace-only text
> 
> James
> -------
> http://radio.weblogs.com/0112098/
> ----- Original Message -----
> From: Jason Horman
> To: '[EMAIL PROTECTED]'
> Sent: Friday, February 14, 2003 1:11 AM
> Subject: [dom4j-dev] huge dom
> 
> 
> I am using dom4j-1.4-dev-8.jar, the version that came with my 
> last Maven build of Jelly.
> 
> My XML document:
> 
> 159 MB
> 2,438,791 lines/tags (one tag per line; the data is all in attributes)
> ~6 attributes per tag
> 4 of the 6 attributes are numeric values, so they are not 
> huge strings. Attributes 5 and 6 could probably be interned 
> as well, but this would require additional API support.
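The point about attributes 5 and 6 can be illustrated with a value pool. If those attributes take a limited set of repeated values, each distinct value only needs to be stored once, and every element can share that single String instance. The sketch below is a hypothetical helper (ValuePool is not a dom4j class). It uses an ordinary HashMap rather than String.intern(), so the pool can simply be dropped after parsing. Wiring it into dom4j, for example from a custom DocumentFactory, is an assumption and is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps each distinct value to one shared String instance,
// so repeated attribute values are stored once instead of once per element.
public class ValuePool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the pooled instance equal to 'value', adding it on first sight.
    public String canonical(String value) {
        if (value == null) {
            return null;
        }
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    // Number of distinct values seen so far.
    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        ValuePool p = new ValuePool();
        String a = p.canonical(new String("ACTIVE"));
        String b = p.canonical(new String("ACTIVE"));
        System.out.println(a == b);   // same instance: the second copy is discarded
        System.out.println(p.size());
    }
}
```

This only pays off when the number of distinct values is small relative to the number of elements. For mostly unique values, the map itself adds overhead.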
> 
> This document expands to 1,100 MB in memory. Could this be 
> right? It seems high to me. I assume all element and 
> attribute names are interned. I tried to force interning like 
> this:
> 
>         SAXReader reader = new SAXReader();
>         reader.setFeature("http://xml.org/sax/features/string-interning", true);
> 
> I think that is the default anyway. I am using 
> xerces-2.0.2.jar as the parser behind SAXReader, selected via the system property.
> 
> Are things being interned? Are there any other tricks for 
> reducing memory consumption?
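To put the 1,100 MB figure in perspective, the numbers above work out to roughly 473 bytes of heap per element, against roughly 68 bytes of file text per element. That is an expansion of about 6.9x, assuming MB means 2^20 bytes. Spread over an element and its ~6 attributes, it comes to about 68 bytes per node. That budget has to cover the node object itself plus any String and character array holding its value. The class name below is illustrative only.

```java
public class PerElementCost {

    // Average bytes per element, given a total byte count and an element count.
    static double bytesPerElement(double totalBytes, long elements) {
        return totalBytes / elements;
    }

    public static void main(String[] args) {
        long elements = 2_438_791L;             // one element per line, from the stats above
        double heap = 1_100L * 1024 * 1024;     // 1,100 MB in memory, assuming MB = 2^20 bytes
        double file = 159L * 1024 * 1024;       // 159 MB on disk, same assumption
        System.out.printf("heap per element: %.0f bytes%n", bytesPerElement(heap, elements));
        System.out.printf("file per element: %.0f bytes%n", bytesPerElement(file, elements));
        System.out.printf("expansion factor: %.1fx%n", heap / file);
    }
}
```

If MB means 10^6 bytes instead, the heap figure drops to about 451 bytes per element. The expansion ratio stays the same.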
> 
> -jason horman
>  [EMAIL PROTECTED]
> This email message and any attachments are for the sole use 
> of the intended
> recipient(s) and may contain confidential and privileged 
> information. Any unauthorized review, use, disclosure or 
> distribution is prohibited. If you are not the intended 
> recipient or his/her representative, please contact the 
> sender by reply email and destroy all copies of the original message.
> 
> _______________________________________________
> dom4j-dev mailing list
> [EMAIL PROTECTED] 
> https://lists.sourceforge.net/lists/listinfo/dom4j-dev
> 

