Hi Jason,

I have a few other ideas that I am testing. Can you send me the (zipped) XML and a bit of test code, so that I can check whether my ideas work?
Mike Skells

> -----Original Message-----
> From: Jason Horman [mailto:[EMAIL PROTECTED]
> Sent: Thursday 20 February 2003 00:04
> To: 'James Strachan'; [EMAIL PROTECTED]
> Subject: RE: [dom4j-dev] huge dom
>
> Thanks, that trimmed about 150 MB off the memory footprint. It still
> seems large to me, but I suppose the tree is quite large.
>
> I cannot use the "row by row" technique, since I need a DOM available
> for the massive number of XPath queries and sorts that I run across
> the entire document. The document is essentially a database dump. In
> the future I may look into the new BDB XML database instead of keeping
> the document in memory.
>
> -jason
>
> -----Original Message-----
> From: James Strachan [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 19, 2003 12:26 AM
> To: Jason Horman; [EMAIL PROTECTED]
> Subject: Re: [dom4j-dev] huge dom
>
> First off, there's an FAQ entry
>
> http://dom4j.org/faq.html
>
> on "How does dom4j handle very large XML documents?"
>
> http://dom4j.org/faq.html#How%20does%20dom4j%20handle%20very%20large%20XML%20documents?
>
> which essentially means you can process the document in a
> 'row by row' kinda way rather than waiting to load the whole
> thing in one go.
>
> Other flags that might help reduce the overall memory
> footprint are these, which avoid storing unnecessary String
> or whitespace objects:
>
> SAXReader reader = new SAXReader();
> reader.setMergeAdjacentText(true);
> reader.setStringInternEnabled(true);
> reader.setStripWhitespaceText(true);
>
> James
> -------
> http://radio.weblogs.com/0112098/
>
> ----- Original Message -----
> From: Jason Horman
> To: '[EMAIL PROTECTED]'
> Sent: Friday, February 14, 2003 1:11 AM
> Subject: [dom4j-dev] huge dom
>
> I am using dom4j-1.4-dev-8.jar, the version that came with my
> last Maven build of Jelly.
>
> My XML document:
>
> 159 MB
> 2,438,791 lines/tags -> 1 tag per line, all attributes
> ~6 attributes per tag
> 4 out of 6 attributes are numeric values, so they are not
> huge strings. Attributes 5 and 6 could probably be interned
> as well, but this would require additional API support.
>
> This document expands to 1100 MB in memory. Could this be
> right? It seems high to me. I assume all element names and
> attribute names are interned. I tried to force interning by
> doing this:
>
> SAXReader reader = new SAXReader();
> reader.setFeature("http://xml.org/sax/features/string-interning", true);
>
> I think that is the default anyway. I am using
> xerces-2.0.2.jar for SAXReader via the system property.
>
> Are things being interned? Are there any other tricks for
> reducing memory consumption?
>
> -jason horman
> [EMAIL PROTECTED]
> This email message and any attachments are for the sole use
> of the intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or
> distribution is prohibited. If you are not the intended
> recipient or his/her representative, please contact the
> sender by reply email and destroy all copies of the original message.
>
> _______________________________________________
> dom4j-dev mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/dom4j-dev
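James's 'row by row' suggestion points at the dom4j FAQ, where each row is handled as soon as the parser reaches it and is then discarded instead of being kept in a full tree; in dom4j this is typically done by registering an `ElementHandler` with `SAXReader.addHandler` and detaching each row when it ends. The sketch below illustrates the same principle with the JDK's built-in SAX parser instead, so it is a stand-in, not the dom4j API. The element name `row` and the numeric attribute `amount` are hypothetical. Because Jason's rows are single tags with all their data in attributes, the start-tag callback alone sees a whole row.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Handles each <row> as it is parsed and keeps only running totals, never a tree. */
public class RowByRow {

    /** Hypothetical per-row work: count rows and sum a numeric "amount" attribute. */
    static class RowTotals extends DefaultHandler {
        long rows;
        long total;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (!"row".equals(qName)) {
                return; // ignore the root and any other elements
            }
            rows++;
            String amount = atts.getValue("amount");
            if (amount != null) {
                total += Long.parseLong(amount);
            }
            // Nothing from this row is referenced after the method returns,
            // so memory use does not grow with the number of rows.
        }
    }

    /** Streams the whole input through the handler; checked parser errors become unchecked. */
    static RowTotals process(InputSource source) {
        try {
            RowTotals totals = new RowTotals();
            SAXParserFactory.newInstance().newSAXParser().parse(source, totals);
            return totals;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<dump>\n"
                + "<row id=\"1\" amount=\"10\"/>\n"
                + "<row id=\"2\" amount=\"32\"/>\n"
                + "</dump>";
        RowTotals t = process(new InputSource(new StringReader(xml)));
        System.out.println(t.rows + " rows, total " + t.total);
    }
}
```

For a file on disk, an `InputSource` wrapping a `FileInputStream` can be passed to `process` instead. As Jason points out, this only helps for passes that touch each row once; XPath queries and sorts across the whole document still need the full tree, or an external store such as the BDB XML database he mentions.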
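Jason's figures are enough to put his "Could this be right?" into numbers. The sketch below only divides the values quoted in his message; it assumes "MB" means 10^6 bytes (with 2^20 bytes every figure derived from the heap size grows by about 5%), and it measures nothing about how dom4j actually lays out its objects.

```java
/** Back-of-the-envelope arithmetic on the figures quoted in the thread. */
public class FootprintEstimate {
    static final double MB = 1_000_000.0;    // assumption: decimal megabytes
    static final double FILE_MB = 159;       // size of the XML file
    static final double HEAP_MB = 1100;      // reported in-memory size of the tree
    static final long ELEMENTS = 2_438_791L; // one tag per line
    static final int ATTRS_PER_ELEMENT = 6;  // "~6 attributes per tag"

    /** How many times larger the tree is than the file it was parsed from. */
    static double expansionFactor() {
        return HEAP_MB / FILE_MB;
    }

    /** Average file bytes per line (one element per line). */
    static double fileBytesPerElement() {
        return FILE_MB * MB / ELEMENTS;
    }

    /** Average heap bytes per element, with everything in the tree folded in. */
    static double bytesPerElement() {
        return HEAP_MB * MB / ELEMENTS;
    }

    /** The per-element average spread evenly over ~6 attributes. */
    static double bytesPerAttribute() {
        return bytesPerElement() / ATTRS_PER_ELEMENT;
    }

    public static void main(String[] args) {
        System.out.printf("expansion factor:    %.1fx%n", expansionFactor());
        System.out.printf("file bytes per line: %.0f%n", fileBytesPerElement());
        System.out.printf("heap bytes/element:  %.0f%n", bytesPerElement());
        System.out.printf("heap bytes/attr:     %.0f%n", bytesPerAttribute());
    }
}
```

With these inputs the tree costs roughly 6.9 times the file size: about 451 bytes of heap per element against about 65 bytes per line of XML, or roughly 75 bytes per attribute if the cost is spread evenly over six attributes. These are averages only; every object in the tree is folded into them.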
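Jason asks whether names are actually being interned. One way to find out is to compare String identities in the SAX callbacks: if two elements with the same name hand back the same String object, the name is shared. The sketch below does this with the JDK's bundled parser rather than xerces-2.0.2.jar, so results may differ from Jason's setup, and it inspects only what the parser delivers, not what dom4j stores afterwards. The sample element and attribute names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Reports whether repeated element names and attribute values arrive as the same String object. */
public class InterningCheck {
    static final String FEATURE = "http://xml.org/sax/features/string-interning";

    static class Result {
        boolean featureReported; // what the parser claims for the SAX feature
        boolean namesShared;     // same name object for the two compared elements?
        boolean valuesShared;    // same first-attribute-value object for them?
    }

    static boolean reportsInterning(XMLReader reader) {
        try {
            return reader.getFeature(FEATURE);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false; // the parser does not expose this feature
        }
    }

    /** Compares the first two start tags after the root; each should carry an attribute. */
    static Result check(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            Result result = new Result();
            result.featureReported = reportsInterning(reader);
            String[] names = new String[2];
            String[] values = new String[2];
            reader.setContentHandler(new DefaultHandler() {
                private int index = -1; // -1 while at the root element
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    if (index >= 0 && index < 2) {
                        names[index] = qName;
                        values[index] = atts.getValue(0);
                    }
                    index++;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            // == tests object identity, which is what actually saves memory.
            result.namesShared = names[0] == names[1];
            result.valuesShared = values[0] == values[1];
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        Result r = check("<dump><row status=\"OPEN\"/><row status=\"OPEN\"/></dump>");
        System.out.println("feature reported: " + r.featureReported);
        System.out.println("element names shared: " + r.namesShared);
        System.out.println("attribute values shared: " + r.valuesShared);
    }
}
```

The SAX `string-interning` feature covers element and attribute names, prefixes and namespace URIs, not attribute values, so `valuesShared` only reveals whether a particular parser happens to share values on its own.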
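Jason notes that attributes 5 and 6 could probably be shared as well, but that this would need additional API support. One common way to share repeated values without `String.intern()` is a canonicalizing map, sketched below with JDK classes only. `ValuePool` is a hypothetical name; wiring it into dom4j (for instance through a custom `DocumentFactory` handed to `SAXReader`) is an assumption this sketch does not cover.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical canonicalizing pool: equal strings come back as one shared
 * instance, so a value repeated across many attributes is stored once.
 */
public class ValuePool {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the pooled instance equal to value, adding value if it is new. */
    public String canonical(String value) {
        if (value == null) {
            return null;
        }
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    /** Number of distinct values held. */
    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        ValuePool values = new ValuePool();
        // new String(...) forces distinct instances, as a parser would produce.
        String a = values.canonical(new String("ACTIVE"));
        String b = values.canonical(new String("ACTIVE"));
        String c = values.canonical(new String("CLOSED"));
        System.out.println("a and b shared: " + (a == b));
        System.out.println("a and c shared: " + (a == c));
        System.out.println("distinct values: " + values.size());
    }
}
```

The pool pays off only for low-cardinality values such as status codes; for unique or numeric attributes (4 of Jason's 6) each entry adds map overhead without any sharing. Unlike the JVM-wide `String.intern()` table, the pool can be discarded once parsing is done, while the shared instances stay referenced from the tree.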