I can't really send my DOM since it is proprietary company information. There are plenty of HUGE xml docs on the web though, such as:
http://www.cs.washington.edu/research/xmldatasets/www/repository.html/
http://www.cs.washington.edu/research/xmldatasets/www/data/pir/psd7003.xml.gz
21,305,818 elements
103 MB

I basically was just doing this:

SAXReader reader = new SAXReader();
reader.setStringInternEnabled(true);
reader.setMergeAdjacentText(true);
reader.setStripWhitespaceText(true);
Document oldArtistDoc = reader.read(inputStream);

Thanks,
Jason Horman
[EMAIL PROTECTED]

-----Original Message-----
From: Mike Skells [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 25, 2003 10:00 AM
To: Jason Horman
Cc: [EMAIL PROTECTED]
Subject: RE: [dom4j-dev] huge dom

Hi, Jason

I have a few other ideas that I am testing. Can you send me the (zipped) xml, and a bit of test code so that I can check if my ideas work?

Mike Skells

> -----Original Message-----
> From: Jason Horman [mailto:[EMAIL PROTECTED]
> Sent: Thursday 20 February 2003 00:04
> To: 'James Strachan'; [EMAIL PROTECTED]
> Subject: RE: [dom4j-dev] huge dom
>
> Thanks, that trimmed off about 150 MB from memory. Still seems large to me, but I suppose the tree is quite large.
>
> I cannot use the "row by row" technique since I need to have a dom available for the massive number of xpath statements and sorts that I need to do across the entire document. The document is essentially a database dump. I may look into the new BDB XML db instead of in-memory in the future.
>
> -jason
>
> -----Original Message-----
> From: James Strachan [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 19, 2003 12:26 AM
> To: Jason Horman; [EMAIL PROTECTED]
> Subject: Re: [dom4j-dev] huge dom
>
> First off there's an FAQ entry
>
> http://dom4j.org/faq.html
>
> on How does dom4j handle very large XML documents?
>
> http://dom4j.org/faq.html#How%20does%20dom4j%20handle%20very%20large%20XML%20documents?
>
> which essentially means you can process the document in a 'row by row' kinda way rather than waiting to load the whole thing in one go.
>
> Other flags that might help reduce the overall memory footprint are these, which avoid storing unnecessary String or whitespace objects...
>
> SAXReader reader = new SAXReader();
> reader.setMergeAdjacentText(true);
> reader.setStringInternEnabled(true);
> reader.setStripWhitespaceText(true);
>
> James
> -------
> http://radio.weblogs.com/0112098/
>
> ----- Original Message -----
> From: Jason Horman
> To: '[EMAIL PROTECTED]'
> Sent: Friday, February 14, 2003 1:11 AM
> Subject: [dom4j-dev] huge dom
>
> I am using dom4j-1.4-dev-8.jar, the version that came with my last maven build of jelly.
>
> My xml document:
>
> 159 MB
> 2,438,791 lines/tags -> 1 tag per line, all attributes
> ~6 attributes per tag
> 4 out of 6 attributes are numeric values, so they are not huge strings. Attributes 5 and 6 could probably be interned as well, but this would require additional api support.
>
> This document expands to 1100 MB in memory. Could this be right? Seems high to me. I assume all element names and attribute names are interned. I tried to force interning by doing this:
>
> SAXReader reader = new SAXReader();
> reader.setFeature("http://xml.org/sax/features/string-interning", true);
>
> Which I think is the default anyway. I am using xerces-2.0.2.jar for SAXReader via the system property.
>
> Are things being interned? Are there any other tricks to reducing memory consumption?
>
> -jason horman
> [EMAIL PROTECTED]
>
> This email message and any attachments are for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient or his/her representative, please contact the sender by reply email and destroy all copies of the original message.
_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
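
One question the thread leaves open is whether interning attribute values (the "attributes 5 and 6" case) would actually save much. A rough way to estimate that, without building a tree at all, is a streaming pre-pass that counts how many attribute values are repeats. The sketch below uses the JDK's built-in SAX parser rather than dom4j, so it is a swapped-in technique and not the FAQ's ElementHandler approach; the class name `AttributeStats`, its `scan` helper, and the sample XML are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeStats {

    /** Counts elements and attribute values while streaming; no tree is built. */
    static final class Counter extends DefaultHandler {
        long elements;
        long attributeValues;
        // Distinct values seen so far. This set grows with the number of
        // unique values, so it is only cheap when values repeat a lot.
        final Set<String> distinctValues = new HashSet<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
            for (int i = 0; i < atts.getLength(); i++) {
                attributeValues++;
                distinctValues.add(atts.getValue(i));
            }
        }
    }

    /** Streams the input once through the JDK SAX parser and returns the counts. */
    static Counter scan(InputStream in) throws Exception {
        Counter counter = new Counter();
        SAXParserFactory.newInstance().newSAXParser().parse(in, counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a "one tag per line, all attributes" dump.
        String xml = "<rows>"
                + "<row id=\"1\" genre=\"rock\" label=\"acme\"/>"
                + "<row id=\"2\" genre=\"rock\" label=\"acme\"/>"
                + "<row id=\"3\" genre=\"jazz\" label=\"acme\"/>"
                + "</rows>";
        Counter c = scan(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Prints: 4 elements, 9 attribute values, 6 distinct
        System.out.println(c.elements + " elements, " + c.attributeValues
                + " attribute values, " + c.distinctValues.size() + " distinct");
    }
}
```

If the distinct count is a small fraction of the total, sharing one String per distinct value is likely to help; if nearly every value is unique, interning would mostly add overhead. For a real file, `scan` could be given a `FileInputStream` (wrapped in a `GZIPInputStream` for a .gz dump).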