Hi Jason
Just a note to say that I haven't forgotten about this issue.
The work is just going a little slower than I would have hoped, as I have
to do some paying work first.

Hopefully I should have finished a test build for you by the end of
next week.

Mike

> -----Original Message-----
> From: Mike Skells 
> Sent: Friday 28 February 2003 08:37
> To: Jason Horman
> Cc: [EMAIL PROTECTED]
> Subject: RE: [dom4j-dev] huge dom
> 
> 
> Hi,
> I would use the Flyweight if it was not broken - see the 
> thread on equals and hashCode - so I have subclassed from that.
> 
> The values are part of the node, and the interning process 
> looks for nodes which are identical. I have just about 
> finished the code. The Document factories co-ordinate the 
> interning of the leaf nodes. The element classes are written, 
> and are constructed by the Element handler, which co-ordinates 
> the interning of the attribute lists, the content list, and 
> the element itself. There are a number of support classes for 
> the custom lists (to reduce size), and a basic interner.
> 
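As a rough illustration of what a "basic interner" for identical leaf nodes might look like, here is a minimal sketch in plain Java. It is not the code described above: the Attr and Interner classes are hypothetical names invented for this example, and it assumes the nodes are immutable with value-based equals and hashCode, which is why sharing them is only safe for a read-only tree.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable leaf node: a name/value pair standing in for an attribute.
final class Attr {
    final String name;
    final String value;

    Attr(String name, String value) {
        this.name = name;
        this.value = value;
    }

    // Value-based equality, so two nodes with the same content count as identical.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Attr)) return false;
        Attr other = (Attr) o;
        return name.equals(other.name) && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, value);
    }
}

// Hypothetical basic interner: hands back one canonical instance per distinct value.
final class Interner<T> {
    private final Map<T, T> pool = new HashMap<>();

    T intern(T candidate) {
        T existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    int size() {
        return pool.size();
    }
}

class InternerDemo {
    public static void main(String[] args) {
        Interner<Attr> attrs = new Interner<>();
        Attr a = attrs.intern(new Attr("type", "album"));
        Attr b = attrs.intern(new Attr("type", "album"));
        Attr c = attrs.intern(new Attr("type", "single"));
        System.out.println(a == b);       // same name and value: one shared instance
        System.out.println(a == c);       // different value: separate instance
        System.out.println(attrs.size()); // number of distinct nodes kept
    }
}
```

The saving comes from the second and later copies of an identical node becoming garbage straight away, so the more repetition in the document, the bigger the reduction.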
> I have a couple of bugs to track down this morning, and I 
> have finished separating the code from my commercial 
> dependencies, so I should ship you a demo jar this pm. 
> I will run some tests on that 800Mb XML file you referred to 
> so that I can get some stats. I haven't checked that the tree 
> is any good for use yet! But I guess that you could try it 
> in your app to see if anything breaks.
> 
> > -----Original Message-----
> > From: Jason Horman [mailto:[EMAIL PROTECTED]
> > Sent: Thursday 27 February 2003 23:33
> > To: Mike Skells
> > Subject: RE: [dom4j-dev] huge dom
> > 
> > 
> > > Excellent, that would be great. How do you plan on using
> > > flyweight/factories? The nodes I have aren't exact 
> > > duplicates. The actual attribute names and element names are 
> > > obviously duplicated but the values of the attributes will 
> > > differ. I assumed though that string interning would fix the 
> > > issue of duplicate names.
> > 
> > -jason
> > 
> > -----Original Message-----
> > From: Mike Skells [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, February 26, 2003 4:45 AM
> > To: Jason Horman
> > Cc: [EMAIL PROTECTED]
> > Subject: RE: [dom4j-dev] huge dom
> > 
> > 
> > > Hi Jason,
> > > I am looking at using content handlers and/or document
> > > factory modifications to allow for the re-use of some of 
> > > the nodes. I have run some tests on some large doms that I 
> > > have, and have seen a reduction of 50% - 95% in the size 
> > > of the dom. The restrictions are that the dom is read only, 
> > > which is not a problem for you I believe, and that the 
> > > flyweight pattern is used.
> > > 
> > > Once I have a version that I am happy with I can send you some
> > > code to try on your XML file. I will be looking to contribute 
> > > this code once it has stabilised, and I have removed the few 
> > > minor proprietary classes.
> > 
> > Mike
> > > -----Original Message-----
> > > From: Jason Horman [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday 25 February 2003 23:40
> > > To: Mike Skells; Jason Horman
> > > Cc: [EMAIL PROTECTED]
> > > Subject: RE: [dom4j-dev] huge dom
> > > 
> > > 
> > > I can't really send my DOM since it is proprietary company 
> > > information. There are plenty of HUGE xml docs on the web though, 
> > > such as:
> > > 
> > > 
> > 
> http://www.cs.washington.edu/research/xmldatasets/www/repository.html/
> > > http://www.cs.washington.edu/research/xmldatasets/www/data/pir
> > > /psd7003.xml.g
> > > z
> > > 
> > > 21,305,818 elements
> > > 103 MB's
> > > 
> > > I basically was just doing this:
> > > 
> > >                         SAXReader reader = new SAXReader();
> > >                         reader.setStringInternEnabled(true);
> > >                         reader.setMergeAdjacentText(true);
> > >                         reader.setStripWhitespaceText(true);
> > >                         
> > >                         Document oldArtistDoc = 
> > > reader.read(inputStream);
> > > 
> > > 
> > > Thanks,
> > > Jason Horman
> > > [EMAIL PROTECTED]
> > > 
> > > -----Original Message-----
> > > From: Mike Skells [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, February 25, 2003 10:00 AM
> > > To: Jason Horman
> > > Cc: [EMAIL PROTECTED]
> > > Subject: RE: [dom4j-dev] huge dom
> > > 
> > > 
> > > Hi, Jason
> > > > I have a few other ideas that I am testing. Can you send me the 
> > > > (zipped) xml, and a bit of test code so that I can check if my 
> > > > ideas work?
> > > 
> > > Mike Skells
> > > 
> > > > -----Original Message-----
> > > > From: Jason Horman [mailto:[EMAIL PROTECTED]
> > > > Sent: Thursday 20 February 2003 00:04
> > > > To: 'James Strachan'; [EMAIL PROTECTED]
> > > > Subject: RE: [dom4j-dev] huge dom
> > > > 
> > > > 
> > > > Thanks, that trimmed off about 150 mb's from memory. Still seems
> > > > large to me, but I suppose the tree is quite large.
> > > > 
> > > > > I cannot use the "row by row" technique since I need to have a dom
> > > > > available for the massive number of xpath statements and sorts that
> > > > > I need to do across the entire document. The document is essentially
> > > > > a database dump. I may look into the new BDB XML db instead of
> > > > > in-memory in the future.
> > > > 
> > > > -jason
> > > > 
> > > > -----Original Message-----
> > > > From: James Strachan [mailto:[EMAIL PROTECTED]
> > > > Sent: Wednesday, February 19, 2003 12:26 AM
> > > > To: Jason Horman; [EMAIL PROTECTED]
> > > > Subject: Re: [dom4j-dev] huge dom
> > > > 
> > > > 
> > > > 
> > > > First off there's an FAQ entry
> > > > 
> > > > http://dom4j.org/faq.html
> > > > 
> > > > on How does dom4j handle very large XML documents?
> > > > 
> > > > > http://dom4j.org/faq.html#How%20does%20dom4j%20handle%20very%20large%20XML%20documents?
> > > > 
> > > > > which essentially means you can process the document in a 'row by
> > > > > row' kinda way rather than waiting to load the whole thing in one 
> > > > > go.
> > > > 
> > > > 
> > > > > Other flags that might help reduce the overall memory footprint are
> > > > > these, which avoid storing unnecessary String or whitespace
> > > > > objects...
> > > > > 
> > > > > SAXReader reader = new SAXReader();
> > > > > reader.setMergeAdjacentText(true);
> > > > > reader.setStringInternEnabled(true);
> > > > > reader.setStripWhitespaceText(true);
> > > > 
> > > > James
> > > > -------
> > > > http://radio.weblogs.com/0112098/
> > > > ----- Original Message -----
> > > > From: Jason Horman
> > > > To: '[EMAIL PROTECTED]'
> > > > Sent: Friday, February 14, 2003 1:11 AM
> > > > Subject: [dom4j-dev] huge dom
> > > > 
> > > > 
> > > > > I am using dom4j-1.4-dev-8.jar, the version that came with my last
> > > > > maven build of jelly.
> > > > 
> > > > My xml document:
> > > > 
> > > > 159 mbs
> > > > 2,438,791 lines/tags -> 1 tag per line, all attributes
> > > > ~6 attributes per tag
> > > > 4 out of 6 attributes are numeric values, so they are not huge
> > > > > strings. Attributes 5 and 6 could probably be interned as well, but
> > > > > this would require additional api support.
> > > > > 
> > > > > This document expands to 1100mb's in memory. Could this be right?
> > > > > Seems high to me. I assume all element names and attribute names are
> > > > > interned. I tried to force interning by doing this:
> > > > 
> > > >         SAXReader reader = new SAXReader();
> > > >         
> > > > 
> reader.setFeature("http://xml.org/sax/features/string-interning";,
> > > > true);
> > > > 
> > > > > Which I think is the default anyway. I am using xerces-2.0.2.jar for
> > > > > SAXReader via the system property.
> > > > > 
> > > > > Are things being interned? Are there any other tricks to reducing
> > > > > memory consumption?
> > > > 
> > > > -jason horman
> > > >  [EMAIL PROTECTED]
> > > > > This email message and any attachments are for the sole use of the
> > > > > intended recipient(s) and may contain confidential and privileged 
> > > > > information. Any unauthorized review, use, disclosure or 
> > > > > distribution is prohibited. If you are not the intended recipient 
> > > > > or his/her representative, please contact the sender by reply 
> > > > > email and destroy all copies of the original message.
> > > > 
> > > 
> > > 
> > > -------------------------------------------------------
> > > This SF.net email is sponsored by: SlickEdit Inc. Develop an edge.
> > > The most comprehensive and flexible code editor you can use. Code 
> > > faster. C/C++, C#, Java, HTML, XML, many more. FREE 30-Day Trial. 
> > > www.slickedit.com/sourceforge 
> > > _______________________________________________
> > > dom4j-dev mailing list
> > > [EMAIL PROTECTED]
> > > > https://lists.sourceforge.net/lists/listinfo/dom4j-dev
> > > 
> 


_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
