Hi,
I would use the Flyweight as it stands if it were not broken (see the
thread on equals and hashCode), so I have subclassed from it instead.

The values are part of the node, and the interning process looks for nodes
that are identical. I have just about finished the code.
The document factories coordinate the interning of the leaf nodes.
The element classes are written, and are constructed by the Element
handler, which coordinates the interning of the attribute lists, the
content list, and the element itself.
There are a number of support classes for the custom lists (to reduce
size), and a basic interner.
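
A minimal sketch of what a basic interner of this kind might look like,
for illustration only. The names Interner and Attr are hypothetical and
are not taken from dom4j or from the demo jar. The sketch assumes that
nodes are immutable (the DOM is read only) and that equals and hashCode
are implemented consistently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical canonicalising pool; not a dom4j class. */
final class Interner<T> {
    private final Map<T, T> pool = new HashMap<>();

    /** Returns the pooled instance equal to value, adding value if none exists yet. */
    T intern(T value) {
        T existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    /** Number of distinct instances currently held in the pool. */
    int size() {
        return pool.size();
    }
}

/** Hypothetical immutable leaf node: the value is part of the node's identity. */
final class Attr {
    final String name;
    final String value;

    Attr(String name, String value) {
        this.name = name;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Attr)) return false;
        Attr other = (Attr) o;
        return Objects.equals(name, other.name) && Objects.equals(value, other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, value);
    }
}
```

Interning two equal Attr instances hands back the same object, so a tree
with many repeated leaf nodes only needs to hold one copy of each.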

I have a couple of bugs to track down this morning, and I have finished
separating the code from my commercial dependencies, so I should be able
to ship you a demo jar this afternoon.
I will run some tests on that 800 MB XML file you referred to so that I
can get some stats.
I haven't checked that the tree is any good for use yet! But I guess
that you could try it in your app to see if anything breaks.

> -----Original Message-----
> From: Jason Horman [mailto:[EMAIL PROTECTED] 
> Sent: Thursday 27 February 2003 23:33
> To: Mike Skells
> Subject: RE: [dom4j-dev] huge dom
> 
> 
> Excellent, that would be great. How do you plan on using 
> flyweight/factories? The nodes I have aren't exact 
> duplicates. The actual attribute names and element names are 
> obviously duplicated but the values of the attributes will 
> differ. I assumed though that string interning would fix the 
> issue of duplicate names.
> 
> -jason
> 
> -----Original Message-----
> From: Mike Skells [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 26, 2003 4:45 AM
> To: Jason Horman
> Cc: [EMAIL PROTECTED]
> Subject: RE: [dom4j-dev] huge dom
> 
> 
> Hi Jason,
> I am looking at using content handlers and/or document 
> factory modifications to allow for the re-use of some of 
> the nodes. I have run some tests on some large doms that I 
> have, and have spotted a reduction of 50% - 95% on the size 
> of the dom. The restrictions are that the dom is read only, 
> which is not a problem for you I believe, and that the 
> flyweight pattern is used.
> 
> Once I have a version that I am happy with I can send you some 
> code to try on your XML file. I will be looking to contribute 
> this code once it has stabilised, and I have removed the few 
> minor proprietary classes.
> 
> Mike
> > -----Original Message-----
> > From: Jason Horman [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday 25 February 2003 23:40
> > To: Mike Skells; Jason Horman
> > Cc: [EMAIL PROTECTED]
> > Subject: RE: [dom4j-dev] huge dom
> > 
> > 
> > I can't really send my DOM since it is proprietary company
> > information. There are plenty of HUGE xml docs on the web 
> > though, such as:
> > 
> > 
> > http://www.cs.washington.edu/research/xmldatasets/www/repository.html/
> > http://www.cs.washington.edu/research/xmldatasets/www/data/pir/psd7003.xml.gz
> > 
> > 21,305,818 elements
> > 103 MB's
> > 
> > I basically was just doing this:
> > 
> >     SAXReader reader = new SAXReader();
> >     reader.setStringInternEnabled(true);
> >     reader.setMergeAdjacentText(true);
> >     reader.setStripWhitespaceText(true);
> > 
> >     Document oldArtistDoc = reader.read(inputStream);
> > 
> > 
> > Thanks,
> > Jason Horman
> > [EMAIL PROTECTED]
> > 
> > -----Original Message-----
> > From: Mike Skells [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, February 25, 2003 10:00 AM
> > To: Jason Horman
> > Cc: [EMAIL PROTECTED]
> > Subject: RE: [dom4j-dev] huge dom
> > 
> > 
> > Hi, Jason
> > I have a few other ideas that I am testing. Can you send me
> > the (zipped) XML and a bit of test code so that I can check 
> > whether my ideas work?
> > 
> > Mike Skells
> > 
> > > -----Original Message-----
> > > From: Jason Horman [mailto:[EMAIL PROTECTED]
> > > Sent: Thursday 20 February 2003 00:04
> > > To: 'James Strachan'; [EMAIL PROTECTED]
> > > Subject: RE: [dom4j-dev] huge dom
> > > 
> > > 
> > > Thanks, that trimmed off about 150 mb's from memory. Still seems 
> > > large to me, but I suppose the tree is quite large.
> > > 
> > > > I cannot use the "row by row" technique since I need to have a dom 
> > > > available for the massive number of xpath statements and sorts that 
> > > > I need to do across the entire document. The document is essentially 
> > > > a database dump. I may look into the new BDB XML db instead of 
> > > > in-memory in the future.
> > > 
> > > -jason
> > > 
> > > -----Original Message-----
> > > From: James Strachan [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, February 19, 2003 12:26 AM
> > > To: Jason Horman; [EMAIL PROTECTED]
> > > Subject: Re: [dom4j-dev] huge dom
> > > 
> > > 
> > > 
> > > First off there's an FAQ entry
> > > 
> > > http://dom4j.org/faq.html
> > > 
> > > > on How does dom4j handle very large XML documents?
> > > > 
> > > > http://dom4j.org/faq.html#How%20does%20dom4j%20handle%20very%20large%20XML%20documents?
> > > 
> > > which essentially means you can process the document in a 'row by 
> > > row' kinda way rather than waiting to load the whole thing in one 
> > > go.
> > > 
> > > 
> > > > Other flags that might help reduce the overall memory footprint are 
> > > > these, which avoid storing unnecessary String or whitespace 
> > > > objects...
> > > > 
> > > > SAXReader reader = new SAXReader();
> > > > reader.setMergeAdjacentText(true);
> > > > reader.setStringInternEnabled(true);
> > > > reader.setStripWhitespaceText(true);
> > > 
> > > James
> > > -------
> > > http://radio.weblogs.com/0112098/
> > > ----- Original Message -----
> > > From: Jason Horman
> > > To: '[EMAIL PROTECTED]'
> > > Sent: Friday, February 14, 2003 1:11 AM
> > > Subject: [dom4j-dev] huge dom
> > > 
> > > 
> > > > I am using dom4j-1.4-dev-8.jar, the version that came with my last 
> > > > maven build of jelly.
> > > 
> > > My xml document:
> > > 
> > > 159 mbs
> > > 2,438,791 lines/tags -> 1 tag per line, all attributes
> > > ~6 attributes per tag
> > > > 4 out of 6 attributes are numeric values, so they are not huge 
> > > > strings. Attributes 5 and 6 could probably be interned as well, but 
> > > > this would require additional api support.
> > > 
> > > > This document expands to 1100mb's in memory. Could this be right? 
> > > > Seems high to me. I assume all element names and attribute names are 
> > > > interned. I tried to force interning by doing this:
> > > > 
> > > >         SAXReader reader = new SAXReader();
> > > >         reader.setFeature("http://xml.org/sax/features/string-interning", true);
> > > > 
> > > > Which I think is the default anyway. I am using xerces-2.0.2.jar for 
> > > > SAXReader via the system property.
> > > 
> > > Are things being interned? Are there any other tricks to reducing 
> > > memory consumption?
> > > 
> > > -jason horman
> > >  [EMAIL PROTECTED]
> > > > This email message and any attachments are for the sole use of the 
> > > > intended recipient(s) and may contain confidential and privileged
> > > > information. Any unauthorized review, use, disclosure or 
> > > > distribution is prohibited. If you are not the intended 
> > > > recipient or his/her representative, please contact the 
> > > > sender by reply email and destroy all copies of the 
> > > > original message.
> > > 
> > _______________________________________________
> > dom4j-dev mailing list
> > [EMAIL PROTECTED]
> > https://lists.sourceforge.net/lists/listinfo/dom4j-dev
> > 
> 