Re: [dom4j-dev] Java document model performance

Thomas Nichols Mon, 08 Oct 2001 02:29:38 -0700

At 08:29 05/10/2001 +0100, James Strachan wrote:
>Hi Dennis
>
> > I'll update the results on my web site, at least. I'll forward separately a
> > couple of emails from the JDOM list discussing how XMLS could be made the
> > normal serialization method while still allowing people to use default
> > serialization if they want (useful if they're associating extra information
> > with the document components and aren't too concerned about performance).
>
>Cool. Maybe we could use DocumentFactory to store the 'serializer' of a
>document or document fragment which would use XMLS if it's on the CLASSPATH,
>otherwise just use the default mechanism. Then people deriving their own
>DocumentFactory classes could explicitly set what kind of serializer should
>be used.
>
>
> > I think the whitespace option I mention at the end of the dW article is
> > something to consider, too. The idea here is to allow applications to
> > discard isolated whitespace (whitespace that's not embedded in
> > non-whitespace) as the document is parsed. For middleware applications this
> > type of whitespace is only there to make the documents readable and is
> > ignored by the application.
> >
> > Electric XML has been discarding it all along, apparently without anyone
> > noticing before. That's not a good approach, since it breaks handling of
> > some types of documents (and violates the XML recommendation). Making this
> > an option under the application's control seems a good alternative. It'd be
> > even better if it could be implemented in the parser rather than the
> > document model.
>
>The cheats ;-)
>
>Yes I like this idea too and it should be fairly easy to add as a
>configurable option to dom4j's SAXReader, though hopefully this could be a
>SAX parser property so everyone can benefit.
>
>The problem is, though, without access to a DTD it's a bit hard to know if you
>can trim whitespace. Though I guess often we know it's OK. e.g.
>
><body>
>     <p>hello<i>there!</i></p>
></body>
>
>The text nodes before and after the <p> element could be trimmed. So removing
>only text nodes which are just whitespace seems a reasonable configurable
>option.



Ummm... not sure how you can tell this can be trimmed? In this case
(assuming this is XHTML) the DTD defines it to be OK, but I had thought
that the XML spec made whitespace significant, so it can't be trimmed in
the general case. Please do correct me if I've misunderstood.
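
To make the DTD point concrete, here is a minimal sketch using the JDK's JAXP DOM parser rather than dom4j. The class name, the helper method and the inline DTD are all made up for illustration (the DTD is a toy one, not the real XHTML DTD). The idea is that a validating parser that has seen the content model can tell that the whitespace around <p> is element content whitespace and drop it, while a parser with no DTD has to hand it over as text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ElementContentWhitespace {
    // The example document from the quoted mail, with no DTD.
    static final String BODY =
        "<body>\n     <p>hello<i>there!</i></p>\n</body>";

    // Hypothetical toy DTD: body has element-only content, p is mixed content.
    static final String DTD =
        "<!DOCTYPE body [\n"
      + "<!ELEMENT body (p)*>\n"
      + "<!ELEMENT p (#PCDATA|i)*>\n"
      + "<!ELEMENT i (#PCDATA)>\n"
      + "]>\n";

    // Parses xml and returns the number of child nodes of the root element.
    // When useDtd is true the parser validates and is asked to drop
    // element content whitespace (which it can only identify via the DTD).
    static int rootChildCount(String xml, boolean useDtd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(useDtd);
        factory.setIgnoringElementContentWhitespace(useDtd);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Fail loudly on validation errors instead of printing to stderr.
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("no DTD:   " + rootChildCount(BODY, false));
        System.out.println("with DTD: " + rootChildCount(DTD + BODY, true));
    }
}
```

Without the DTD the root should report three children (text, <p>, text); with it, only the <p> element. Trimming without a DTD is then an application-level decision, not something the parser can know is safe.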

It is of course possible to apply an XSL filter to the input stream to 
remove whitespace, though doing this during document reading would be great.
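
For what it's worth, a rough sketch of the XSL filter approach, using only the JAXP transform API that ships with the JDK. The class and method names are invented for illustration. The stylesheet is an identity transform plus xsl:strip-space, which drops whitespace-only text nodes from the source tree before copying it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripWhitespaceFilter {
    // Identity transform that strips whitespace-only text nodes everywhere.
    static final String STRIP_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:strip-space elements=\"*\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the input XML through the stylesheet and returns the result as text.
    static String strip(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STRIP_XSL)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<body>\n     <p>hello<i>there!</i></p>\n</body>";
        System.out.println(strip(xml));
    }
}
```

This costs an extra transformation pass over the input, which is why doing the same thing while the document is being read would be preferable.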

Best Regards,
Thomas.



_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
