----- Original Message -----
From: "Thomas Nichols" <[EMAIL PROTECTED]>
> >Yes I like this idea too and it should be fairly easy to add as a
> >configurable option to dom4j's SAXReader, though hopefully this could be a
> >SAX parser property so everyone can benefit.
> >
> >The problem is though, without access to a DTD it's a bit hard to know if you
> >can trim whitespace. Though I guess often we know it's OK. e.g.
> >
> ><body>
> >     <p>hello<i>there!</i></p>
> ></body>
> >
> >The text nodes before and after the <p> element could be trimmed. So only
> >removing text nodes which are just whitespace seems a reasonable
> >configurable option.
>
>
> Ummm... not sure how you can tell this can be trimmed?

Without access to a DTD you can't know which whitespace is significant,
because of 'mixed content', where tags are embedded inside text.

> In this case
> (assuming this is the XHTML) the DTD defines it to be ok, but I had thought
> that the XML spec made whitespace significant - so it can't be trimmed in
> the general case. Please do correct me if I've misunderstood.

Your understanding is correct. I was thinking of cases where a developer
knew up front what kinds of documents they were parsing and so they
themselves turned on whitespace-trimming mode, fully aware of the
consequences. Any whitespace trimming technique should be used with extreme
care. Though for data-centric applications, trimming whitespace could be
really useful.

e.g.

<customer>
    <name>James</name>
    <location>UK</location>
</customer>

If trimming of whitespace-only text nodes were enabled, the above would not
have 3 extra Text nodes added. This could only be done safely if the DTD was
like this:

<!ELEMENT customer (name, location) >

though it could be enabled by hand if the developer understood what they
were doing.
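
To make the by-hand case concrete, here is a minimal sketch that removes
whitespace-only Text nodes after parsing. It assumes the JDK's built-in DOM
parser rather than dom4j, purely so that it is self-contained; the class and
method names are hypothetical, and it is not a proposal for the actual
SAXReader option.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: strip whitespace-only Text nodes from a DOM tree.
public class WhitespaceTrimSketch {

    // Parse an XML string with default (non-validating) parser settings.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Recursively remove Text nodes whose content is only whitespace.
    // Note: String.trim() also strips control characters below U+0020,
    // which is slightly broader than XML's definition of whitespace.
    static void stripWhitespaceOnlyText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceOnlyText(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customer>\n"
                + "    <name>James</name>\n"
                + "    <location>UK</location>\n"
                + "</customer>";
        Document doc = parse(xml);
        Node customer = doc.getDocumentElement();
        System.out.println("children before: " + customer.getChildNodes().getLength());
        stripWhitespaceOnlyText(customer);
        System.out.println("children after:  " + customer.getChildNodes().getLength());
    }
}
```

Applied to mixed content such as <p><b>hello</b> <i>there!</i></p>, the same
method would silently drop the space between the two words, which is why it
is only safe when the developer knows the documents are data-centric.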


> It is of course possible to apply an XSL filter to the input stream to
> remove whitespace, though doing this during document reading would be great.

It would be easy to turn on as a configurable option if you know that
whitespace isn't important. Normally DTDs are used to make that call.
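
As an illustration of letting the DTD make that call, the sketch below uses
the JDK's DocumentBuilderFactory, which has a
setIgnoringElementContentWhitespace option that relies on a DTD and a
validating parser. This is the JDK DOM API, not dom4j; the class name and the
inline DTD (including the #PCDATA declarations for name and location) are
assumptions added for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: let the DTD decide which whitespace is ignorable.
public class DtdWhitespaceSketch {

    // The customer document with an internal DTD subset declaring
    // that <customer> has element-only content.
    static final String CUSTOMER_XML =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE customer [\n"
            + "  <!ELEMENT customer (name, location)>\n"
            + "  <!ELEMENT name (#PCDATA)>\n"
            + "  <!ELEMENT location (#PCDATA)>\n"
            + "]>\n"
            + "<customer>\n"
            + "    <name>James</name>\n"
            + "    <location>UK</location>\n"
            + "</customer>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);                       // DTD must be read and applied
        factory.setIgnoringElementContentWhitespace(true); // drop whitespace in element-only content
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Fail loudly on validation errors instead of printing to stderr.
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(CUSTOMER_XML);
        System.out.println("customer children: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Because the parser only drops whitespace where the DTD says the content is
element-only, mixed content elsewhere in the same document is left alone.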

James




_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
