----- Original Message -----
From: "Thomas Nichols" <[EMAIL PROTECTED]>

> >Yes I like this idea too and it should be fairly easy to add as a
> >configurable option to dom4j's SAXReader, though hopefully this could be a
> >SAX parser property so everyone can benefit.
> >
> >The problem is though, without access to a DTD it's a bit hard to know if you
> >can trim whitespace. Though I guess often we know it's OK. e.g.
> >
> ><body>
> >  <p>hello<i>there!</i></p>
> ></body>
> >
> >The text node before and after the <p> element could be trimmed. So only
> >remove text nodes which are just whitespace, seems a reasonable configurable
> >option.
>
> Ummm... not sure how you can tell this can be trimmed?

Without access to a DTD you can't know which whitespace is significant, because
of 'mixed content', where tags are embedded inside text.

> In this case
> (assuming this is XHTML) the DTD defines it to be ok, but I had thought
> that the XML spec made whitespace significant - so it can't be trimmed in
> the general case. Please do correct me if I've misunderstood.

Your understanding is correct. I was thinking of cases where a developer knows
up front what kinds of documents they are parsing and so turns on
whitespace-trimming mode themselves, fully aware of the consequences. Any
whitespace-trimming technique should be used with extreme care.

Though for data-centric applications, trimming whitespace could be really
useful. e.g.

<customer>
  <name>James</name>
  <location>UK</location>
</customer>

If trimming of whitespace-only text nodes were enabled, the above would not
have 3 extra Text nodes added (one before <name>, one between <name> and
<location>, and one after <location>). This could only be done safely if the
DTD declared element-only content for <customer>, like this

<!ELEMENT customer (name, location) >

because then whitespace between the child elements cannot be significant.
It could still be enabled by hand if the developer understood what they were
doing.

> It is of course possible to apply an XSL filter to the input stream to
> remove whitespace, though doing this during document reading would be great.

It would be easy to turn on as a configurable option if you know that
whitespace isn't important. Normally DTDs are used to make that call.

James

_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
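
As a rough illustration of the "enabled by hand" trimming discussed above, here
is a minimal sketch. It uses the JDK's built-in DOM API rather than dom4j, since
no dom4j option for this is shown in the thread and none is assumed here. The
class name WhitespaceTrimSketch and the methods parse and
stripWhitespaceOnlyText are hypothetical names made up for this example. It
removes only Text nodes that consist entirely of whitespace, which is safe only
when you already know the document has no mixed content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class WhitespaceTrimSketch {

    /** Parses an XML string with a default, non-validating JAXP parser. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Recursively removes Text children that contain nothing but whitespace. */
    public static void stripWhitespaceOnlyText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceOnlyText(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        String xml = "<customer>\n"
                + "  <name>James</name>\n"
                + "  <location>UK</location>\n"
                + "</customer>";
        Node customer = parse(xml).getDocumentElement();

        // Three whitespace-only Text nodes plus the two elements.
        System.out.println("before: " + customer.getChildNodes().getLength()); // prints "before: 5"
        stripWhitespaceOnlyText(customer);
        System.out.println("after: " + customer.getChildNodes().getLength());  // prints "after: 2"
    }
}
```

Note that on mixed content such as <p>hello <i>there!</i> <b>again</b></p> the
same method would also drop the single space between </i> and <b>, which is
exactly the hazard raised in the thread.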