Hi Tonny,
Tonny Kohar wrote:
First, I don't think you can do whitespace handling in the parser. It's a little hard to know exactly where Batik fits in the XML scheme of things (it's not 100% clear to me whether it is an XML processor or an application), but my reading of the XML spec is that Batik can't strip spaces; they must be made available through the DOM.
1) Since this is a DOM issue rather than an XML one, I am going to rewrite it using the DOM normalize() method. Is this approach correct?
I don't think so. From the DOM Level 2 spec for Node.normalize():
Puts all Text nodes in the full depth of the sub-tree underneath this Node, including attribute nodes, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.
Note that it does not mention xml:space at all, and it seems pretty clear that the _only_ change is the merging of adjacent Text nodes (plus the removal of empty ones).
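To make the point concrete, here is a small sketch using the JDK's built-in DOM (javax.xml.parsers / org.w3c.dom), not Batik's DOM; the class name NormalizeDemo is just for illustration. It builds a text element with adjacent and empty Text children plus an xml:space attribute, then calls normalize(): the Text nodes are merged and the empty one is dropped, but no whitespace is touched and xml:space is ignored.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Builds <text xml:space="default"> with three Text children
    // ("  Hello ", "", "   world  ") and then calls Node.normalize().
    static Element buildAndNormalize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element text = doc.createElementNS("http://www.w3.org/2000/svg", "text");
        doc.appendChild(text);
        // Present only to show that normalize() does not look at it.
        text.setAttributeNS(XML_NS, "xml:space", "default");
        text.appendChild(doc.createTextNode("  Hello "));
        text.appendChild(doc.createTextNode(""));          // empty Text node
        text.appendChild(doc.createTextNode("   world  "));
        text.normalize();
        return text;
    }

    public static void main(String[] args) throws Exception {
        Element text = buildAndNormalize();
        // Adjacent Text nodes are merged and the empty one is dropped,
        // but every space character is still there.
        System.out.println(text.getChildNodes().getLength());    // 1
        System.out.println("[" + text.getTextContent() + "]");   // [  Hello    world  ]
    }
}
```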
2) While digging through the Batik DOM, I found that AbstractParentNode provides an implementation of normalize(); however, it does not handle xml:space (preserve|default). Is this implementation correct?
It looks correct to me according to the DOM specification. It is possible that it should also consider merging CDATA sections.
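If CDATA merging is wanted, one option (assuming a DOM Level 3 implementation, which Batik's DOM may not be) is Document.normalizeDocument() with the "cdata-sections" parameter set to false. The sketch below uses the JDK DOM and its helper names are hypothetical. It contrasts plain normalize(), which leaves the CDATA section as its own node, with normalizeDocument(), which converts it to text and merges it with its neighbours.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CdataMergeDemo {
    static Element parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // normalizeDocument() expects namespace-aware nodes
        Document doc = f.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    // Node.normalize(): a CDATA section counts as structure, so
    // "x" | CDATA | "z" stays three separate children.
    static int childCountAfterNormalize(String xml) throws Exception {
        Element e = parse(xml);
        e.normalize();
        return e.getChildNodes().getLength();
    }

    // DOM Level 3: with "cdata-sections" = false, normalizeDocument() turns
    // CDATA sections into Text nodes, which then merge with adjacent Text.
    static Element mergeCdata(String xml) throws Exception {
        Document doc = parse(xml).getOwnerDocument();
        doc.getDomConfig().setParameter("cdata-sections", Boolean.FALSE);
        doc.normalizeDocument();
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a>x<![CDATA[ y ]]>z</a>";
        System.out.println(childCountAfterNormalize(xml));
        Element a = mergeCdata(xml);
        System.out.println(a.getChildNodes().getLength() + " [" + a.getTextContent() + "]");
    }
}
```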
3) Is there anything about xml:space="preserve" that I should consider? Looking through the W3C DOM and XML specs, they don't say much about this.
I used the search feature of the W3C site (searching for "XML space", I think) and got a direct hit. The important thing is that, according to the W3C spec, xml:space should have no effect on the XML processor; it is there for XML applications to use. As I said, it's a little hard to know whether Batik is an XML processor or an XML application, but I think that whatever is using Batik is the application, and Batik should play the role of processor as much as possible.
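On the application side, honouring xml:space amounts to finding the nearest ancestor-or-self element that declares it, since the attribute is inherited. Here is a minimal sketch against the JDK DOM, assuming a namespace-aware parser; the helper names are hypothetical, and falling back to "default" when nothing is declared is a simplification of "use the application's default behaviour". It also shows that the parser hands every space through to the DOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XmlSpaceLookup {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // xml:space is inherited: the nearest ancestor-or-self element carrying it wins.
    static String effectiveXmlSpace(Node node) {
        for (Node n = node; n != null; n = n.getParentNode()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) n;
                if (e.hasAttributeNS(XML_NS, "space")) {
                    return e.getAttributeNS(XML_NS, "space");
                }
            }
        }
        return "default"; // simplification: nothing declared in scope
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // so xml:space can be looked up by namespace URI
        return f.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<svg xml:space='preserve'><g><text>  a  b  </text></g></svg>");
        Node text = doc.getElementsByTagName("text").item(0);
        System.out.println(effectiveXmlSpace(text));            // preserve
        System.out.println("[" + text.getTextContent() + "]");  // [  a  b  ]
    }
}
```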
4) Since a Text node can contain entities, which entities should I consider? Do I only need to consider the predefined entities in the XML spec, which are only &amp;, &lt;, &gt;, &apos; and &quot;? How should I handle non-predefined entities?
I think entities will already have been expanded (by the XML parser) by the time you see them, but they may generate separate Text nodes.
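A quick way to check this with the JDK parser (again a sketch, with a hypothetical class name): the predefined entities and an internal entity declared in the DTD are all replaced by their text when entity-reference expansion is on, which is the JAXP default. How many Text nodes that produces is up to the parser; normalize() folds them into one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class EntityDemo {
    static Element parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Already the default; set explicitly so entity references become plain text.
        f.setExpandEntityReferences(true);
        return f.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // 'who' is a non-predefined entity declared in the internal DTD subset.
        Element a = parse("<!DOCTYPE a [<!ENTITY who 'Batik'>]>"
                + "<a>&lt;&who;&gt; &amp; co</a>");
        // The number of Text nodes here depends on the parser implementation...
        System.out.println("before normalize: " + a.getChildNodes().getLength());
        a.normalize();
        // ...but after normalize() there is one node holding the expanded text.
        System.out.println(a.getChildNodes().getLength() + " [" + a.getTextContent() + "]");
    }
}
```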
I really think what you are trying to do is flawed at a pretty deep level. For example, you can't even get the "right" result for xml:space handling of SVG text elements without knowing about svg:text elements (the removal of whitespace at the start and end of the text makes this impossible).
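To illustrate, here is a simplified sketch of the SVG 1.1 xml:space rules for text content (default: drop newlines, turn tabs into spaces, strip leading and trailing spaces, collapse runs of spaces; preserve: turn newlines and tabs into spaces). The class name is hypothetical and the sketch ignores everything else about text layout. Applied to the joined content of a whole text element it gives the expected result; applied node by node, the way a generic DOM pass would, it glues two words together.

```java
public class SvgTextSpace {
    // Sketch of the SVG 1.1 xml:space rules. 'chars' should be the character data
    // of the WHOLE text element (all descendant Text nodes joined), because the
    // leading/trailing stripping applies to the element, not to each node.
    // Line endings are assumed to be normalized to '\n' by the XML parser.
    static String apply(String chars, boolean preserve) {
        if (preserve) {
            // xml:space="preserve": newlines and tabs become spaces, nothing is removed.
            return chars.replace('\n', ' ').replace('\t', ' ');
        }
        // xml:space="default": drop newlines, tabs -> spaces,
        // strip leading/trailing spaces, then collapse runs of spaces.
        String s = chars.replace("\n", "").replace('\t', ' ');
        s = s.replaceAll("^ +| +$", "");
        return s.replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        // Text nodes of: <text>  Hello <tspan>  world</tspan>  </text>
        String[] nodes = {"  Hello ", "  world", "  "};

        // Rules applied to the element's joined character data.
        System.out.println("[" + apply(String.join("", nodes), false) + "]"); // [Hello world]

        // Rules applied to each Text node on its own: the word break is lost.
        StringBuilder perNode = new StringBuilder();
        for (String n : nodes) {
            perNode.append(apply(n, false));
        }
        System.out.println("[" + perNode + "]"); // [Helloworld]
    }
}
```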
Can you say why you want to do this? Perhaps I can suggest a less problematic way to get what you want.