Hi Tonny,

Tonny Kohar wrote:
First, I don't think you can do whitespace handling in the parser.
It's a little hard to know exactly where Batik fits in the XML scheme
of things (it's not 100% clear to me whether it is an XML processor or
an application), but my reading of the XML spec indicates that Batik
can't strip spaces; they must be made available through the DOM.

1) Since this is a DOM matter rather than an XML one, I am going to rewrite it using the DOM normalize() method. Is this approach correct?

I don't think so. From the DOM 2 spec:

        Puts all Text nodes in the full depth of the sub-tree
        underneath this Node, including attribute nodes, into a
        "normal" form where only structure (e.g., elements, comments,
        processing instructions, CDATA sections, and entity
        references) separates Text nodes, i.e., there are neither
        adjacent Text nodes nor empty Text nodes.

   Note that it does not mention xml:space at all, and it seems
pretty clear that the _only_ change is the merging of adjacent
Text nodes.
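
   As a rough illustration, here is a minimal sketch using the JDK's
standard DOM implementation rather than Batik's (the class name
NormalizeDemo and the sample strings are made up for this example).
It builds two adjacent Text nodes under an element carrying
xml:space="default", calls normalize(), and returns the merged text
so you can see that the whitespace is left alone.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    // Builds <text xml:space="default"> with two adjacent Text nodes,
    // normalizes it, and returns the value of the merged Text node.
    static String mergedText() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element text = doc.createElement("text");
        text.setAttribute("xml:space", "default");
        doc.appendChild(text);

        text.appendChild(doc.createTextNode("  a   "));
        text.appendChild(doc.createTextNode("  b  "));

        // normalize() merges adjacent Text nodes; it does not look at xml:space.
        text.normalize();

        if (text.getChildNodes().getLength() != 1) {
            throw new IllegalStateException("expected a single merged Text node");
        }
        return text.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        // Brackets make the surviving leading/trailing spaces visible.
        System.out.println("[" + mergedText() + "]");
    }
}
```

   The merged value is simply the two strings concatenated, spaces
included, even though the element says xml:space="default".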

2) When digging through the Batik DOM for this, I found that
AbstractParentNode provides an implementation of normalize(); however,
it does not cater for xml:space preserve|default. Is this
implementation correct?

It looks correct to me according to the DOM specification. It is possible that it should also look to merge CDATA sections.

3) Is there anything about xml:space preserve that I should take into
account? Looking through the DOM and XML specs from the W3C, they don't
say much on this matter.

I used the search feature of the W3C site (XML space, I think) and got a direct hit. The important thing is that, according to the W3C spec, xml:space should have no effect on the XML processor; it is there for XML applications to use. As I said, it's a little hard to know whether Batik is an XML processor or an XML application, but I think that whatever is using Batik is the application, and Batik should play the role of processor as much as possible.
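
To see the processor side of this in practice, here is a small sketch that parses the same content with xml:space="default" and with xml:space="preserve" using the JDK's default (non-validating) DOM parser. The class name XmlSpaceDemo and the sample markup are invented for the example, and other parsers or configurations may behave differently in details, but I would expect the character data to reach the DOM unchanged in both cases.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlSpaceDemo {
    // Parses the given XML string and returns the text content of the root element.
    static String parsedText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String asDefault  = parsedText("<text xml:space=\"default\">  a   b  </text>");
        String asPreserve = parsedText("<text xml:space=\"preserve\">  a   b  </text>");
        // The parser passes the whitespace through in both cases;
        // acting on xml:space is left to the application.
        System.out.println("[" + asDefault + "]");
        System.out.println("[" + asPreserve + "]");
        System.out.println(asDefault.equals(asPreserve));
    }
}
```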

4) Since a text node could contain entities, which entities should I
consider? Do I only need to consider the predefined entities in the XML
spec, which are only &amp;, &gt;, &lt;, &apos;, &quot;? How should
non-predefined entities be handled?

I think entities will already have been expanded (by the XML parser) by the time you see them, but they may generate a separate Text node.
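
A quick way to check this is a sketch like the following, again with the JDK's default DOM parser rather than Batik's document factory. The class name EntityDemo and the entity name "co" are made up for the example. It only looks at the resulting text content, since how many Text nodes the parser produces is exactly the part that can vary.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityDemo {
    // Parses the given XML string and returns the text content of the root element.
    static String parsedText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Predefined entities arrive as the characters they stand for.
        System.out.println(parsedText("<t>a &amp; b &lt; c</t>"));

        // An entity declared in the internal DTD subset is expanded as well.
        String withDtd = "<!DOCTYPE t [<!ENTITY co \"ACME\">]>"
                + "<t>&co; &amp; sons</t>";
        System.out.println(parsedText(withDtd));
    }
}
```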

   I really think what you are trying to do is flawed at a pretty deep
level. For example, you can't even get the "right" result for xml:space
handling of SVG text elements without knowing about svg:text elements
(the elimination of whitespace at the start and end of the text makes
this impossible).
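
   Here is a toy sketch of why. It applies the SVG 1.1
xml:space="default" rules (drop newlines, turn tabs into spaces, strip
leading and trailing spaces, collapse runs of spaces) in two ways: to
each Text node on its own, and to the character data of the whole text
element. The class name SvgSpaceDemo, the method names, and the sample
content (standing in for <text>Hello <tspan>big</tspan> world</text>)
are all invented for the illustration, and this is a simplification,
not what Batik actually does.

```java
class SvgSpaceDemo {
    // SVG 1.1 xml:space="default" handling applied to one string.
    static String svgDefault(String s) {
        return s.replaceAll("[\\r\\n]", "")   // remove newline characters
                .replace('\t', ' ')           // tabs become spaces
                .replaceAll("^ +| +$", "")    // strip leading/trailing spaces
                .replaceAll(" {2,}", " ");    // collapse runs of spaces
    }

    // Naive approach: treat every Text node in isolation.
    static String perNode(String... textNodes) {
        StringBuilder sb = new StringBuilder();
        for (String t : textNodes) {
            sb.append(svgDefault(t));
        }
        return sb.toString();
    }

    // Element-aware approach: treat the text element's content as one run.
    static String perTextElement(String... textNodes) {
        return svgDefault(String.join("", textNodes));
    }

    public static void main(String[] args) {
        String[] nodes = { "Hello ", "big", " world" };
        System.out.println(perNode(nodes));        // Hellobigworld
        System.out.println(perTextElement(nodes)); // Hello big world
    }
}
```

   The per-node version eats the spaces around the tspan, which is
the kind of result you get if the code does not know where the text
element starts and ends.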

   Can you say why you want to do this?  Perhaps I can suggest a less
problematic way to get what you want.
