Aleksander Slominski wrote:
> Peter Hendry wrote:
>> Aleksander Slominski wrote:
>>> Peter Hendry wrote:
>>>> More efficient still would be to keep track of the last '<' and
>>>> whether it had a '/' after it - then allow returning past '>' if the
>>>> last '<' didn't have a '/' (confused? :-) ).
>>>
>>> not sure if it is going to work with CDATA section (they can contain
>>> "unbalanced" XML)
>>
>> Why not? It is just an optimization. You can't have CDATA in an end
>> tag so you should not go past the '>' of the end tag whether there is
>> CDATA or not.
>
> CDATA section can have unbalanced XML so beyond tracking of < and /> you
> need to track CDATA sections as well AFAICS.

This is not true. From previous mails I would assume that a SAX parser is being used. In that case you will receive startElement and endElement events, and it is those that are being matched up. What is in the CDATA doesn't matter. Until you get the endElement that matches the first startElement you will continue to honor read() requests but always return only up to the next '>' (optionally optimizing with '</' checking as well). If a CDATA section contains a '</' or any unmatched '<' or '>' it doesn't matter, as you will read past them on the next read. Within a real end tag (outside CDATA) you cannot get any other '>' or '<', so there is no problem.
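
A rough sketch of the matching described above, assuming Python's xml.sax (fed incrementally) stands in for whatever SAX parser is really in use. The names DepthHandler, chunks_up_to_gt and read_one_document are made up for illustration, and the trailing "<next/>" only stands in for whatever follows on the stream:

```python
import xml.sax


class DepthHandler(xml.sax.ContentHandler):
    """Counts element depth; 'done' is set when the root element closes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.done = False

    def startElement(self, name, attrs):
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        if self.depth == 0:
            self.done = True


def chunks_up_to_gt(text):
    """Yield successive reads, each one ending at the next '>'."""
    start = 0
    while start < len(text):
        end = text.find(">", start)
        end = len(text) if end == -1 else end + 1
        yield text[start:end]
        start = end


def read_one_document(stream_text):
    """Feed reads to the parser until the root's endElement arrives."""
    handler = DepthHandler()
    parser = xml.sax.make_parser()  # expat-based reader, supports feed()
    parser.setContentHandler(handler)
    consumed = []
    for chunk in chunks_up_to_gt(stream_text):
        parser.feed(chunk)
        consumed.append(chunk)
        if handler.done:
            break  # depth is back to 0: nothing past this '>' is read
    return consumed


doc = "<root><x><![CDATA[<x<t/>///>>>/></x>]]></x></root>"
consumed = read_one_document(doc + "<next/>")
print("".join(consumed) == doc)
```

Nothing in the sketch looks inside the CDATA section: the '<', '/' and '>' characters in it only change where individual reads end, not when the parser reports the closing endElement.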

The optimisation would also have to account for empty tags '<x/>'.

An example:

  <root><x><![CDATA[<x<t/>///>>>/></x>]]></x></root>

Without optimization, the reads would return the following

    <root>
    <x>
    <![CDATA[<x<t/>
    ///>
    >
    >
    />
    </x>
    ]]>
    </x>
    </root>

at which point the endElement event would return the depth back to 0 and so it is known it is the end of the document.

With some optimization - return only at a '>' when a '/' has been seen since the last '<'

    <root><x><![CDATA[<x<t/>
    ///>
    >>/>
    </x>
    ]]></x>
    </root>

and again at this point endElement is called and returns the depth to 0 so the end of the document has been reached.
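
To make the rule concrete, here is a minimal sketch of the optimized splitting in Python. It assumes the '/' flag is cleared at every '<' and at the start of every read, which is how I read the listing above; optimized_reads is just an illustrative name:

```python
def optimized_reads(text):
    """Split text the way the optimized read() would return it.

    A read ends at a '>' only if a '/' has been seen since the most
    recent '<' (or since the start of the current read).
    """
    reads = []
    start = 0
    slash_seen = False
    for i, ch in enumerate(text):
        if ch == "<":
            slash_seen = False      # new candidate tag, forget old '/'
        elif ch == "/":
            slash_seen = True
        elif ch == ">" and slash_seen:
            reads.append(text[start:i + 1])
            start = i + 1
            slash_seen = False      # each read starts with a clean flag
    if start < len(text):
        reads.append(text[start:])  # trailing data with no qualifying '>'
    return reads


doc = "<root><x><![CDATA[<x<t/>///>>>/></x>]]></x></root>"
for piece in optimized_reads(doc):
    print(piece)
```

An end tag always has a '/' after its '<', so a read can never run past the '>' of the real end tag. An empty tag such as '<x/>' also ends a read, which is harmless because the parser reports both startElement and endElement for it.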

I still don't see the need to track CDATA in this?

Pete
