On Mon, Jun 8, 2009 at 3:39 PM, Henning
Thielemann<[email protected]> wrote:
> I think you could use the parser as it is and do the name parsing later.
> Due to lazy evaluation both parsers would run in an interleaved way.
>
I've been trying to figure out how to get this to work with lazy
evaluation, but haven't made much headway. Tips? The only way I can
think of to get incremental parsing working is to maintain explicit
state, but I also can't figure out how to achieve this with the
parsers I've tested (HaXml, HXT, hexpat).

Here's a working example of what I'm trying to do, in Python. It reads
XML from stdin, prints events as they are parsed, and will terminate
when the document ends:

##########################

from xml.sax import handler, saxutils, expatreader

class ContentHandler (handler.ContentHandler):
        def __init__ (self):
                self.events = []
                self.level = 0
                
        def startElementNS (self, ns_name, lname, attrs):
                self.events.append (("BEGIN", ns_name, lname, dict (attrs)))
                self.level += 1
                
        def endElementNS (self, ns_name, lname):
                self.events.append (("END", ns_name, lname))
                self.level -= 1
                
        def characters (self, content):
                self.events.append (("TEXT", content))
                
def main ():
        parser = expatreader.ExpatParser ()
        content = ContentHandler ()
        parser.setFeature (handler.feature_namespaces, True)
        parser.setContentHandler (content)
        got_events = False
        while content.level > 0 or (not got_events):
                text = raw_input ("Enter XML:\n")
                parser.feed (text)
                print content.events
                content.events = []
                got_events = True

if __name__ == "__main__": main()

###############################

$ python incremental.py
Enter XML:
<test xmlns="urn:test"><test2><test3>
[('BEGIN', (u'urn:test', u'test'), u'test', {}), ('BEGIN',
(u'urn:test', u'test2'), u'test2', {}), ('BEGIN', (u'urn:test',
u'test3'), u'test3', {})]
Enter XML:
</test3></test2><test2 a="b"/>text content goes here
[('END', (u'urn:test', u'test3'), None), ('END', (u'urn:test',
u'test2'), None), ('BEGIN', (u'urn:test', u'test2'), u'test2', {(None,
u'a'): u'b'}), ('END', (u'urn:test', u'test2'), None), ('TEXT', u'text
content goes here')]
Enter XML:
</test>
[('END', (u'urn:test', u'test'), None)]

#############################

As demonstrated, the parser retains state (namespaces, nesting)
between text inputs. Are there any XML parsers for Haskell that
support this incremental behavior?
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to