I solved the problem and am replying to myself for the benefit of future 
googlers.
The SAX parsers may split the text of a single CHARACTERS node into multiple 
nodes, so the pieces have to be joined back together. Since pulldom depends on 
a SAX parser, it may do this too. My method to find and join together the next 
CHARACTERS node is below. It assumes that
self.event,self.node  = iter.next()
was executed previously.

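    def getCharacterNode(self,iter):
        # Skip ahead to the next CHARACTERS event.
        while self.event != 'CHARACTERS':
            self.event,self.node  = iter.next()
        chars=[]
        chars.append(self.node.nodeValue)
        self.event,self.node  = iter.next()
        # Collect any further pieces the parser split off.
        while self.event == 'CHARACTERS':
            chars.append(self.node.nodeValue)
            self.event,self.node  = iter.next()
        # self.event and self.node now hold the first event after the text.
        return ''.join(chars)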
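
For readers on a current Python 3, the same joining idea can be sketched as a
generator. This is only an illustration, not the code from the original
program: the function name iter_text_runs, the sample XML, and the tiny
bufsize of 3 (chosen to make splitting likely) are all assumptions made for
the example.

```python
import io
from xml.dom import pulldom


def iter_text_runs(events):
    """Yield each run of adjacent CHARACTERS events as one joined string.

    `events` is any iterable of (event, node) pairs, such as the stream
    returned by pulldom.parse(). The name is hypothetical.
    """
    chunks = []
    for event, node in events:
        if event == pulldom.CHARACTERS:
            # The parser may deliver one text node in several pieces.
            chunks.append(node.data)
        elif chunks:
            # Any other event ends the current run of text.
            yield ''.join(chunks)
            chunks = []
    if chunks:
        # Flush text that was still pending when the stream ended.
        yield ''.join(chunks)


# Sample input (an assumption, modelled on the <key>Name</key> case).
xml_bytes = b"<dict><key>Name</key><string>Grant</string></dict>"

# A deliberately small read size, so text is likely to arrive in pieces.
events = pulldom.parse(io.BytesIO(xml_bytes), bufsize=3)
runs = list(iter_text_runs(events))
print(runs)
```

However the parser happens to split the text, each element's character data
comes back as one string, so the caller never sees a partial value.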

Cheers,
Grant

I am having a problem where I get only part of the characters in a CHARACTERS node.
I am using code like this:

import xml.dom.pulldom

doc = xml.dom.pulldom.parse(inFile)
iter=iter(doc)
event,node  = iter.next()
if event == 'CHARACTERS':
    char = node.nodeValue

In my small tests it works fine, but with a large file (2MB) errors start 
occurring.
XML like

<key>Name</key>

sometimes produces char == 'N' or 'Na'. Where this happens and what it produces varies if I 
remove some nodes at the beginning of the file. The nodes I remove seem to parse fine, but 
which later node parses wrongly changes. I thought maybe it was related to a buffering 
problem, but this is only a 4-character string. I tried changing the buffering to line 
buffering, parse(inFile,None,1), as the phrase <key>Name</key> always occurs on one line, 
but this had no effect.
I tried this with both Python 2.3.5 and 2.4. I have not installed PyXML.

Any suggestions would be appreciated.

Cheers,
Grant



_______________________________________________ XML-SIG maillist - XML-SIG@python.org http://mail.python.org/mailman/listinfo/xml-sig
