I solved the problem and am responding to myself for the benefit of future
googlers.
A SAX parser may split a node of type CHARACTERS into multiple nodes, so they
have to be joined back together. Since pulldom depends on a SAX parser, it may
do this too. My method to find and join together the next run of CHARACTERS
nodes is below. It assumes that
    self.event, self.node = iter.next()
was executed previously.
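    def getCharacterNode(self, iter):
        # Skip ahead to the next CHARACTERS event.
        while self.event != 'CHARACTERS':
            self.event, self.node = iter.next()
        # Collect this node and any CHARACTERS nodes that directly follow it.
        chars = []
        chars.append(self.node.nodeValue)
        self.event, self.node = iter.next()
        while self.event == 'CHARACTERS':
            chars.append(self.node.nodeValue)
            self.event, self.node = iter.next()
        # self.event/self.node now hold the first event after the text.
        return ''.join(chars)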
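For anyone on Python 3, the same idea can be sketched as a generator that
wraps the event stream and merges adjacent CHARACTERS events. This is only a
sketch, not part of pulldom: the name joined_events is made up here, and for
CHARACTERS events it yields the joined string instead of a node. The second
half of the example uses a hand-built event list to simulate the split,
because whether a real parse splits the text depends on the parser's buffering.

```python
from xml.dom import minidom, pulldom


def joined_events(events):
    """Yield (event, value) pairs, merging runs of adjacent CHARACTERS events.

    Hypothetical helper: for CHARACTERS the value is the joined text (a str);
    for every other event it is the original node, passed through unchanged.
    """
    chars = []
    for event, node in events:
        if event == pulldom.CHARACTERS:
            chars.append(node.nodeValue)
            continue
        if chars:
            # A non-text event ends the run, so emit the joined text first.
            yield pulldom.CHARACTERS, ''.join(chars)
            chars = []
        yield event, node
    if chars:
        # The stream ended while still inside a run of text.
        yield pulldom.CHARACTERS, ''.join(chars)


# A real parse of a small document.
stream = pulldom.parseString("<dict><key>Name</key><key>Artist</key></dict>")
texts = [value for event, value in joined_events(stream)
         if event == pulldom.CHARACTERS]

# Hand-built events that simulate 'Name' arriving as 'N' followed by 'ame'.
doc = minidom.Document()
split = [
    (pulldom.START_ELEMENT, doc.createElement("key")),
    (pulldom.CHARACTERS, doc.createTextNode("N")),
    (pulldom.CHARACTERS, doc.createTextNode("ame")),
    (pulldom.END_ELEMENT, doc.createElement("key")),
]
merged = [value for event, value in joined_events(split)
          if event == pulldom.CHARACTERS]
```

Wrapping the stream this way keeps the joining logic in one place, so the
calling loop can treat every CHARACTERS event as complete text.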
Cheers,
Grant
I am having a problem with only getting part of the characters in a CHARACTERS node.
I am using code like this:
    doc = xml.dom.pulldom.parse(inFile)
    iter = iter(doc)
    event, node = iter.next()
    if event == 'CHARACTERS':
        char = node.nodeValue
In my small tests it works fine, but with a large file (2MB) errors start
occurring.
XML like
    <key>Name</key>
sometimes produces char == 'N' or 'Na'. Where this happens and what it produces
vary if I remove some nodes at the beginning of the file: the nodes I remove seem
to parse fine, but which later node parses wrong changes. I thought maybe it was
related to a buffering problem, but this is only a 4 character string. I tried
changing the buffering to line buffering -- parse(inFile,None,1) -- as the phrase
<key>Name</key> always occurs on one line, but this had no effect.
I tried this with both Python 2.3.5 and 2.4. I have not installed PyXML.
Any suggestions would be appreciated.
Cheers,
Grant
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig