Hi Robin,

> I am trying to port an XML parser to lxml.etree.
>
> I can get the parser to work fine, validate properly, and produce a parse
> tree in tuple form, i.e. (tag, attrib, [contents...], extra)
>
> The standard XML error locations are fine and provide both line number and
> column.
>
> However, the current parser allows for debugging during post-processing of
> the parse, and it has a tuple of information in place of extra above that
> looks like ((srcname,startline,startcolumn),(srcname,endline,endcolumn)).
> This extra information allows post-analysis to determine if one tag starts
> after another.
>
> I can find no way to access the column information in the standard parsers.
>
> I believe that the information is present in the XMLReader that libxml2
> provides, but there is no way to get access to it in lxml.

I believe that you're right.

> I think I just need to determine a tag ordering, i.e. does tag0 start before
> or after tag1 in the source?
>
> Is there an obvious way to do that?
>
> Currently (tag0.startline, tag0.startcolumn) is compared with (tag1.startline,
> tag1.startcolumn) and the latest tag is returned.
>
> I believe I could just add a tag sequence to determine order, but is there an
> easier way?

I must say I don't quite understand the problem. You're probably using
event-driven parsing
(https://lxml.de/tutorial.html#parsing-from-strings-and-files) to feed your
parser (on top of lxml's parsing)?

Why would you need to compare tag lines and columns, since elements (and their
respective parsing events) are guaranteed to come in document order? Are you
concerned with the innermost (opening) tag on a certain line, or what are you
looking for?
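
In case it helps, here is a minimal sketch of the "tag sequence" idea you
mention, assuming you can parse with iterparse() and "start" events. The names
start_order and later_of are made up for illustration; they are not part of
lxml. Since start events arrive in document order, a running counter is enough
to tell which tag opened later, without any column information:

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # The standard library offers the same iterparse() interface,
    # so the sketch still runs if lxml is not installed.
    import xml.etree.ElementTree as etree


def start_order(xml_bytes):
    """Parse the input and number each element by the order of its start tag.

    Hypothetical helper: returns (root, order), where order maps each
    element to its sequence number in document order.
    """
    order = {}
    root = None
    for _event, elem in etree.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        if root is None:
            root = elem  # the first start event belongs to the root element
        order[elem] = len(order)  # sequence number: 0, 1, 2, ...
    return root, order


def later_of(order, tag0, tag1):
    """Return whichever of the two elements starts later in the source."""
    return tag0 if order[tag0] > order[tag1] else tag1


doc = b"<a>\n  <b><c/></b>\n  <d/>\n</a>"
root, order = start_order(doc)
b, d = root[0], root[1]
c = b[0]
latest = later_of(order, b, d)  # d opens after b
```

The dictionary keeps a reference to every element, which is what makes the
lookup by element work but also keeps the whole tree in memory. If you need a
line number as well, lxml elements carry it in their sourceline attribute; as
far as I can tell there is no equivalent for the column.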

Best regards,
Holger





