Thanks for the tip.

On 09/08/2022 17:49, Majewski, Steven Dennis (sdm7g) wrote:
You can also do this, maybe more simply, in XQuery.
In that case, you may want to remove any whitespace differences on ingest (or
else use normalize-space() in comparisons). [In BaseX, there is an option to
strip whitespace on parse.]

Exactly how depends on how you define "equal" (i.e. comparing the @lon and
@lat values alone, or including the text in the comparison) and on what you want
to do in a conflict (use the first, use the last, or maybe drop both and report).
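
For comparison, the same choices can be sketched on the Python side. This is only an illustration using the standard library's xml.etree.ElementTree (lxml.etree offers the same fromstring/iter/get/findtext calls); the helper names wpt_key and find_conflicts and the shortened sample data are made up for this example. It shows the two definitions of "equal" and the "drop both and report" policy.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = """<data><entries>
  <wpt lat="46.98520" lon="6.8831"><name>London</name></wpt>
  <wpt lat="46.98520" lon="2.8831"><name>Paris</name></wpt>
  <wpt lat="46.98520" lon="6.8831"><name> London  2 </name></wpt>
</entries></data>"""


def wpt_key(wpt, include_text=False):
    """Hypothetical helper: build the value that defines 'equal' for a wpt."""
    key = (wpt.get("lat"), wpt.get("lon"))
    if include_text:
        # " ".join(s.split()) collapses whitespace much like normalize-space()
        name = " ".join((wpt.findtext("name") or "").split())
        key += (name,)
    return key


def find_conflicts(root, include_text=False):
    """Hypothetical helper: group wpt elements by key, split unique vs. conflicting."""
    groups = defaultdict(list)
    for wpt in root.iter("wpt"):
        groups[wpt_key(wpt, include_text)].append(wpt)
    unique = [g[0] for g in groups.values() if len(g) == 1]
    conflicts = {k: g for k, g in groups.items() if len(g) > 1}
    return unique, conflicts


root = ET.fromstring(XML)
unique, conflicts = find_conflicts(root)
for key, group in conflicts.items():
    # Report every member of a conflicting group instead of silently picking one.
    print("conflict at", key, [w.findtext("name").strip() for w in group])
```

With include_text=True the two London entries no longer collide, because their names differ once whitespace is normalized.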


let $doc := <data>
     <entries>
         <wpt lat="46.98520" lon="6.8831">
             <name>London</name>
         </wpt>
         <wpt lat="46.98520" lon="2.8831">
             <name>Paris</name>
         </wpt>
         <wpt lat="46.98520" lon="-4.8831">
             <name>Manhattan</name>
         </wpt>
         <wpt lat="46.98520" lon="6.8831">
             <name>London 2</name>
         </wpt>
         <wpt lat="46.98520" lon="-4.8831">
             <name>New York</name>
         </wpt>
     </entries>
</data>

for $x in $doc/entries/wpt
(: drop $x if a single later wpt matches on both @lon and @lat :)
where not( $x/following-sibling::wpt[@lon = $x/@lon and @lat = $x/@lat] )
return $x


The above compares @lon and @lat and keeps only the last wpt in each conflict:

<wpt lat="46.98520" lon="2.8831"><name>Paris</name></wpt>
<wpt lat="46.98520" lon="6.8831"><name>London 2</name></wpt>
<wpt lat="46.98520" lon="-4.8831"><name>New York</name></wpt>
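
A rough Python counterpart of the "keep the last" policy, for anyone who prefers to stay outside XQuery. This is a sketch with the standard library's xml.etree.ElementTree, assuming (as in the sample data) that every child of <entries> is a wpt; it is not the only way to do it.

```python
import xml.etree.ElementTree as ET

XML = """<data><entries>
<wpt lat="46.98520" lon="6.8831"><name>London</name></wpt>
<wpt lat="46.98520" lon="2.8831"><name>Paris</name></wpt>
<wpt lat="46.98520" lon="-4.8831"><name>Manhattan</name></wpt>
<wpt lat="46.98520" lon="6.8831"><name>London 2</name></wpt>
<wpt lat="46.98520" lon="-4.8831"><name>New York</name></wpt>
</entries></data>"""

root = ET.fromstring(XML)
entries = root.find("entries")

seen = set()
# Walk backwards so the last wpt of each (lat, lon) pair is the one we meet first.
for wpt in reversed(list(entries)):
    key = (wpt.get("lat"), wpt.get("lon"))
    if key in seen:
        entries.remove(wpt)  # an equal wpt occurs later in the document
    else:
        seen.add(key)

kept = [w.findtext("name") for w in entries]
print(kept)
```

On the sample data this leaves Paris, London 2 and New York, in document order.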


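To keep the first occurrence instead, change the axis to preceding-sibling::.
You can also compare the wpt elements themselves, but note that = on elements
compares only their string values (here the <name> text), not @lat or @lon; use
deep-equal() if the attributes should count as well: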

where not( $x =  $x/preceding-sibling::wpt  )
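
A keep-first variant in Python that takes attributes as well as text into account might look like the sketch below. It uses the standard library's xml.etree.ElementTree; signature is a hypothetical helper invented for this example, and the sample data is shortened and altered to show the difference between the cases.

```python
import xml.etree.ElementTree as ET


def signature(elem):
    """Hypothetical helper: hashable summary of tag, attributes, text and children.

    Whitespace in text is collapsed so indentation differences do not matter;
    tail text is ignored.
    """
    text = " ".join((elem.text or "").split())
    return (elem.tag, tuple(sorted(elem.attrib.items())), text,
            tuple(signature(child) for child in elem))


XML = """<entries>
<wpt lat="46.98520" lon="6.8831"><name>London</name></wpt>
<wpt lat="46.98520" lon="2.8831"><name>London</name></wpt>
<wpt lat="46.98520" lon="6.8831">
    <name>London</name>
</wpt>
</entries>"""

entries = ET.fromstring(XML)
seen = set()
for wpt in list(entries):  # forward order: the first occurrence wins
    sig = signature(wpt)
    if sig in seen:
        entries.remove(wpt)  # an identical wpt was already kept
    else:
        seen.add(sig)

result = [(w.get("lon"), w.findtext("name")) for w in entries]
print(result)
```

Here the second wpt survives because its @lon differs even though its text is the same, while the third is dropped as a whitespace-only variant of the first.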

_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
