An opportunity to work in Python, and the necessity of working with some XML
too large to visualize, got me thinking about an answer Alan Gauld had written
to me a few years ago
(https://mail.python.org/pipermail/tutor/2015-June/105810.html). I have
applied that information in this script, but I have another question :)
Let's say I have an XML file like this:
-------------- order.xml ----------------
<salesorder>
    <customername>Bob</customername>
    <customerlocation>321 Main St</customerlocation>
    <saleslines>
        <salesline>
            <item>D20</item>
            <quantity>4</quantity>
        </salesline>
        <salesline>
            <item>CS211</item>
            <quantity>1</quantity>
        </salesline>
        <salesline>
            <item>BL5</item>
            <quantity>7</quantity>
        </salesline>
        <salesline>
            <item>AC400</item>
            <quantity>1</quantity>
        </salesline>
    </saleslines>
</salesorder>
---------- end order.xml ----------------
Items CS211 and AC400 are not valid items, and I want to remove their
<salesline> nodes. I came up with the following (Python 3.6.7 on Linux):
------------ xml_delete_test.py --------------------
import os
import xml.etree.ElementTree as ET

hd = os.path.expanduser('~')
inputxml = os.path.join(hd, 'order.xml')
outputxml = os.path.join(hd, 'fixed_order.xml')
valid_items = ['D20', 'BL5']

tree = ET.parse(inputxml)
root = tree.getroot()
saleslines = root.find('saleslines').findall('salesline')
for e in saleslines[:]:
    if e.find('item').text not in valid_items:
        saleslines.remove(e)
tree.write(outputxml)
---------- end xml_delete_test.py ------------------
The above code runs without error, but simply writes the original file to disk.
The desired output would be:
-------------- fixed_order.xml ----------------
<salesorder>
    <customername>Bob</customername>
    <customerlocation>321 Main St</customerlocation>
    <saleslines>
        <salesline>
            <item>D20</item>
            <quantity>4</quantity>
        </salesline>
        <salesline>
            <item>BL5</item>
            <quantity>7</quantity>
        </salesline>
    </saleslines>
</salesorder>
---------- end fixed_order.xml ----------------
What I find particularly confusing about the problem is that after running
xml_delete_test.py in the IDLE editor, if I go over to the shell and type
saleslines, I can see that it's now a list of two elements. I run the
following:
for i in saleslines:
    print(i.find('item').text)
and I see that it's D20 and BL5, my two valid items. Yet when I write tree out
to the disk, it has the original four. Do I need to refresh tree somehow?
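Here is a smaller, self-contained sketch of the same symptom. It assumes an
inline XML string in place of order.xml, and the names xml_text, parent and
found are made up for illustration. As far as I can tell, findall() hands back
an ordinary Python list, so removing from that list may not touch the tree at
all, whereas calling remove() on the parent element does:

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down stand-in for order.xml
xml_text = (
    "<salesorder><saleslines>"
    "<salesline><item>D20</item></salesline>"
    "<salesline><item>CS211</item></salesline>"
    "</saleslines></salesorder>"
)
valid_items = ['D20', 'BL5']

root = ET.fromstring(xml_text)
parent = root.find('saleslines')

# Removing from the list returned by findall() only changes that list
found = parent.findall('salesline')
for e in found[:]:
    if e.find('item').text not in valid_items:
        found.remove(e)
items_in_list = [e.find('item').text for e in found]
items_in_tree = [e.find('item').text for e in parent.findall('salesline')]

# Removing through the parent element detaches the child from the tree
for e in parent.findall('salesline'):
    if e.find('item').text not in valid_items:
        parent.remove(e)
items_after_remove = [e.find('item').text for e in parent.findall('salesline')]

print(items_in_list, items_in_tree, items_after_remove)
```

If that reading is right, the list shrinks while the tree keeps both
saleslines until parent.remove() is called, which would match what I see in
the shell.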
Thanks!