On 25 Oct 2024, at 10:10, Lily Parker via lxml - The Python XML Toolkit
wrote:

> I've been using lxml to process large XML files recently, and I'm
> looking for ways to optimize performance. Specifically, I'm trying to
> filter specific nodes from large datasets and manage memory usage more
> effectively. I'd appreciate any tips or best practices from your
> experience. Are there any techniques you use to enhance performance,
> or potential pitfalls I should watch out for? Thanks in advance for
> your insights!

Hi Lily,

Can you explain a little more about what you're trying to do? If you
want to manipulate files, you're probably best off combining an
iterative, incremental parser with an incremental writer.
This is something I've used recently for fixing broken Excel worksheets,
where the incorrect "r" attribute needs removing. You should be able to
adapt it to your needs.
```python
from lxml.etree import iterparse, xmlfile

# Fully qualified tag for a cell ("c") element in a spreadsheetml worksheet
CELL_TAG = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}c"


def parser(sheet_src):
    """Parse incrementally, stripping the broken "r" attribute from cells."""
    xml = iterparse(sheet_src)
    for _, element in xml:
        if element.tag == CELL_TAG:
            # set("r", None) raises a TypeError in lxml; remove the
            # attribute from the attrib mapping instead.
            element.attrib.pop("r", None)
        yield element


def coroutine_writer(output):
    """Push variant: feed elements in with send(); send True to get the
    underlying xmlfile object back."""
    with xmlfile(output) as xf:
        try:
            while True:
                el = (yield)
                if el is True:
                    yield xf
                    continue
                xf.write(el)
        except GeneratorExit:
            pass


def writer(out_stream, in_stream):
    """Pull variant: drain the parser and write each element incrementally."""
    with xmlfile(out_stream) as xf:
        for el in in_stream:
            xf.write(el)
```
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D-40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226
_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-le...@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/