On 25 Oct 2024, at 10:10, Lily Parker via lxml - The Python XML Toolkit wrote:

Hi Lily,

> I've been using lxml to process large XML files recently, and I'm looking for ways to optimize performance. Specifically, I'm trying to filter specific nodes from large datasets and manage memory usage more effectively. I'd appreciate any tips or best practices from your experiences. Are there any techniques you use to enhance performance, or potential pitfalls I should watch out for? Thanks in advance for your insights!

Can you explain a little more about what you're trying to do? If you want to manipulate files, you're probably best off combining an iterative, incremental parser with an incremental writer.

This is something I've used recently for fixing broken Excel worksheets, where an incorrect "r" attribute needed removing. You should be able to adapt it to your needs.

```python

from lxml.etree import iterparse, xmlfile

# The SpreadsheetML cell tag used by .xlsx worksheet parts
CELL_TAG = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}c"

def parser(sheet_src):
    """Yield each element as it is parsed, dropping the broken "r" attribute from cells."""
    for _, element in iterparse(sheet_src):
        if element.tag == CELL_TAG:
            # set("r", None) would raise a TypeError; remove attributes via attrib
            element.attrib.pop("r", None)
        yield element

def coroutine_writer(output):
    """Coroutine version: feed elements in with .send(); sending True hands back the xmlfile."""
    with xmlfile(output) as xf:
        try:
            while True:
                el = (yield)
                if el is True:
                    yield xf  # give the caller direct access to the xmlfile
                else:
                    xf.write(el)
        except GeneratorExit:
            pass

def writer(out_stream, in_stream):
    """Simple version: drain an iterable of elements into an incremental xmlfile."""
    with xmlfile(out_stream) as xf:
        for el in in_stream:
            xf.write(el)

```

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D-40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226
_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-le...@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/