On 25 Oct 2024, at 10:10, Lily Parker via lxml - The Python XML Toolkit wrote:

Hi Lily,

> I've been using lxml to process large XML files recently, and I'm looking for ways to optimize performance. Specifically, I'm trying to filter specific nodes from large datasets and manage memory usage more effectively. I'd appreciate any tips or best practices from your experiences. Are there any techniques you use to enhance performance, or potential pitfalls I should watch out for? Thanks in advance for your insights!

Can you explain a little more about what you're trying to do? If you want to manipulate files, you're probably best off combining an iterative, incremental parser with an incremental writer.

This is something I've used recently for fixing broken Excel worksheets, where an incorrect "r" attribute needed removing. You should be able to adapt it to your needs.

```python

from lxml.etree import iterparse, xmlfile

# The SpreadsheetML cell tag used by .xlsx worksheet parts
CELL_TAG = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}c"

def parser(sheet_src):
    """Yield each element as it is parsed, dropping the broken "r" attribute from cells."""
    for _, element in iterparse(sheet_src):
        if element.tag == CELL_TAG:
            # set("r", None) would raise a TypeError; remove attributes via attrib
            element.attrib.pop("r", None)
        yield element

def coroutine_writer(output):
    """Coroutine version: feed elements in with .send(); sending True hands back the xmlfile."""
    with xmlfile(output) as xf:
        try:
            while True:
                el = (yield)
                if el is True:
                    yield xf  # give the caller direct access to the xmlfile
                else:
                    xf.write(el)
        except GeneratorExit:
            pass

def writer(out_stream, in_stream):
    """Simple version: drain an iterable of elements into an incremental xmlfile."""
    with xmlfile(out_stream) as xf:
        for el in in_stream:
            xf.write(el)

```

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D-40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226
_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-le...@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/