[lxml] Re: Performance issues when using element.clear() in Python 3.x

Charlie Clark Fri, 14 Feb 2025 05:09:28 -0800

On 14 Feb 2025, at 11:12, Stefan Behnel via lxml - The Python XML Toolkit wrote:

Then you're not cleaning up enough of the XML tree. Some of it remains in memory after processing it, and thus leads to swapping and long waiting times.

It's definitely a memory issue. You can write some code to catch memory use quickly. This is something we wrote for openpyxl while we were trying to "contain" memory use:



```python
import os
import openpyxl

from memory_profiler import memory_usage


def test_memory_use():
    """Naive test that assumes memory use will never be more than 120 % of
    that for the first 50 rows"""
    folder = os.path.split(__file__)[0]
    src = os.path.join(folder, "files", "very_large.xlsx")
    wb = openpyxl.load_workbook(src, read_only=True)
    ws = wb.active

    initial_use = None

    for n, line in enumerate(ws.iter_rows(values_only=True)):
        if n % 50 == 0:
            use = memory_usage(proc=-1, interval=1)[0]
            if initial_use is None:
                initial_use = use
            assert use/initial_use < 1.2
            print(n, use)

if __name__ == '__main__':
    test_memory_use()
```

You should be able to adapt this for your parser and it'll tell you soon enough how far in you get before your memory use balloons. If memory serves, I had one problem where I was clearing in the wrong place, which meant that other elements were sticking around. Thanks to Stefan for helping me sort it. I think your code may be too aggressive. It might help to look at the openpyxl worksheet parser, which has to handle what happens if you do additional processing within nodes.
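To make the "clearing in the wrong place" point concrete, here is a minimal sketch, not the openpyxl or lxml code. It assumes the standard library's `xml.etree.ElementTree` and `tracemalloc` in place of lxml and memory_profiler so that it runs without extra packages. The helper names `make_xml` and `peak_growth` and the synthetic `<sheet>`/`<row>` data are made up for illustration.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET


def make_xml(n_rows):
    """Build a synthetic document with n_rows <row> elements (made-up data)."""
    rows = "".join(
        f"<row><c>{i}</c><c>{'x' * 50}</c></row>" for i in range(n_rows)
    )
    return f"<sheet>{rows}</sheet>".encode()


def peak_growth(data, clear_root, check_every=50):
    """Parse incrementally and sample traced memory every check_every rows.

    Returns (rows seen, worst ratio of a sample to the first sample).
    """
    tracemalloc.start()
    try:
        context = ET.iterparse(io.BytesIO(data), events=("start", "end"))
        _, root = next(context)  # first event is the start of the root element
        baseline = None
        worst = 1.0
        n = 0
        for event, elem in context:
            if event != "end" or elem.tag != "row":
                continue
            n += 1
            # ... real processing of the finished row would go here ...
            if clear_root:
                # Detaches finished rows from the root so they can be freed.
                root.clear()
            else:
                # Empties the row, but the root keeps a reference to the
                # (now empty) element, so these pile up as parsing goes on.
                elem.clear()
            if n % check_every == 0:
                current, _ = tracemalloc.get_traced_memory()
                if baseline is None:
                    baseline = current
                worst = max(worst, current / baseline)
        return n, worst
    finally:
        tracemalloc.stop()
```

Calling `peak_growth(make_xml(20000), clear_root=False)` and then the same with `clear_root=True` should show the first ratio climbing with the row count while the second stays roughly flat. With lxml the equivalent cleanup is usually done by also deleting already-processed siblings via `getprevious()` and `getparent()`, and the exact numbers will differ from this stdlib sketch.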


Charlie

--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226

