Hi everyone,

Despite the brief respite, the issue my team is having with element.clear() persists. I honestly have no idea why lxml 2.2.3 does it instantly while the latest version takes ages.
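For reference, the incremental-parsing idiom usually recommended for lxml calls clear() on each element after processing it, and also deletes the already-processed siblings that precede it. clear() alone only empties the element; the element itself stays attached to its parent. A minimal sketch, assuming records are `<row>` elements directly under one root; the tag name and `process_row` are hypothetical placeholders, not code from this thread:

```python
import io
from lxml import etree


def process_row(elem):
    # Hypothetical per-record handler: returns the text of the <value> child.
    return elem.findtext("value")


def stream_rows(source, tag="row"):
    """Yield one processed result per <tag> element, freeing the tree as we go."""
    for _event, elem in etree.iterparse(source, events=("end",), tag=tag):
        yield process_row(elem)
        # Remove this element's children, text, tail and attributes.
        elem.clear()
        # clear() does not detach the element itself, so also delete the
        # already-processed (now empty) siblings that precede it.
        while elem.getprevious() is not None:
            del elem.getparent()[0]


if __name__ == "__main__":
    xml = b"<root>" + b"".join(
        b"<row><value>%d</value></row>" % i for i in range(5)
    ) + b"</root>"
    print(list(stream_rows(io.BytesIO(xml))))
```

Deleting the preceding siblings keeps the root's child list from growing with empty elements, which is one way part of the tree can stay in memory even though every element has been cleared.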
I do wonder about one thing, though: I used sys.getsizeof to check the size of the elements before and after clear(), but to my surprise the size stayed constant at 56 bytes. In that case, what are we clearing?

On Fri, 14 Feb 2025, 21:08 Charlie Clark, <charlie.cl...@clark-consulting.eu> wrote:

> On 14 Feb 2025, at 11:12, Stefan Behnel via lxml - The Python XML Toolkit wrote:
>
> > Then you're not cleaning up enough of the XML tree. Some of it remains in
> > memory after processing it, and thus leads to swapping and long waiting
> > times.
>
> It's definitely a memory issue. You can write some code to catch memory
> use quickly. This is something we wrote for openpyxl while we were trying
> to "contain" memory use:
>
>     import os
>     import openpyxl
>     from memory_profiler import memory_usage
>
>     def test_memory_use():
>         """Naive test that assumes memory use will never be more than 120 % of
>         that for first 50 rows"""
>         folder = os.path.split(__file__)[0]
>         src = os.path.join(folder, "files", "very_large.xlsx")
>         wb = openpyxl.load_workbook(src, read_only=True)
>         ws = wb.active
>
>         initial_use = None
>
>         for n, line in enumerate(ws.iter_rows(values_only=True)):
>             if n % 50 == 0:
>                 # proc=-1 samples the current process; keep the first reading
>                 use = memory_usage(proc=-1, interval=1)[0]
>                 if initial_use is None:
>                     initial_use = use
>                 # fail as soon as memory grows past 120 % of the baseline
>                 assert use/initial_use < 1.2
>                 print(n, use)
>
>     if __name__ == '__main__':
>         test_memory_use()
>
> You should be able to adapt this for your parser and it'll tell you soon
> enough how far in you get before your memory use balloons. If memory
> serves, I had one problem where I was clearing in the wrong place, which
> meant that other elements were sticking around. Thanks to Stefan for
> helping me sort it. I think your code may be too aggressive. It might help
> to look at the openpyxl worksheet parser, which has to handle what happens
> if you do additional processing within nodes.
>
> Charlie
>
> --
> Charlie Clark
> Managing Director
> Clark Consulting & Research
> German Office
> Sengelsweg 34
> Düsseldorf
> D-40489
> Tel: +49-203-3925-0390
> Mobile: +49-178-782-6226
> _______________________________________________
> lxml - The Python XML Toolkit mailing list -- lxml@python.org
> To unsubscribe send an email to lxml-le...@python.org
> https://mail.python.org/mailman3/lists/lxml.python.org/
> Member address: noorulamry.d...@gmail.com
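On the sys.getsizeof question: sys.getsizeof reports only the size of the Python object itself and does not follow anything it refers to. An lxml element is a small Python proxy around a node in libxml2's C-level tree, so the figure stays the same whether the node has no children or thousands. What clear() removes lives in that underlying tree: the subelements, text, tail and attributes. A minimal sketch that makes this visible by measuring the child count and serialized size instead; the `describe` helper and the sample tree are hypothetical, for illustration only:

```python
import sys
from lxml import etree


def describe(elem):
    """Return (getsizeof, number of children, serialized byte length)."""
    return (
        sys.getsizeof(elem),        # size of the Python proxy object only
        len(elem),                  # number of direct children in the tree
        len(etree.tostring(elem)),  # size of the whole subtree when serialized
    )


if __name__ == "__main__":
    root = etree.Element("root", attrib={"id": "1"})
    for i in range(1000):
        etree.SubElement(root, "child").text = "value %d" % i

    before = describe(root)
    root.clear()  # drops children, text, tail and attributes
    after = describe(root)

    print("before:", before)
    print("after: ", after)
```

After clear(), the child count drops to 0 and the element serializes as just `<root/>`, while the sys.getsizeof figure does not move. Tracking actual process memory, as in Charlie's test, is therefore a better signal than sys.getsizeof.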