Attachment: junos-conf-root.xml <https://drive.google.com/file/d/1mFGxoExLIE7DopNx3uHGdHvQsqHBPFAn/view?usp=drive_web>

Hi All,
I'm chasing an elusive memory leak and it might be related to lxml. I hope you can help me understand it better.

When I parse a large XML file and let the tree get garbage collected, the memory is not freed. E.g. when I run the following code:

    import logging
    import psutil
    import os
    import humanize
    import gc

    LOGGER = logging.getLogger(__name__)


    def get_memory_usage(process: psutil.Process) -> int:
        with process.oneshot():
            return process.memory_full_info().data


    def log_mem_diff(process: psutil.Process, message: str) -> int:
        usage = get_memory_usage(process)
        LOGGER.error(f"{message}: {humanize.naturalsize(usage)}")
        return usage


    process = psutil.Process(os.getpid())

    import xml.etree as etree
    import xml.etree.ElementTree


    def build_tree(xml):
        tree = etree.ElementTree.fromstring(xml)
        log_mem_diff(process, "In_scope")
        # tree goes out of scope here


    # import lxml.etree as etree
    # def build_tree(xml):
    #     parser = etree.XMLParser(remove_blank_text=True, collect_ids=False)
    #     tree = etree.XML(xml, parser)
    #     log_mem_diff(process, "In_scope")

    with open("junos-conf-root.xml", "r") as f:
        xml = f.read()

    for i in range(0, 5):
        build_tree(xml)
        log_mem_diff(process, "before gc")
        gc.collect()
        log_mem_diff(process, "after gc")

I get

    In_scope: 1.4 GB
    before gc: 1.4 GB
    after gc: 1.4 GB
    In_scope: 1.7 GB
    before gc: 1.7 GB
    after gc: 1.7 GB
    In_scope: 1.7 GB
    before gc: 1.7 GB
    after gc: 1.7 GB
    In_scope: 1.7 GB
    before gc: 1.7 GB
    after gc: 1.7 GB
    In_scope: 1.7 GB
    before gc: 1.7 GB
    after gc: 1.7 GB

This is not a leak per se, but it behaves unexpectedly in that
1. memory usage goes up
2. running the GC doesn't reduce it
3. running the code again doesn't make it go up any further.

I'm trying to understand this behavior. Could you be of assistance in this?
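For anyone who wants to try the same loop without the attached file, here is a minimal stdlib-only sketch. It is not the script above: make_xml and measure are hypothetical helpers, the synthetic document is only a small stand-in for junos-conf-root.xml, and tracemalloc replaces psutil. tracemalloc only sees allocations made through Python's own allocators, so it should cover xml.etree but would not account for memory that lxml allocates inside libxml2. If the traced "current" figure drops after each round while the process-level number stays high, that would suggest the objects are being freed and the memory is retained below the Python object level.

```python
import gc
import tracemalloc
import xml.etree.ElementTree as ET


def make_xml(n_items: int) -> str:
    """Build a synthetic document (hypothetical stand-in for the real file)."""
    items = "".join(
        f"<item id='{i}'><name>n{i}</name></item>" for i in range(n_items)
    )
    return f"<root>{items}</root>"


def build_tree(xml: str) -> int:
    tree = ET.fromstring(xml)
    # Return only the child count; the tree goes out of scope here.
    return len(tree)


def measure(xml: str, rounds: int):
    """Parse `xml` repeatedly; record (children, current, peak) traced bytes."""
    tracemalloc.start()
    results = []
    for _ in range(rounds):
        n_children = build_tree(xml)
        gc.collect()
        current, peak = tracemalloc.get_traced_memory()
        results.append((n_children, current, peak))
    tracemalloc.stop()
    return results


if __name__ == "__main__":
    for n_children, current, peak in measure(make_xml(200_000), 5):
        print(f"children={n_children} current={current} B peak={peak} B")
```

Comparing these numbers with the process-level figures from psutil for the same run is the interesting part; the sketch on its own only shows what Python's allocators report.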
Python              : sys.version_info(major=3, minor=8, micro=9, releaselevel='final', serial=0)
lxml.etree          : (4, 6, 3, 0)
libxml used         : (2, 9, 10)
libxml compiled     : (2, 9, 10)
libxslt used        : (1, 1, 34)
libxslt compiled    : (1, 1, 34)

Wouter

--
Wouter De Borger
Chief Architect
Inmanta
+32479474994
wouter.debor...@inmanta.com
www.inmanta.com
Kapeldreef 60, 3001 Heverlee
_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/