junos-conf-root.xml
<https://drive.google.com/file/d/1mFGxoExLIE7DopNx3uHGdHvQsqHBPFAn/view?usp=drive_web>
Hi All,

I'm chasing an elusive memory leak and it might be related to lxml.
I hope you can help me to understand it better.

When I parse a large XML file and then let the tree get garbage collected,
the memory is not freed.
For example, when I run the following code:

import logging
import psutil
import os
import humanize
import gc

LOGGER = logging.getLogger(__name__)

def get_memory_usage(process: psutil.Process) -> int:
    with process.oneshot():
        return process.memory_full_info().data


def log_mem_diff(process: psutil.Process, message: str) -> int:
    usage = get_memory_usage(process)
    LOGGER.error(f"{message}: {humanize.naturalsize(usage)}")
    return usage

process = psutil.Process(os.getpid())

import xml.etree as etree
import xml.etree.ElementTree

def build_tree(xml):
    tree = etree.ElementTree.fromstring(xml)
    log_mem_diff(process, "In_scope")
    # tree goes out of scope here

# import lxml.etree as etree

# def build_tree(xml):
#     parser = etree.XMLParser(remove_blank_text=True, collect_ids=False)
#     tree = etree.XML(xml, parser)
#     log_mem_diff(process, "In_scope")

with open("junos-conf-root.xml", "r") as f:
    xml = f.read()

for i in range(0, 5):
    build_tree(xml)
    log_mem_diff(process, "before gc")

    gc.collect()
    log_mem_diff(process, "after gc")




I get

In_scope: 1.4 GB
before gc: 1.4 GB
after gc: 1.4 GB
In_scope: 1.7 GB
before gc: 1.7 GB
after gc: 1.7 GB
In_scope: 1.7 GB
before gc: 1.7 GB
after gc: 1.7 GB
In_scope: 1.7 GB
before gc: 1.7 GB
after gc: 1.7 GB
In_scope: 1.7 GB
before gc: 1.7 GB
after gc: 1.7 GB

This is not a leak per se, but it behaves unexpectedly in that
1. memory usage goes up
2. running the GC doesn't reduce it
3. running the code again doesn't make it keep going up.
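
One way to narrow this down is to check whether the Python-level
allocations themselves go back to baseline once the tree is dropped. The
following is only a minimal sketch under assumptions: make_xml and
traced_after_parse are hypothetical helpers (not part of my script), the
synthetic document stands in for junos-conf-root.xml, and tracemalloc only
sees memory allocated through Python's allocators, so memory that libxml2
allocates on behalf of lxml would not show up here.

```python
import gc
import tracemalloc
import xml.etree.ElementTree as ET
from typing import Tuple


def make_xml(n_items: int) -> str:
    # Hypothetical stand-in for junos-conf-root.xml: a flat synthetic document.
    items = "".join(f"<item id='{i}'>value {i}</item>" for i in range(n_items))
    return f"<root>{items}</root>"


def traced_after_parse(xml: str) -> Tuple[int, int]:
    """Return (bytes traced while the tree is alive, bytes still traced after dropping it)."""
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()

    tree = ET.fromstring(xml)
    alive, _ = tracemalloc.get_traced_memory()

    # Drop the only reference and force a collection, as in the loop above.
    del tree
    gc.collect()
    released, _ = tracemalloc.get_traced_memory()

    tracemalloc.stop()
    return alive - baseline, released - baseline


if __name__ == "__main__":
    alive, released = traced_after_parse(make_xml(200_000))
    print(f"traced while alive: {alive} bytes")
    print(f"traced after del + gc: {released} bytes")
```

If the second number drops back close to zero while the process data
segment reported by psutil stays high, that would suggest the objects are
being freed and the memory is being retained below the Python level, rather
than kept alive by live Python objects.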

I'm trying to understand this behavior.
Could you be of assistance in this?

Python              : sys.version_info(major=3, minor=8, micro=9, releaselevel='final', serial=0)
lxml.etree          : (4, 6, 3, 0)
libxml used         : (2, 9, 10)
libxml compiled     : (2, 9, 10)
libxslt used        : (1, 1, 34)
libxslt compiled    : (1, 1, 34)

Wouter

-- 
Wouter De Borger

Chief Architect

Inmanta
+32479474994 <0479474994>
wouter.debor...@inmanta.com
www.inmanta.com
Kapeldreef 60, 3001 Heverlee