On Thu, 2021-06-03 at 10:19 +0200, Wouter De Borger wrote:
> Hi Adam,
>
> Some more news on this:
>
> I ran valgrind, and it is indeed as you suggest: the memory is not
> freed to the OS.
>
> One of my colleagues found this: http://xmlsoft.org/xmlmem.html
>
> > You may encounter that your process using libxml2 does not have a
> > reduced memory usage although you freed the trees. This is because
> > libxml2 allocates memory in a number of small chunks. When freeing
> > one of those chunks, the OS may decide that giving this little
> > memory back to the kernel will cause too much overhead and delay
> > the operation. As all chunks are this small, they get actually
> > freed but not returned to the kernel. On systems using glibc, there
> > is a function call "malloc_trim" from malloc.h which does this
> > missing operation (note that it is allowed to fail). Thus, after
> > freeing your tree you may simply try "malloc_trim(0);" to really
> > get the memory back. If your OS does not provide malloc_trim, try
> > searching for a similar function.

With huge (64-bit) address spaces, the question is whether it is worth
the bother. The current implementation is as it is because the answer
is rarely "yes".

I maintain a workflow engine in Python, which regularly uses lxml to
grind through 4GB XML files. It runs the chains of actions in
subprocesses; when a subprocess completes, all of its memory is
released to the OS. In the meantime, if there is actual memory pressure
the OS can page (swap): if there are lots of unused pages they can get
shuttled out of RAM, then marked as free when the process dies. A
modern Linux kernel is ridiculously efficient at this.

The Python multiprocessing module is excellent for creating schemes of
worker processes.

> I added this code:
>
> import ctypes
>
> def trim_memory() -> int:
>     libc = ctypes.CDLL("libc.so.6")
>     return libc.malloc_trim(0)
>
> This seems to fix it!
>
> Perhaps it would be good if lxml would do this by default?

It wouldn't be portable, which is likely an argument against it. I'd
also assume that the call could take some time and would block the
caller while it runs.

-- 
Adam Tauno Williams <mailto:awill...@whitemice.org> GPG D95ED383
OpenGroupware Developer <http://www.opengroupware.us/>
_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
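
For what it's worth, here is a minimal sketch of the subprocess approach described above, not the actual workflow engine code. The names count_elements and run_in_subprocess are hypothetical, the stdlib xml.etree.ElementTree stands in for lxml.etree so the example has no dependencies, and the "fork" start method is an assumption that matches a Linux host. The point is only that the parsed tree lives and dies in the child, so the parent never holds that memory.

```python
import multiprocessing as mp
import xml.etree.ElementTree as ET  # stand-in; with lxml use lxml.etree instead


def count_elements(xml_text: str) -> int:
    """Parse a document and return only a small summary of it."""
    root = ET.fromstring(xml_text)
    # root.iter() yields the root element and every descendant.
    return sum(1 for _ in root.iter())


def _child(conn, func, args):
    """Run func in the child process and send its result to the parent."""
    try:
        conn.send(func(*args))
    finally:
        conn.close()


def run_in_subprocess(func, *args):
    """Run func(*args) in a short-lived child process and return its result.

    Everything the child allocated goes back to the OS when it exits,
    regardless of what the allocator would have kept around.
    """
    # Assumption: "fork" is available (Linux); other platforms would need
    # a different start method and an importable worker function.
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent keeps only the receiving end
    try:
        # Raises EOFError if the child died without sending a result.
        result = parent_conn.recv()
    finally:
        proc.join()
        parent_conn.close()
    return result
```

Usage would be something like run_in_subprocess(count_elements, xml_text). Only the small return value crosses the pipe, so the large tree should never be sent back to the parent.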