On Thu, 2021-06-03 at 10:19 +0200, Wouter De Borger wrote:
> Hi Adam,
> Some more news on this:
> I ran valgrind, and it is indeed as you suggest: the memory is not
> freed to the OS.
> One of my colleagues found this: http://xmlsoft.org/xmlmem.html
> > You may encounter that your process using libxml2 does not have a
> > reduced memory usage although you freed the trees. This is because
> > libxml2 allocates memory in a number of small chunks. When freeing
> > one of those chunks, the OS may decide that giving this little
> > memory back to the kernel will cause too much overhead and delay
> > the operation. As all chunks are this small, they get actually
> > freed but not returned to the kernel. On systems using glibc, there
> > is a function call "malloc_trim" from malloc.h which does this
> > missing operation (note that it is allowed to fail). Thus, after
> > freeing your tree you may simply try "malloc_trim(0);" to really
> > get the memory back. If your OS does not provide malloc_trim, try
> > searching for a similar function.

With huge address spaces (64-bit) the question is whether it is worth
the bother.  The current implementation is as it is because the answer
is rarely "yes".

I maintain a workflow engine in Python, which regularly uses lxml to
grind 4GB XML files. It runs the chains of actions in subprocesses;
when a subprocess completes, all of its memory is released to the OS.
In the meantime, if there is actual memory pressure the OS can page
(swap): if there are lots of unused pages they can get shuttled out of
RAM, then marked as free when the process dies.  A modern Linux
kernel is ridiculously efficient at this.

The Python multiprocessing module is excellent for creating schemes of
worker processes.
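
A minimal sketch of that pattern, not the code from my engine: the
helper names (run_in_child, count_elements) are made up for
illustration, xml.etree.ElementTree stands in for lxml.etree so the
example has no dependencies, and the "fork" start method assumes a
POSIX system.  The parse happens in a short-lived child, only a small
result crosses the pipe, and the child's whole heap goes back to the
OS when it exits.

```python
import multiprocessing as mp
import xml.etree.ElementTree as ET  # stand-in for lxml.etree


def count_elements(xml_text):
    """Parse a document and return only a small summary of it."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter())


def _child(conn, func, args):
    # Runs in the child process; the tree never leaves this process.
    try:
        conn.send(func(*args))
    finally:
        conn.close()


def run_in_child(func, *args):
    """Run func(*args) in a short-lived child and return its result."""
    ctx = mp.get_context("fork")  # assumption: POSIX; not available on Windows
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(child_conn, func, args))
    proc.start()
    child_conn.close()           # parent keeps only the receiving end
    result = parent_conn.recv()  # raises EOFError if the child failed
    proc.join()                  # child exits; its memory is released
    return result
```

Something like run_in_child(count_elements, xml_text) then keeps the
parent's footprint flat no matter how large the tree was in the child.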

> I added this code:
> import ctypes
> def trim_memory() -> int:
>   libc = ctypes.CDLL("libc.so.6")
>   return libc.malloc_trim(0)
> This seems to fix it!
> Perhaps it would be good if lxml would do this by default?

It wouldn't be portable, which is likely an argument against it.  I'd
also assume that the call could take some time, and that it would run
synchronously.
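
If you want to keep calling it from your own code, a more defensive
variant might look like the sketch below.  It assumes nothing beyond
ctypes from the standard library and simply returns None where
malloc_trim cannot be found (non-glibc systems), rather than raising.

```python
import ctypes
import ctypes.util


def trim_memory():
    """Ask glibc to hand free heap pages back to the OS.

    Returns malloc_trim's result (1 if memory was released, 0 if not),
    or None on platforms where malloc_trim is not available.
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return None
    try:
        libc = ctypes.CDLL(libc_name)
        malloc_trim = libc.malloc_trim  # AttributeError if the symbol is missing
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(0)
```

The call still runs synchronously, so it is probably best made right
after dropping a large tree rather than on every request.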

-- 
Adam Tauno Williams <mailto:awill...@whitemice.org> GPG D95ED383
OpenGroupware Developer <http://www.opengroupware.us/>

