#34356: Memory leak when generating PDFs
-------------------------------------+-------------------------------------
               Reporter:  Robin      |          Owner:  nobody
  (Robert) Thomas                    |
                   Type:  Bug        |         Status:  new
              Component:  Core       |        Version:  4.1
  (Other)                            |       Keywords:  memory memory-leak
               Severity:  Normal     |  pdf weasyprint
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 == Context

 Our app generates a one-page PDF report for users. It contains a few small
 SVG and PNG icons, and 4 big textual tables. The PDF is generated once,
 after which it is put in a storage bucket for subsequent retrieval.

 == Problem

 The app is Django 4.1.6, WeasyPrint 57.2, running on Heroku (heroku-22).
 We're not having any issues retrieving previously generated PDFs, but each
 time the app generates a new PDF (file size 38 KB) its memory RSS increases
 by 20-40 MB, as reported by Heroku. This memory usage doesn't go down
 until the server is restarted.

 Unfortunately, Heroku doesn't automatically restart the server until both
 memory RSS and swap exceed the 512 MB limit, so once RSS is used up we
 start getting a lot of pings about OOM errors and have to restart it
 manually.

 == What we've tried

 Even after removing all images, fonts, and CSS (file size 32 KB), each
 generation still increases the memory RSS by about 17 MB.

 If we remove everything from the report template, leaving just <!DOCTYPE
 html><html lang="en"><head><title>Test</title><body></body></html>
 (file size 863 B), each generation increases the memory RSS by about 1.3 MB.
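
 For anyone who wants to take similar measurements outside Heroku, here is a
 minimal sketch of a harness that records RSS growth per call. The helper
 names `current_rss_bytes` and `rss_growth` are hypothetical (they are not
 part of the test app), and the numbers it reports will not necessarily
 match what Heroku's metrics show.

```python
import gc
import os
import resource
import sys


def current_rss_bytes():
    """Return the process RSS in bytes (current on Linux, peak elsewhere)."""
    try:
        # Linux: the second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as fh:
            resident_pages = int(fh.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except OSError:
        # Fallback: peak RSS. ru_maxrss is bytes on macOS, kilobytes on Linux.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak if sys.platform == "darwin" else peak * 1024


def rss_growth(render, runs=5):
    """Call render() `runs` times and return the RSS delta after each call."""
    deltas = []
    for _ in range(runs):
        gc.collect()  # drop collectable garbage before measuring
        before = current_rss_bytes()
        render()
        gc.collect()
        deltas.append(current_rss_bytes() - before)
    return deltas
```

 With WeasyPrint installed, `render` could be something like
 `lambda: HTML(string=template_html).write_pdf()`, where `template_html` is
 an assumed variable holding the rendered report markup.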

 == Reproduce

 I deployed a little test app to show this in action, with a link to the
 source code: https://weasyprint-mem.herokuapp.com/

 You can see that every time a PDF is generated it increases the memory
 usage, although not always by the same amount. I would expect the data for
 each PDF to be garbage-collected once it has rendered.
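
 One way to check whether the retained memory is held by Python objects at
 all is to diff `tracemalloc` snapshots around repeated generations. This is
 only a sketch: `python_allocation_growth` is a hypothetical helper, and
 `tracemalloc` sees only allocations made through Python's allocator, so
 memory retained by native libraries will not show up here even if RSS keeps
 growing.

```python
import gc
import tracemalloc


def python_allocation_growth(render, runs=3, top=5):
    """Return the `top` source lines whose Python allocations changed most."""
    tracemalloc.start()
    render()  # warm-up call so one-time caches are not counted as growth
    gc.collect()
    before = tracemalloc.take_snapshot()
    for _ in range(runs):
        render()
    gc.collect()  # anything still alive after this is genuinely retained
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # StatisticDiff entries come back sorted by absolute size difference.
    return after.compare_to(before, "lineno")[:top]
```

 If the entries returned show little or no growth while RSS still climbs,
 that would point away from uncollected Python objects and towards native
 allocations or allocator fragmentation.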

 == Related

 I opened a bug ticket about this with WeasyPrint
 (https://github.com/Kozea/WeasyPrint/issues/1496). They say that because
 they cannot reproduce this when running WeasyPrint by itself from the
 command line, the memory leak must be elsewhere in the ecosystem.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/34356>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
