#36526: bulk_update uses more memory than expected
-------------------------------+--------------------------------------
     Reporter:  Anže Pečar     |                    Owner:  (none)
         Type:  Uncategorized  |                   Status:  new
    Component:  Uncategorized  |                  Version:  5.2
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------
Description changed by Anže Pečar:

Old description:

> I recently tried to update a large number of objects with:
>
> {{{
> things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
> Thing.objects.bulk_update(things, ["description"], batch_size=300)
> }}}
>
> The first line above fits into the available memory (~2GB in my case),
> but the second line got the process terminated with SIGTERM, even though
> I had an additional 2GB of memory available. This was surprising, as I
> wasn't expecting bulk_update to use this much memory when all the objects
> to update were already loaded.
>
> My solution was:
>
> {{{
> from itertools import batched  # Python 3.12+
>
> for batch in batched(things, 300):
>     Thing.objects.bulk_update(batch, ["description"], batch_size=300)
> }}}
>
> In the first example, `bulk_update` used 2.8GB of memory; in the second,
> it used only 62MB.
>
> [https://github.com/anze3db/django-bulk-update-memory A GitHub repository
> that reproduces the problem with memray results.]
>
> This might be related to https://code.djangoproject.com/ticket/31202. I
> decided to open a new issue anyway because my concern is different: I
> wouldn't mind waiting longer for bulk_update to complete, but the SIGTERM
> surprised me.

New description:

 I recently tried to update a large number of objects with:

 {{{
 things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
 Thing.objects.bulk_update(things, ["description"], batch_size=300)
 }}}

 The first line above fits into the available memory (~2GB in my case), but
 the second line got the process terminated with SIGTERM, even though I had
 an additional 2GB of memory available. This was surprising, as I wasn't
 expecting bulk_update to use this much memory when all the objects to
 update were already loaded.

 My solution was:

 {{{
 from itertools import batched  # Python 3.12+

 for batch in batched(things, 300):
     Thing.objects.bulk_update(batch, ["description"], batch_size=300)
 }}}
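`itertools.batched` only exists in Python 3.12 and later. A minimal sketch of the same workaround that also runs on older Python versions could look like the following. `chunked` and `bulk_update_in_chunks` are hypothetical helper names, and `manager` is assumed to be anything with a Django-style `bulk_update(objs, fields, batch_size=...)` method, such as `Thing.objects`.

{{{#!python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items, similar to itertools.batched."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def bulk_update_in_chunks(manager, objs, fields, batch_size=300):
    """Hypothetical helper: call bulk_update once per chunk so that only one
    chunk's CASE/WHEN expressions are built at a time."""
    rows_matched = 0
    for chunk in chunked(objs, batch_size):
        # bulk_update returns the number of rows matched (Django 4.0+).
        rows_matched += manager.bulk_update(chunk, fields, batch_size=batch_size)
    return rows_matched
}}}

Note that a single `bulk_update` call runs all of its batches inside one transaction, whereas under autocommit each call in this loop is its own transaction. Wrapping the loop in `transaction.atomic()` restores all-or-nothing behavior, e.g. `with transaction.atomic(): bulk_update_in_chunks(Thing.objects, things, ["description"])`.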

 In the first example, `bulk_update` used 2.8GB of memory; in the second, it
 used only 62MB.

 [https://github.com/anze3db/django-bulk-update-memory A GitHub repository
 that reproduces the problem with memray results.]

 As we can see from the
 [https://github.com/user-attachments/assets/dd0bdcac-682f-4e79-aa25-aa5a4a2e6b9d memray flamegraph],
 most of the memory in my example (2.1GB) is spent building the `When`
 expressions (the `CASE ... WHEN` clauses) for all batches before any of the
 update queries is executed. If we change this to build the `When`
 expressions only for the current batch, memory consumption should drop
 considerably. I'd be happy to contribute this patch, unless there are
 concerns about doing more work between the update queries and thereby
 making the transaction longer. Let me know :)
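A simplified, Django-free model of the two strategies may help illustrate the proposal. It is not Django's actual code: plain `(pk, value)` tuples stand in for the `When`/`Case` expressions, and `batches`, `build_case`, `build_all_updates`, `iter_updates` and `run_updates` are hypothetical names. The eager version keeps every batch's expressions alive before the first query runs; the generator version only holds roughly one batch's expressions at a time.

{{{#!python
def batches(objs, batch_size):
    """Split a list into consecutive slices of at most batch_size items."""
    for start in range(0, len(objs), batch_size):
        yield objs[start:start + batch_size]


def build_case(batch, field):
    # Stand-in for Case(When(pk=..., then=Value(...)), ...) for one field.
    return tuple((obj["pk"], obj[field]) for obj in batch)


def build_all_updates(objs, fields, batch_size):
    # Eager: every batch's expressions are built up front and kept in a list.
    return [
        ([obj["pk"] for obj in batch], {f: build_case(batch, f) for f in fields})
        for batch in batches(objs, batch_size)
    ]


def iter_updates(objs, fields, batch_size):
    # Lazy: each batch's expressions are built right before they are needed
    # and can be freed once the caller moves on to the next batch.
    for batch in batches(objs, batch_size):
        yield [obj["pk"] for obj in batch], {f: build_case(batch, f) for f in fields}


def run_updates(updates, execute):
    # `execute` stands in for queryset.filter(pk__in=pks).update(**kwargs).
    rows = 0
    for pks, update_kwargs in updates:
        rows += execute(pks, update_kwargs)
    return rows
}}}

With the numbers from this report (1,000,000 objects, `batch_size=300`, one field), the eager approach builds 3,334 batches and keeps 1,000,000 WHEN entries alive at once, whereas the lazy one only needs about 300 at a time. The trade-off mentioned above is visible here too: with `iter_updates`, each batch's expressions are built between queries, which in Django would happen inside `bulk_update`'s transaction.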

 This might be related to https://code.djangoproject.com/ticket/31202. I
 decided to open a new issue anyway because my concern is different: I
 wouldn't mind waiting longer for bulk_update to complete, but the SIGTERM
 surprised me.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/36526#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
