#31202: Bulk update suffers from poor performance with large numbers of models and columns
-------------------------------------+-------------------------------------
     Reporter:  Tom Forbes           |                    Owner:  Tom Forbes
         Type:  Cleanup/optimization |                   Status:  assigned
    Component:  Database layer       |                  Version:  dev
                (models, ORM)        |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by jerch):

 An update on COPY FROM:

 Did a first playground implementation just to see the effect on
 performance; see
 https://gist.github.com/jerch/fd0fae0107ce7b153b7540111b2e89ab. The chart
 over there shows the mean runtime over 100 runs for x updated records in a
 100k-row table, updating just one integer field per record. The perf tests
 were run against a plain postgres 14 Docker image, with no settings
 tweaked.
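
 (The per-size measurement is conceptually just a mean-of-100 timing loop
 like the sketch below. This is a simplified stand-in, not the gist's code;
 the actual harness is in the gist.)

 {{{#!python
 import statistics
 import time

 def mean_runtime(fn, runs=100):
     """Mean wall-clock runtime of fn() over `runs` repetitions."""
     samples = []
     for _ in range(runs):
         start = time.perf_counter()
         fn()
         samples.append(time.perf_counter() - start)
     return statistics.mean(samples)
 }}}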

 The `copy_update` implementation is not yet optimized for performance or
 neatly integrated; it is just meant to give an idea of where the ballpark
 for COPY FROM would end up. `bulk_update` is Django's current
 implementation, `django-bulk-update` is from
 https://github.com/aykut/django-bulk-update, and `fast_update` is my early
 implementation of the direct UPDATE ... FROM (VALUES ...) approach from
 above.
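
 For illustration, these are roughly the SQL shapes behind the two UPDATE
 strategies. Table and column names are invented here; the real
 implementations build this SQL dynamically with parametrized values:

 {{{#!python
 # CASE-based strategy (bulk_update, django-bulk-update): one UPDATE whose
 # SET clause grows with the number of updated rows.
 CASE_SQL = """
 UPDATE items SET counter = CASE id
     WHEN 1 THEN 10
     WHEN 2 THEN 20
     WHEN 3 THEN 30
 END
 WHERE id IN (1, 2, 3);
 """

 # VALUES-based strategy (fast_update): join the target table against an
 # inline VALUES list instead, which postgres handles much better at scale.
 VALUES_SQL = """
 UPDATE items AS t SET counter = v.counter
 FROM (VALUES (1, 10), (2, 20), (3, 30)) AS v (id, counter)
 WHERE t.id = v.id;
 """
 }}}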

 Some observations from that:
 - `bulk_update` shows much worse runtime behavior than all others (hence
 its measurement was stopped early)
 - `django-bulk-update` can keep up a bit longer, but then shows the same
 bad runtime growth (probably because it uses the same SQL update logic?).
 This gets far worse for update counts >5k (not shown).
 - `copy_update` has much higher setup costs (in the range of 1 to 256
 updated records)
 - between 4096 and 8192 updates `copy_update` starts to pay off compared
 to `fast_update`; at 32k updates it is almost twice as fast
 - not shown in the charts: for higher update counts it keeps gaining
 ground (almost 4 times faster for 1M updated records)
 - there is something going on between 256 and 512 updates, as almost all
 implementations show a steep jump there (a postgres b-tree depth change?
 did not investigate further...)

 Some early insights from that:
 - As already stated in an earlier comment above, Django's `bulk_update`
 is currently pretty wasteful, as it falls far behind even
 `django-bulk-update`, which uses the same SQL update strategy.
 - `fast_update` is the winner for small to medium update counts, up to
 ~10k.
 - `copy_update` starts to shine for update counts >10k (see the sketch
 below).
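
 To make the COPY FROM strategy concrete, here is a minimal sketch of the
 idea (not the gist's actual code; it assumes psycopg2 and an invented
 `items` table with integer columns `id` and `counter`):

 {{{#!python
 import io

 def copy_update_sketch(conn, rows):
     """Stage new values in a temp table via COPY FROM, then apply them
     with a single set-based UPDATE. `conn` is a psycopg2 connection,
     `rows` an iterable of (id, counter) pairs."""
     with conn.cursor() as cur:
         # COPY avoids per-row INSERT overhead when filling the staging
         # table; that is where the win for large update counts comes from,
         # and also what makes the setup costs higher for tiny ones.
         cur.execute(
             "CREATE TEMP TABLE items_tmp "
             "(id int PRIMARY KEY, counter int) ON COMMIT DROP"
         )
         buf = io.StringIO("".join(f"{pk}\t{val}\n" for pk, val in rows))
         cur.copy_from(buf, "items_tmp", columns=("id", "counter"))
         # One UPDATE ... FROM join applies all staged values at once.
         cur.execute(
             "UPDATE items SET counter = t.counter "
             "FROM items_tmp t WHERE items.id = t.id"
         )
     conn.commit()
 }}}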

-- 
Ticket URL: <https://code.djangoproject.com/ticket/31202#comment:11>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
