#31202: Bulk update suffers from poor performance with large numbers of models
and
columns
-------------------------------------+-------------------------------------
Reporter: Tom Forbes | Owner: Tom
Type: | Forbes
Cleanup/optimization | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by jerch):
Some update on COPY FROM:
Did a first playground implementation just to see the effects on
performance, see
https://gist.github.com/jerch/fd0fae0107ce7b153b7540111b2e89ab. The chart
over there shows the mean runtime of 100 runs of x updated records in a
100k table, updating just one integer field per record. The perf tests
were done with plain postgres 14 docker image, with no tweaking of any
settings.
The implementation `copy_update` is not yet optimized for perf or neatly
integrated yet, it is just to get an idea, where the ballpark for COPY
FROM would end up. `bulk_update` is django's current implementation,
`django-bulk-update` is from here: https://github.com/aykut/django-bulk-
update, `fast_update` is my early impl of direct UPDATE FROM VALUES from
above.
Some observations from that:
- `bulk_update` shows much worse runtime behavior than all others (thus
accounting is stopped early)
- `django-bulk-update` can keep up a bit longer, but then shows the same
worse runtime growth (prolly due to the same SQL logic used?). This gets
really worse for updates >5k (not shown).
- `copy_update` has a much higher setup costs (1 to 256 updated records)
- between 4096 to 8192 updates `copy_update` starts to pay off compared to
`fast_update`, at 32k updates it is almost twice as fast
- not shown in the charts: for higher updates counts it keeps gaining
ground (almost being 4 times faster for 1M update records)
- There is something going on between 256-512 updates, as almost all
implementations show a steep jump there (postgres b-tree depth change? did
not investigate that further...)
Some early insights from that:
- As already stated above in an earlier comment, django's `bulk_update` is
currently pretty wasteful, as it even drops far behind `django-bulk-
update`, which uses the same SQL update strategy.
- `fast_update` is the winner in small to medium update counts, up to
~10k.
- `copy_update` starts to shine for update counts >10k.
--
Ticket URL: <https://code.djangoproject.com/ticket/31202#comment:11>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/0107017f59d5943b-d6a06456-132d-47c9-8e4e-4ef988132a87-000000%40eu-central-1.amazonses.com.