#35399: Reduce the "Case-When" sequence for a bulk_update when the values for a
certain field are the same.
-------------------------------------+-------------------------------------
               Reporter:  Willem     |          Owner:  nobody
  Van Onsem                          |
                   Type:  New        |         Status:  new
  feature                            |
              Component:  Database   |        Version:  5.0
  layer (models, ORM)                |       Keywords:  db, bulk_update,
               Severity:  Normal     |  case, when
           Triage Stage:             |      Has patch:  1
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 Django's `.bulk_update(..)` seems to work with sequences of `Case-When`
 *per* field and *per* record to update. In other words, if we have three
 items we want to update with `pk` being 1, 4 and 5, and we have a field
 `yn` and the records for pk `1` and `5` have value `y`, whereas the one
 for `pk=4` it is `n`, then this will make a query that looks like:

 {{{
 UPDATE my_table
 SET yn=CASE WHEN id=1 THEN 'y' WHEN id=4 THEN 'n' WHEN id=5 THEN 'y' END
 WHERE id IN (1,4,5)
 }}}

 It thus builds a long sequence of `Case`-`When`s, which is not efficient.

 There are for most databases solutions with a temporary table that is then
 upserted into the main table. The Django ORM could probably use existent
 tools for this where we first make a temporary model, create it with the
 migration model, insert in bulk, then upsert with a query, and finally
 remove the temporary table. But a problem with this is, we might not have
 privileges to create an extra table in that database.

 But a low-hanging optimization is to first *group* the values together
 that we can group together. This can be done with a `defaultdict` that
 maps the value to a list of primary keys for that value. In case the value
 is not hashable, we can fallback on the original case-when logic. But for
 values like strings, this would reduce the query to:

 {{{
 UPDATE my_table
 SET yn=CASE WHEN id IN (1, 5) THEN 'y' WHEN id=4 THEN 'n' END
 WHERE id IN (1,4,5)
 }}}

 That being said, I think bulk_updates should still be done in a more
 efficient manner, perhaps moving it to the engines, since a lot of
 databases allow the creation of temporary tables to make upserts a lot
 more efficient.
-- 
Ticket URL: <https://code.djangoproject.com/ticket/35399>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/0107018f0b9ceb1d-0d14d9d3-bb8e-4f4b-adb3-9eafa6ac02af-000000%40eu-central-1.amazonses.com.

Reply via email to