#30533: Delete cascade can break postgresql database: too large 'DELETE FROM a
WHERE a.b_id IN (...)'
               Reporter:  Thomas     |          Owner:  nobody
  Riccardi                           |
                   Type:  Bug        |         Status:  new
              Component:  Database   |        Version:  2.0
  layer (models, ORM)                |       Keywords:  queryset delete
               Severity:  Normal     |  cascade batch bulk_batch_size
           Triage Stage:             |  postgresql crash
  Unreviewed                         |      Has patch:  0
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
 == Scenario
 - Model A has a foreign key to model B with on_delete=CASCADE
 - many B objects
 - execute `B.objects.delete()`
 Expected result:
 - deletion works
 Actual result:
 - when there are many B objects to delete (in our case it was 14M rows),
 the database process gets killed before completion:
 2019-05-28 19:17:42.237 CEST
 [1]: [2-1] db=,user= LOG: server process (PID 12911) was terminated by
 signal 9: Killed
 2019-05-28 19:17:42.238 CEST
 [1]: [3-1] db=,user= DETAIL: Failed process was running: DELETE FROM "a"
 WHERE "a"."b_id" IN (17271, 17272, 17273, 17274, 17275, <truncated
 enormous list>
 - with a smaller database it worked (2M rows)

 == Analysis
 It seems the related A objects get deleted in a single query with an
 unbounded `IN (...)` list of B ids.
 This pattern has already led to an issue with the sqlite backend
 ([#link0]: sqlite supports only 999 parameters per query).
 It was fixed in Django 1.8 by batching the query [#link1], [#link2], with
 a batch size specified per backend:
 - sqlite: 999 [#link3]
 - oracle: 64k [#link4]
 - all others (base): not limited [#link5]
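
 The batching described above can be sketched as a plain helper that
 splits a flat parameter list into backend-sized chunks, so each
 `DELETE ... IN (...)` query stays under the limit (a hypothetical
 `batched_ids` helper for illustration, not Django's actual
 implementation):

 def batched_ids(ids, batch_size):
     """Yield chunks of at most batch_size ids, one chunk per query."""
     for start in range(0, len(ids), batch_size):
         yield ids[start:start + batch_size]

 # e.g. with sqlite's limit of 999 parameters per query:
 chunks = list(batched_ids(list(range(2500)), 999))
 # 2500 ids -> 3 queries with 999, 999 and 502 parameters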

 == Workaround
 As a temporary workaround we monkey-patched the `connection` instance's
 own `bulk_batch_size` to limit batches to 64k parameters:

 import types

 def monkey_patch_connection_bulk_batch_size(connection):
     def limited_bulk_batch_size(self, fields, objs):
         # PostgreSQL can crash with too many parameters in a query,
         # e.g. 'DELETE FROM x WHERE x.y IN (...large list...)'
         # => limit to 64k
         return 2**16
     connection.ops.bulk_batch_size = types.MethodType(
         limited_bulk_batch_size, connection.ops)

 It worked great in our case: we used it in a migration.
 Workaround limitations:
 - no idea where to monkey-patch for global usage
 - no idea how to choose the batch size value
 - it doesn't handle the more complex computation using `fields` and
 `objs` that some backends perform
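
 For illustration, the `types.MethodType` rebinding used in the
 workaround can be exercised without Django, using a stand-in `ops`
 object in place of the real `connection.ops` (the `FakeOps` class and
 its default behavior are hypothetical):

 import types

 class FakeOps:
     # stands in for connection.ops; default: no batch limit
     def bulk_batch_size(self, fields, objs):
         return len(objs)

 ops = FakeOps()

 def limited_bulk_batch_size(self, fields, objs):
     # cap the number of query parameters at 64k
     return 2**16

 # rebind the method on this one instance, as the workaround does
 ops.bulk_batch_size = types.MethodType(limited_bulk_batch_size, ops)

 print(ops.bulk_batch_size([], list(range(100))))  # prints 65536

 Because the new method is bound to the instance, only the patched
 connection is affected; other connections keep the backend default.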

 (Related remark: this is about deleting the related objects; for the
 main objects' deletion Django already uses batching, but with a much
 smaller batch size: `GET_ITERATOR_CHUNK_SIZE = 100`. With some latency
 to the database (0.2ms per query) that is a really small value.)
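
 To put a rough number on that remark: deleting 14M rows in chunks of
 100 issues 140,000 queries, so round-trip latency alone adds about 28
 seconds (a back-of-envelope estimate assuming the 0.2ms figure above
 and ignoring actual query execution time):

 ROWS = 14_000_000
 GET_ITERATOR_CHUNK_SIZE = 100   # Django's batch size for the main deletion
 LATENCY_S = 0.0002              # 0.2 ms round trip per query (assumed)

 queries = ROWS // GET_ITERATOR_CHUNK_SIZE   # 140_000 queries
 latency_total_s = queries * LATENCY_S       # 28.0 s of pure latency
 print(queries, latency_total_s)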

 == Context
 - postgresql 9.6 (GCP Cloud SQL)
 - django 2.0.13 (but the related code has not changed in current master)
 link 0 [=#link0] https://code.djangoproject.com/ticket/16426
 link 1 [=#link1]
 link 2 [=#link2]
 link 3 [=#link3]
 link 4 [=#link4]
 link 5 [=#link5]

Ticket URL: <https://code.djangoproject.com/ticket/30533>