#31051: Serialization dependency sorting disallows circular references
unneccesarily
------------------------------------------------+------------------------
               Reporter:  Matthijs Kooijman     |          Owner:  nobody
                   Type:  Bug                   |         Status:  new
              Component:  Core (Serialization)  |        Version:  master
               Severity:  Normal                |       Keywords:
           Triage Stage:  Unreviewed            |      Has patch:  0
    Needs documentation:  0                     |    Needs tests:  0
Patch needs improvement:  0                     |  Easy pickings:  0
                  UI/UX:  0                     |
------------------------------------------------+------------------------
 The `core.serialization.sort_dependencies()` function takes a list of apps
 and/or models, and resolves this into a sorted flat list of models, ready
 to be serialized in that order. This function is intended to make natural
 foreign keys work, by serializing models referenced by a natural key
 before the referencing models. When deserializing, this guarantees that
 natural key references can be resolved, because there are no "forward
 references". Furthermore, when a circular reference using natural keys is
 present, this function raises an exception (e.g. "Can't resolve
 dependencies for some_app.SomeModel in serialized app list") and prevents
 serialization from completing, since there is no way to guarantee a model
 ordering that will have no forward references.

 Note that this ordering is *only* needed when natural keys are involved,
 since data is intended to be loaded in a transaction without constraint
 checks, so numerical foreign keys can be added in the wrong order, as long
 as all referenced data is present at the end of the transaction. This does
 not work with natural keys, since those are resolved by Python code that
 needs the referenced objects present in the database to resolve them.

 However, this sorting is not actually strictly necessary in all cases
 where it is applied. When circular references are involved, this then
 actually prevents serialization for no good reason. In particular, this is
 the case:
  - When running `dumpdata` without natural keys enabled (which is the
 default). Even though natural keys might be defined in the models (which
 causes the sorting and exception), no natural keys will be present in the
 dumped data, so no ordering is needed.
  - When dumping data intended for loading with `loaddata` (which I think
 is the primary usecase for `dumpdata`?). `loaddata` will (since 17 months
 ago in v2.2, see #26291) automatically handle forward references by
 deferring setting fields that reference natural keys that are not added
 yet. In this case, sorting is still useful, to prevent forward references
 where possible, but when there are circular references, it is acceptable
 to ignore some dependencies rather than preventing serialization from
 happening alltogether.
  - When serializing data for tests for `serialized_rollback=True` (in
 `django.db.backends.base.creation.create_test_db`). This is a
 serialization that does not use natural keys, so no ordering is needed at
 all. Note that this serialization happens always (unlike deserialization
 only happens with `serialized_rollback=True`), so AFAIU this effectively
 prevents *any* tests from working on a database with circular references
 with natural keys defined.

 The fix for these issues seems to be rather simple:
  - For dumpdata without `use_natural_foreign_keys`, skip the ordering and
 just serialize all models in arbitrary order. AFAICS
 `use_natural_primary_keys` is not relevant here, since that only controls
 omitting the numerical primary key.
  - For dumpdata *with* `use_natural_foreign_keys`, do the ordering but do
 not bail out when there are circular references (instead just ignore some
 dependencies and produce a best-effort ordering).
  - For test database serialization, also skip the ordering and serialize
 in arbitrary order.

 Note that this would remove two of the three calls to
 `sort_dependencies()` and allow loops in the last remaining instance. This
 means that `sort_dependencies` could be modified to allow loops
 unconditionally, or we could add an argument and default to disallowing
 loops in case any code outside of django is using this function?

 Note that #26552 is a related,  but different issue, concerning the
 *deserialization* of data in testcases.

 I've been working on fixing this and that related issue today and have a
 basic version working, with testcases (which proved to be quite a
 challenge, since testing the test subsystem is a bit tricky...). I'll do
 some additional testing and cleanup and submit a PR soon.


 Also note that the circular-reference exception was already disabled for
 self-referencing models in #16317. The fix for that issue simply ignores
 self-referencing models for sorting, without taking any additional
 measures to sort instances to prevent problems in deserialization (this
 code was added when the deferred deserialization did not exist yet), so I
 wonder how much value this exception still has.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/31051>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/059.eff90a05a808d6cedc5bb9f13d686ad1%40djangoproject.com.

Reply via email to