On Fri, Nov 5, 2010 at 1:27 PM, Carl Meyer <[email protected]> wrote:
> Hi all,
>
> The patch for #7539 [1] (support for on_delete behaviors other than
> cascading deletes) has reached a state where I consider it a candidate
> for committing; I invite review and comments. The current patch is
> available as a github branch; the compare view [2] presents it nicely
> as a diff with trunk. The patch is mostly the (excellent) work of
> Johannes Dollinger, with some additional work by myself and Alex
> Gaynor.

Giving credit where it's due, I think Michael Glassford also deserves
a shoutout here. He was the original driving force behind this ticket,
as was actively involved during the 1.2 development cycle.

> Quick feature summary: the patch adds an on_delete argument to
> ForeignKey (and, by extension, OneToOneField). This argument can be
> set to any of the following:
>
> - CASCADE -- the current behavior
> - PROTECT -- raise IntegrityError on attempts to delete the referenced
> object
> - SET_NULL -- (if the FK is nullable)
> - SET_DEFAULT -- (if the FK has a default)
> - SET(value) -- set to some arbitrary value; also accepts a callable
> - DO_NOTHING -- let the database handle it (up to you to create/alter
> your tables with appropriate ON DELETE constraints)
>
> The patch does not add on_delete support to GenericForeignKey; it
> retains the current (and documented) behavior, where models with a
> GenericRelation back to the GFK model will cascade deletes.

I can happily live with this limitation. Generic Keys are a special
case, and as you rightly point out, they're a bit of a kludge as
currently implemented.

> To deal with the root
> problem, the ORM needs to gain a sensible public abstraction for
> virtual or composite fields, which GFKs can become an instance of.

I'm in complete agreement on this point. This might make an
interesting project for the 1.4 timeframe, especially if it can be
tied in with denormalized fields, which is another long standing
feature request that require similar infrastructure.

> In addition to adding the on_delete feature, the patch simplifies and
> clarifies the deletion code (~50 line net reduction in non-test code),
> adds test coverage (+200 lines test code), and is much faster than the
> previous code for large bulk deletes (fixes #13067 [3]). It also adds
> documentation in the QuerySet API reference for QuerySet.delete (fixes
> #14599 [4]).
>
> I've run the full test suite under Python 2.6 on SQLite and Postgres
> (8.4), and deletion-related tests (delete, on_delete, delete_regress,
> admin_views.AdminViewDeletedObjectsTest) on MySQL/ISAM. I've also run
> the tests against Python 2.4 and 2.5 (SQLite). Test runs against other
> database backends would be very helpful.

You can add coverage of MySQL/MyISAM under Python 2.6 to your list --
Testing on_delete delete_regress invalid_models admin_util yields 22
tests, 1 skip, 21 passes. I've also confirmed the PostgreSQL and
SQLite results under Python 2.6.

> I'd like to commit this early next week, depending on feedback.

I've just done a teardown. Summary: 42 flavors of win.

Seriously, this is some great stuff. I'll admit I knew you had been
working on this Carl, but I've been shying away from reviewing the
patch because I was expecting it to be massively complex. It turns out
to be the exact opposite. It's not often you can make an
implementation much simpler while massively boosting the feature set
and making it faster, but this patch manages to do just that. Kudos to
all involved.

A couple of really minor issues that I found during my review:

 * A minor optimization in SET() -- the "if callable(value)" check can
be done as part of the factory function, rather than being repeated
every time the generated SET function is invoked (i.e., no need to
check if None is callable every time). It isn't a big overhead, but it
has the potential to be a hotspot because it will be called a lot, so
every optimization is worth it.

 * The documentation for queryset.delete() doesn't quite read right to
me. It seems to imply that queryset.delete() implements ON DELETE
CASCADE by default, when it's really foreign keys that have the
default cascading behavior. I don't have any problem saying that
CASCADE is the default, but I'd like the docs to make more of the fact
that it's the foreign key that actually controls the behavior.

 * By my reading, the documentation for the on_delete attribute has
the relationship described the wrong way around. on_delete is really
answer the question "What happens to this object when the object I'm
pointing to is deleted?"; as currently written, it feels more like
"What happens to the other object when this object is deleted". For
example, the description of PROTECT is "Prevent deletion of the
referenced object by raising IntegrityError" -- to me the 'referenced
object' is the object I'm pointing to, not myself.

 * The fact that GenericForeignKeys *don't* have the ability to
customize deletion behavior is probably worth a documentation note.
It's the sort of thing that would be an obvious inclusion from a pure
API perspective.

 * modeltests/delete/tests.py seems to have preserved the
setup/teardown and utility functions, but no actual tests. As far as I
can make out, modeltests/delete is entirely vestigial. I would prefer
to see the on_delete tests moved into the delete directory, since
they're now the core tests for deletion functionality in Django.

 * I don't see a test case where deletion of inherited objects is
tested. I can't see a test case in the existing test suite either,
which is an interesting oversight. It's implicitly tested by virtue of
the OneToOne test, but I would feel more comfortable seeing an
explicit test case -- especially given that it's an specific piece of
branch coverage in the collect() method.

 * I found two other tickets in a review of ticket history: #6108,
#6870. Both relate to the way signals are used in deletion (especially
the pre-delete signal). Reading the tickets and the django-dev
archive, my gut tells me that we should probably close these two
tickets -- they both appear to be alternate ways of achieving ON
DELETE behaviors, except that they solutions they propose would be
much harder to integrate with admin, or otherwise audit for
completeness. Are there any alternate opinions on this?

The only other issue that occurred to me during my review is a huge
topic, but one that should in no way hold up this patch landing in
trunk. However, I think it deserves a quick cursory examination to
make sure we're not painting ourselves into a corner.

It's the question of native deletes. Postgres and Oracle (and I think
SQLite in recent versions) allow for a database-native implementation
of cascaded deletes, accompanied by database level definitions of the
cascade. This has the potential to be a serious performance boost for
large deletions.

Looking at this patch, my gut tells me that implementing native
deletes will actually be fairly simple. It's mostly just a matter of
setting up the right database table definitions, and invoking a native
deletion instead of calling delete() on the Collector instance.

(And by the by -- doing a detailed review of a class called Collector
has been giving me all sorts of shivers; I studied John Fowles' "The
Collector" as a final year English Literature text at high school, and
I still have the scars... :-)

My concern is that I can see a potential complication with SET_DEFAULT
and other SET() variants which require database-side interpretations.
Django doesn't currently expose default values at the database level.
Defaults are entirely handled on the Python side. This means that we
will either need to provide a way to inject SQL-native defaults, or a
way to perform the "collection" independently of the "update" (or
possibly both).

The reason I raise this is to give quick consideration to whether we
should change the CASCADE et al functions into classes, so that they
can hold richer representations if need be in the future.

The alternative would be to treat this as a YAGNI for now, and just
avoid documenting the fact that CASCADE et al *are* functions. The
current documentation would certainly support the YAGNI approach --
the magic keywords are treated as attributes; the fact that they're
active functions is completely transparent.

If anyone can point at any other obvious problems that might arise in
implementing native deletes, I'd be interested in hearing them.

Again, to be clear -- I am in no way suggesting that the landing of
this patch should be held up by a discussion of native deletes. I'm
really just looking for a rough sanity check from someone else close
to the patch that the patch that is on the table won't act as a major
impediment to implementing native deletion at some point in the
future. I don't think it will, but I'd rather be prudent.

Yours,
Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Reply via email to