On Wed, 2008-07-16 at 10:12 -0700, serbaut wrote:
> Consider the following equivalent queries based on the weblog model
> from the documentation:
> 
> Blog.objects.filter(name="myblog").exclude(entry__body_text__contains="blah
> blah")
> Blog.objects.filter(Q(name="myblog") &
> QNot(Q(entry__body_text__contains="blah blah"))) # 0.96
> Blog.objects.filter(Q(name="myblog") &
> ~Q(entry__body_text__contains="blah blah")) # development  version
> 
> In 0.96 the evaluated SQL is (edited for readability):
> 
> SELECT blog_blog.id,blog_blog.name,blog_blog.tagline
> FROM blog_blog
> INNER JOIN blog_entry AS blog_blog__entry ON blog_blog.id =
> blog_blog__entry.blog_id
> WHERE blog_blog.name = "myblog"
> AND (NOT (blog_blog__entry.body_text LIKE "%blah blah%"))
> 
> while in trunk (r7982) it is
> 
> SELECT blog_blog.id, blog_blog.name, blog_blog.tagline
> FROM blog_blog
> WHERE blog_blog.name = "myblog"
> AND (NOT (blog_blog.id IN (SELECT blog_entry.blog_id FROM blog_entry
> WHERE blog_entry.body_text LIKE "%blah blah%")))
> 
> The trunk version will perform a subquery over all blog entries which
> will have a very negative performance impact.
> 
> Lets say "myblog" has 10 entries in a database with 10,000,000
> entries, the first SQL will only examine the 10 entries (assuming
> decent query planner) while the latter will scan the whole database. A
> join with blog_blog.id is missing from the subquery or it needs to be
> rewritten to the 0.96 form. I understand that the code tries to handle
> the generic case but this practical case has to work too.

The important difference, that trumps everything else, is that the 0.96
version gives the wrong answer! It isn't a case of less or more
efficient -- it's the difference between correct and incorrect.

The query you described asks to exclude all blog entries containing a
particular tag. Now consider a blog entry that has two entries. One is
the entry you're interested in excluding and the second entry is
something else. Because that blog has an row in the m2m join table that
does not match the entry you are excluding, that blog will be included
(incorrectly) in the result set. This was a very big bug in 0.96 and
impossible to work around in that code.

There is no way to write that particular exclusion correctly without
using nested subqueries unless you have some arbitrary constraint like
only one tag per blog entry (in which case a many-to-many field is the
wrong choice).

Remember that the query you are writing here is something that returns
(and filters) Blog objects. It's not for excluding individual Entry
objects. It uses the presence or absence of an Entry to filter the
Blogs.

Your case is particularly pessimal, since it isn't going to be helped by
index matches in most cases. For more natural uses, such as excluding by
pk values, or whole-field matches, adding appropriate index comparisons
makes the inner query very efficient if it becomes a performance issue
in production environments.

Regards,
Malcolm


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to