On Wed, 2008-07-16 at 10:12 -0700, serbaut wrote: > Consider the following equivalent queries based on the weblog model > from the documentation: > > Blog.objects.filter(name="myblog").exclude(entry__body_text__contains="blah > blah") > Blog.objects.filter(Q(name="myblog") & > QNot(Q(entry__body_text__contains="blah blah"))) # 0.96 > Blog.objects.filter(Q(name="myblog") & > ~Q(entry__body_text__contains="blah blah")) # development version > > In 0.96 the evaluated SQL is (edited for readability): > > SELECT blog_blog.id,blog_blog.name,blog_blog.tagline > FROM blog_blog > INNER JOIN blog_entry AS blog_blog__entry ON blog_blog.id = > blog_blog__entry.blog_id > WHERE blog_blog.name = "myblog" > AND (NOT (blog_blog__entry.body_text LIKE "%blah blah%")) > > while in trunk (r7982) it is > > SELECT blog_blog.id, blog_blog.name, blog_blog.tagline > FROM blog_blog > WHERE blog_blog.name = "myblog" > AND (NOT (blog_blog.id IN (SELECT blog_entry.blog_id FROM blog_entry > WHERE blog_entry.body_text LIKE "%blah blah%"))) > > The trunk version will perform a subquery over all blog entries which > will have a very negative performance impact. > > Lets say "myblog" has 10 entries in a database with 10,000,000 > entries, the first SQL will only examine the 10 entries (assuming > decent query planner) while the latter will scan the whole database. A > join with blog_blog.id is missing from the subquery or it needs to be > rewritten to the 0.96 form. I understand that the code tries to handle > the generic case but this practical case has to work too.
The important difference, that trumps everything else, is that the 0.96 version gives the wrong answer! It isn't a case of less or more efficient -- it's the difference between correct and incorrect. The query you described asks to exclude all blog entries containing a particular tag. Now consider a blog entry that has two entries. One is the entry you're interested in excluding and the second entry is something else. Because that blog has an row in the m2m join table that does not match the entry you are excluding, that blog will be included (incorrectly) in the result set. This was a very big bug in 0.96 and impossible to work around in that code. There is no way to write that particular exclusion correctly without using nested subqueries unless you have some arbitrary constraint like only one tag per blog entry (in which case a many-to-many field is the wrong choice). Remember that the query you are writing here is something that returns (and filters) Blog objects. It's not for excluding individual Entry objects. It uses the presence or absence of an Entry to filter the Blogs. Your case is particularly pessimal, since it isn't going to be helped by index matches in most cases. For more natural uses, such as excluding by pk values, or whole-field matches, adding appropriate index comparisons makes the inner query very efficient if it becomes a performance issue in production environments. Regards, Malcolm --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~----------~----~----~----~------~----~------~--~---