#11402: exists() method on QuerySets
---------------------------------------------------+------------------------
Reporter: Alex | Owner: nobody
Status: new | Milestone:
Component: Database layer (models, ORM) | Version: 1.0
Resolution: | Keywords:
Stage: Accepted | Has_patch: 0
Needs_docs: 0 | Needs_tests: 0
Needs_better_patch: 0 |
---------------------------------------------------+------------------------
Comment (by lukeplant):
I agree that 'any' is better than 'exists' - it matches the Python
builtin.
To answer someone's objection that this should be an optimization in
`QuerySet.__nonzero__`:
`__nonzero__` should '''not''' do this optimization trick:
* because it should be consistent with `__len__`, which simply forces
evaluation of the !QuerySet
* because it shouldn't second guess the user.
We did actually have this discussion about `QuerySet.__len__()`, way back
(I can't find it), and with hindsight I'm sure we came to the right
conclusion.
Consider the following two bits of code (ignoring bugs for now):
1) Take len() of a queryset, then use its data
{{{
#!python
options = Option.objects.filter(bar=baz)
choice = None
if len(options) > 1:
    print "Options: " + ", ".join(opt.name for opt in options)
else:
    choice = options[0]
}}}
2) Take len() of a queryset, then discard its data
{{{
#!python
options = Option.objects.filter(bar=baz)
if len(options) > 1:
    print "You've got more than 1!"
else:
    print "You've got 1 or less!"
}}}
In `QuerySet.__len__`, it's impossible to guess which the user is doing.
So if `__len__` does a .count() as an optimization, sometimes it will be a
pessimization, causing an extra DB hit compared to just evaluating the
query. Exactly the same case can be made for `__nonzero__`.
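To make the hit counting concrete, here is a minimal sketch using a toy stand-in, not Django's real `QuerySet`. The `ToyQuerySet` class, its `clever_len` flag and the `db_hits` counter are hypothetical names invented for illustration; each simulated query just increments the counter. It is written for Python 3.

```python
class ToyQuerySet:
    """Hypothetical stand-in for a QuerySet; counts simulated DB hits."""

    def __init__(self, rows, clever_len=False):
        self._rows = list(rows)   # stands in for the database table
        self._cache = None        # result cache, filled on full evaluation
        self.clever_len = clever_len
        self.db_hits = 0

    def _fetch_all(self):
        if self._cache is None:
            self.db_hits += 1     # simulated SELECT of every row
            self._cache = list(self._rows)
        return self._cache

    def count(self):
        if self._cache is not None:
            return len(self._cache)
        self.db_hits += 1         # simulated SELECT COUNT(*)
        return len(self._rows)

    def __len__(self):
        if self.clever_len:
            return self.count()   # the "optimization" under discussion
        return len(self._fetch_all())

    def __iter__(self):
        return iter(self._fetch_all())


def snippet_1(options):
    # len() of the queryset, then use its data
    if len(options) > 1:
        return ", ".join(name for name in options)
    return None


def snippet_2(options):
    # len() of the queryset, then discard its data
    return len(options) > 1


def hits(snippet, clever_len):
    qs = ToyQuerySet(["red", "green", "blue"], clever_len=clever_len)
    snippet(qs)
    return qs.db_hits


for snippet in (snippet_1, snippet_2):
    for clever_len in (False, True):
        print(snippet.__name__, "clever_len=%s" % clever_len,
              "hits=%d" % hits(snippet, clever_len))
```

Under these assumptions the 'clever' `__len__` never saves a hit: in snippet 2 it only swaps a full fetch for a cheaper count, and in snippet 1 it costs a second hit.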
Explicit is better than implicit. We provide `QuerySet.count()` if the
user '''knows''' that they only want the count, and `QuerySet.any()` if
the user '''knows''' that they only want to know whether any rows exist.
If `__len__` and
`__nonzero__` tried to be 'clever', then implementing snippet 1 in an
efficient way gets harder - you have to wrap with `list()`. And snippet 1
is exactly the way that many templates will be written:
{{{
{% if basketitems %}
  You've got {{ basketitems|length }} item(s) in your basket:
  {% for item in basketitems %}
    ...
  {% endfor %}
{% endif %}
}}}
If we made `__nonzero__` do the `any()` automatically, and similarly
`__len__`, it would be very hard to avoid having 3 DB hits from within the
above template. But the other way around, it's easy to get optimal
behaviour without having paid any attention to performance. And template
authors should not have to worry about this, but view authors should, and
we should give them the tools to be able to do it explicitly and simply.
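The three hits can be counted with another toy model, again a sketch under stated assumptions rather than Django's actual code. `ToyBasket`, its `clever` flag, `db_hits` and `render_basket` are hypothetical names; `render_basket` only mimics the three operations the template performs (truth test, length, iteration). It is written for Python 3, where `__nonzero__` is spelled `__bool__`.

```python
class ToyBasket:
    """Hypothetical queryset-like object that counts simulated DB hits."""

    def __init__(self, rows, clever=False):
        self._rows = list(rows)   # stands in for the database table
        self._cache = None        # result cache, filled on full evaluation
        self.clever = clever
        self.db_hits = 0

    def _fetch_all(self):
        if self._cache is None:
            self.db_hits += 1     # simulated SELECT of every row
            self._cache = list(self._rows)
        return self._cache

    def __bool__(self):           # spelled __nonzero__ on Python 2
        if self.clever and self._cache is None:
            self.db_hits += 1     # simulated existence query
            return bool(self._rows)
        return bool(self._fetch_all())

    def __len__(self):
        if self.clever and self._cache is None:
            self.db_hits += 1     # simulated COUNT(*) query
            return len(self._rows)
        return len(self._fetch_all())

    def __iter__(self):
        return iter(self._fetch_all())


def render_basket(basketitems):
    """Mimic the template: {% if %}, then |length, then {% for %}."""
    lines = []
    if basketitems:
        lines.append("You've got %d item(s) in your basket:" % len(basketitems))
        for item in basketitems:
            lines.append(item)
    return lines


plain = ToyBasket(["apple", "pear"])
clever = ToyBasket(["apple", "pear"], clever=True)
render_basket(plain)
render_basket(clever)
print("plain hits:", plain.db_hits)
print("clever hits:", clever.db_hits)
```

In this model the plain version evaluates once and reuses its cache for all three operations, while the 'clever' version issues a separate query for each of them.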
Using .count() and .any() isn't so 'pure', but the point is that our
abstraction is not perfect (because we are worrying about optimisation),
and we should manage the leak as best we can. The best way is simply to
add this documentation:
> bool(), len() and iter() force evaluation of the !QuerySet; use
> .any() or .count() if you know you don't need all the data.
--
Ticket URL: <http://code.djangoproject.com/ticket/11402#comment:5>
Django <http://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.