On Thu, 2010-07-08 at 15:58 -0500, Alex Gaynor wrote:
> On Thu, Jul 8, 2010 at 3:51 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> > On Mon, 2010-07-05 at 00:59 -0700, Simon Litchfield wrote:
> >> > If you can come up with answers to these points, I might get
> >> > interested. 1 and 2 are fairly trivial; I can think of some obvious
> >> > answers for 3, but 4 is the big problem, and will require some
> >> serious
> >> > research and consideration.
> >>
> >> Well, I'm glad you like the with_hints() approach. Items 1-3 are easy.
> >> Re 4 though, every db does it differently. In practice, most don't
> >> need hints like MySQL does <sigh>, because their query optimisers do a
> >> much better job.
> >
> > The big problem I see is that hints are simply the wrong approach to
> > handling this issue, which I do see as an important one.
> >
> > The SQL optimizer can't work out how you're going to handle the queryset
> > if all you mention is the filter(). SQL being a set-based language the
> > optimizer won't know whether you wish to retrieve 0, 1 or 1 million
> > rows. In many cases it actively avoids using the index for what it
> > thinks will be larger retrievals.
> >
> 
> That's categorically untrue.  One of the major functions of an
> optimizer is too try to figure out the approximate result size so it
> can better establish index vs. data cost.

Perhaps I need to explain some more, since what I've said is correct.
The optimizer does work out the number of rows it thinks it will access;
whether you retrieve all of those rows is a different and important
issue.

For example, if we have a 1 million row table called X with a column
called TwoValues. In TwoValues there are 2 values, value=1 and value=2.
There are 999,999 rows with value=1. An index is built on TwoValues.

If I then issue this query:

SELECT * FROM X WHERE TwoValues = 1;

then the optimizer will deduce that I will access 999,999 rows out of a
million and its best strategy is to avoid using an index. If I issue the
same query for value 2 then it will retrieve 1 row and hence use the
index.

In most cases the application won't want to bring back all 999,999 rows,
though the optimizer doesn't know that unless we tell it. If we assume
that we actually only want 10 rows then the situation is open for change
to this form of SQL

SELECT * FROM X WHERE TwoValues = 1 LIMIT 10;

which will use the index, so demonstrating that the optimizer is
designed to offer an appropriate plan when presented with the full info.

Slicing provides the full information for the use case and a hint should
not be required to allow index use.

(The situation is more complex in the case of parameterised prepared
statements and in the case of stale statistics, though neither case is
important here).

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-develop...@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Reply via email to