Re: Is cake generating inefficient SQL? (uses IN clause)

grigri Fri, 14 Dec 2007 05:18:44 -0800

Sorry, I think you're interepreting the results wrong.

The MySQL optimizer is a complicated piece of code, but it generally
works well.


Basically, if your range query affects a large portion of the table
entries then it is MORE efficient to do a table scan than an index-
based range check, so it does. In fact, in older versions of MySQL
based this decision solely on the approximate percentage:

-----------
quote from http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
:

"Each table index is queried, and the best index is used unless the
optimizer believes that it is more efficient to use a table scan. At
one time, a scan was used based on whether the best index spanned more
than 30% of the table, but a fixed percentage no longer determines the
choice between using an index or a scan. The optimizer now is more
complex and bases its estimate on additional factors such as table
size, number of rows, and I/O block size."
------------

Try running the same query on a table with a larger row count. I ran a
indexed/ranged query against a table with over 4 million rows (there
were 100 entries in the IN() statement), and EXPLAIN SELECT confirmed
that the index was being used, and no full table scan was involved.

On Dec 14, 11:46 am, keymaster <[EMAIL PROTECTED]> wrote:
> If you have two tables A hasMany B, and B belongsTo A:
>
> $data = $this->A->findAll($where, null,  null, null, null, 1);
>
> Cake will generate one query for the A records, and then a second
> query for the B records.
>
> That's fine.
>
> Now, the second query is of the form:
>
> SELECT `B`.`id`, `B`.`field_2` FROM `B` AS `B` WHERE `B`.`a_id` IN
> (16, 17, 18, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
> 143, 144, 145, 146, 147, 154, 168, 170, 171, 172, 173, 174, 175, 176,
> 177, 178, 179, 180, 181, 182, 183, 184, 187, 188, 189, 190, 238, 281,
> 282, 311, 312, 313, 314, 338, 348)
>
> The IN clause is a list of the A record id's from the first query.
>
> Again, fine.
>
> The problem is, for a large IN clause, even if one creates an index on
> the B.a_id field, mySQL will not use the index, and will do a full
> table scan comparing each record in B to the list of record id's in
> the IN clause.
>
> Very inefficient.
>
> For small numbers of A records in the IN clause, B's index on B.a_id
> will be used. But once you pass a certain nuber of id's in the IN
> clause, (I had around 60) the index on B.a_id is no longer used, and a
> full table scan is performed instead.
>
> Here is a sample EXPLAIN performed on the cake query above:
>
> id      select_type     table   type    possible_keys   key     key_len       
>   ref     rows
> Extra
> 1       SIMPLE         B        ALL     a_id       NULL  NULL   NULL          
>   1
> Using where
>
> An index exists on B.a_id, yet is not used.
>
> For small numbers of id's in the IN clause, the index is used.
>
> The IN clause is very inefficient and can scarily result in full table
> scans.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Is cake generating inefficient SQL? (uses IN clause)

Reply via email to