Re: Is cake generating inefficient SQL? (uses IN clause)

SeanW Fri, 14 Dec 2007 08:07:32 -0800

A covering index might also help, depending on what type of field
b.field_2 is.


CREATE INDEX cover_field2 ON b (id,field_2);

If field_2 is a short varchar then there is a greater chance that
mysql will just read the index and grab the data from there, and never
touch the table.

If you need a bunch more fields then the covering technique loses its
effectiveness.

Sean

On Dec 14, 7:09 am, grigri <[EMAIL PROTECTED]> wrote:
> Sorry, I think you're interepreting the results wrong.
>
> The MySQL optimizer is a complicated piece of code, but it generally
> works well.
>
> Basically, if your range query affects a large portion of the table
> entries then it is MORE efficient to do a table scan than an index-
> based range check, so it does. In fact, in older versions of MySQL
> based this decision solely on the approximate percentage:
>
> -----------
> quote fromhttp://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
> :
>
> "Each table index is queried, and the best index is used unless the
> optimizer believes that it is more efficient to use a table scan. At
> one time, a scan was used based on whether the best index spanned more
> than 30% of the table, but a fixed percentage no longer determines the
> choice between using an index or a scan. The optimizer now is more
> complex and bases its estimate on additional factors such as table
> size, number of rows, and I/O block size."
> ------------
>
> Try running the same query on a table with a larger row count. I ran a
> indexed/ranged query against a table with over 4 million rows (there
> were 100 entries in the IN() statement), and EXPLAIN SELECT confirmed
> that the index was being used, and no full table scan was involved.
>
> On Dec 14, 11:46 am, keymaster <[EMAIL PROTECTED]> wrote:
>
> > If you have two tables A hasMany B, and B belongsTo A:
>
> > $data = $this->A->findAll($where, null,  null, null, null, 1);
>
> > Cake will generate one query for the A records, and then a second
> > query for the B records.
>
> > That's fine.
>
> > Now, the second query is of the form:
>
> > SELECT `B`.`id`, `B`.`field_2` FROM `B` AS `B` WHERE `B`.`a_id` IN
> > (16, 17, 18, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
> > 143, 144, 145, 146, 147, 154, 168, 170, 171, 172, 173, 174, 175, 176,
> > 177, 178, 179, 180, 181, 182, 183, 184, 187, 188, 189, 190, 238, 281,
> > 282, 311, 312, 313, 314, 338, 348)
>
> > The IN clause is a list of the A record id's from the first query.
>
> > Again, fine.
>
> > The problem is, for a large IN clause, even if one creates an index on
> > the B.a_id field, mySQL will not use the index, and will do a full
> > table scan comparing each record in B to the list of record id's in
> > the IN clause.
>
> > Very inefficient.
>
> > For small numbers of A records in the IN clause, B's index on B.a_id
> > will be used. But once you pass a certain nuber of id's in the IN
> > clause, (I had around 60) the index on B.a_id is no longer used, and a
> > full table scan is performed instead.
>
> > Here is a sample EXPLAIN performed on the cake query above:
>
> > id      select_type     table   type    possible_keys   key     key_len     
> >     ref     rows
> > Extra
> > 1       SIMPLE         B        ALL     a_id       NULL  NULL   NULL        
> >     1
> > Using where
>
> > An index exists on B.a_id, yet is not used.
>
> > For small numbers of id's in the IN clause, the index is used.
>
> > The IN clause is very inefficient and can scarily result in full table
> > scans.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Is cake generating inefficient SQL? (uses IN clause)

Reply via email to