Keith, this definitely explains the observed time, since it is proportional to count(a)*count(ab)**2 and thus non-linear.
And a correlated sub-query is generally recalculated for each row.
But I do not agree with everything.
In my example it is correlated to the outermost query, not to the sub-query in which it occurs. In theory the optimizer can take this into account and recalculate only once per row of the outermost query, and if I'm not mistaken PostgreSQL does exactly that. Below is a version modified for pgsql that runs fast no matter the number of rows.

Thanks for the suggested change, where the minimum size is computed in a sub-query (not a sub-sub-query) and joined to the other sub-query. This is so elegant. I still need to compare the timing against David's version and use the fastest.

/* sudo -u postgres psql < issue2p.sql */
drop table if exists a
;
drop table if exists ab
;
create table a (a int, primary key (a))
;
create table ab (a int, b int, size int, primary key (a,b))
;
-- populate a with 3 rows (a = 1..3)
insert into a
with recursive i as (select 1 as i union all select i+1 from i where i<3)
select i from i
;
-- populate ab with 3 * 10,000 = 30,000 rows; size cycles through 0..9
insert into ab
with recursive i as (select 1 as i union all select i+1 from i where i<10000)
select a, i as b, (a+i)%10 as size from a, i
;
select  a,
        (
            select  count(*)
            from    ab
            where   a=a.a
            and     size=(select min(size) from ab where a=a.a)
        )
from    a
;
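
A quick way to verify what the planner does (assuming a reasonably recent PostgreSQL) is to prefix the final query with EXPLAIN ANALYZE. If the inner min() really is recalculated only once per row of the outermost query, it should appear in the plan as an InitPlan of the counting sub-query rather than as a per-row SubPlan:

explain analyze
select  a,
        (
            select  count(*)
            from    ab
            where   a=a.a
            and     size=(select min(size) from ab where a=a.a)
        )
from    a
;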

Keith Medcalf wrote:
Well of course. You are aware that a correlated subquery means "for each candidate result execute the query"?

So as you have formulated the query it means:

for each row in a
    compute the result count which
         for each ab candidate row
             calculate whether it is the minimum

which means that you have requested that the same result be computed many times over. You have requested examination of count(a) * count(ab) * count(ab) rows.
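
Concretely, with the test data you posted (3 rows in a, 30,000 rows in ab), that is 3 * 30,000 * 30,000 = 2.7 billion row examinations.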

Instead you should be computing the min(size) for each group of a once, and using that value in the correlated subquery:

select a.a,
       (
           select count(*)
             from ab
            where a == a.a
              and size == b.size
       ) as acount
  from a,
       (
           select a,
                  min(size) as size
             from ab
         group by a
        ) as b
where a.a == b.a;
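
(Side note for anyone replaying this in PostgreSQL: the == operator is SQLite syntax. A portable spelling of the same query, with the correlated columns qualified for clarity, would be:

select a.a,
       (
           select count(*)
             from ab
            where ab.a = a.a
              and ab.size = b.size
       ) as acount
  from a,
       (
           select a,
                  min(size) as size
             from ab
         group by a
        ) as b
 where a.a = b.a;
)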

This will result in scanning count(ab) + count(a) * count(ab) rows, which is significantly fewer. On my computer it reduces the execution time of the original query you posited from 400 ticks to less than 1 tick (i.e., from 500 ms to <8 ms).
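
With the same test data, that works out to 30,000 + 3 * 30,000 = 120,000 rows, against roughly 2.7 billion for the original formulation.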

I do not know if any optimizer can flatten your original query to any significant degree. Some optimizers may arrive at my fixed-up query on their own because they are capable of doing a hash-table lookup on the result of the innermost correlated subquery. SQLite does not do that, and without that capability I do not think there is a relational database query optimizer on the planet that can help you.
