Keith, this definitely explains the observed time: it is proportional to
count(a) * count(ab)**2, and thus non-linear.
And a correlated sub-query is generally recalculated for each row.
But I do not agree with everything.
In my example it is correlated to the outermost query, not to the
sub-query in which it occurs.
Theoretically the optimizer can take this into account and only
recalculate once for each row of the outermost query. And if I'm not
mistaken, PostgreSQL does so. Below is a version modified for pgsql
that runs fast no matter the number of rows.
Thanks for the suggested change, where the minimum size is computed in
a sub-query (not a sub-sub-query) and joined to the other sub-query.
This is so elegant. I still need to compare the timing to David's
version and use the fastest.
/* sudo -u postgres psql < issue2p.sql */
drop table if exists a
;
drop table if exists ab
;
create table a (a int, primary key (a))
;
create table ab (a int, b int, size int, primary key (a,b))
;
insert into a
with recursive i as (select 1 as i union all select i+1 from i where i<3)
select i from i
;
insert into ab
with recursive i as (select 1 as i union all select i+1 from i where i<10000)
select a, i as b, (a+i)%10 as size from a, i
;
select a,
(
select count(*)
from ab
where a=a.a
and size=(select min(size) from ab where a=a.a)
)
from a
;
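For anyone who wants a self-contained reproduction outside of psql, here is a sketch of the same schema and the original nested correlated query run through Python's sqlite3 module. The row count per group is shrunk from 10000 to 200 (a change I made so the quadratic query finishes instantly); with size = (a+i)%10 cycling evenly over 200 rows, each group's minimum size is 0 and occurs 20 times.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table a (a int primary key);
create table ab (a int, b int, size int, primary key (a, b));
insert into a
  with recursive i(i) as (select 1 union all select i+1 from i where i < 3)
  select i from i;
insert into ab
  with recursive i(i) as (select 1 union all select i+1 from i where i < 200)
  select a, i, (a + i) % 10 from a, i;
""")
# The original doubly-nested correlated query: the inner min(size)
# sub-query is correlated to the outermost table a.
rows = con.execute("""
  select a.a,
         (select count(*)
            from ab
           where a = a.a
             and size = (select min(size) from ab where a = a.a))
  from a
  order by a.a
""").fetchall()
print(rows)  # each of the 3 groups counts 20 rows at its minimum size
```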
Keith Medcalf wrote:
Well of course. You are aware that a correlated subquery means "for
each candidate result execute the query"?
So as you have formulated the query it means:
for each row in a
compute the result count which
for each ab candidate row
calculate whether it is the minimum
which means that you have requested that the same result be
computed many times over. You have requested examination of
count(a) * count(ab) * count(ab) rows.
Instead you should be computing the min(size) for each group of a
once, and using that value in the correlated subquery
select a.a,
(
select count(*)
from ab
where a == a.a
and size == b.size
) as acount
from a,
(
select a,
min(size) as size
from ab
group by a
) as b
where a.a == b.a;
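As a quick sanity check that the rewrite returns the same counts, here is a sketch running it in SQLite via Python's sqlite3 on the same schema, with the per-group row count shrunk from 10000 to 200 (my change, so it runs instantly; each group's minimum size 0 then occurs 20 times):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table a (a int primary key);
create table ab (a int, b int, size int, primary key (a, b));
insert into a
  with recursive i(i) as (select 1 union all select i+1 from i where i < 3)
  select i from i;
insert into ab
  with recursive i(i) as (select 1 union all select i+1 from i where i < 200)
  select a, i, (a + i) % 10 from a, i;
""")
# Keith's rewrite: min(size) is computed once per group in the derived
# table b, then reused inside the correlated count.
rows = con.execute("""
  select a.a,
         (select count(*)
            from ab
           where a == a.a
             and size == b.size) as acount
  from a,
       (select a, min(size) as size from ab group by a) as b
  where a.a == b.a
  order by a.a
""").fetchall()
print(rows)  # same result as the nested version: 20 per group
```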
This will result in scanning count(ab) + count(a) * count(ab) rows.
Which is significantly less. On my computer it reduces the
execution time of the original query you posited from 400 ticks to
less than 1 tick (i.e., from 500 ms to <8 ms)
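Plugging in the table sizes from the script earlier in the thread (count(a) = 3, count(ab) = 3 * 10000 = 30000), the two formulas work out as a back-of-the-envelope like this:

```python
# Rows examined under each plan, for the example's table sizes.
na = 3            # rows in a
nab = 3 * 10000   # rows in ab (10000 b values per a value)

original = na * nab * nab   # nested correlated sub-queries
rewritten = nab + na * nab  # one grouped scan, then one correlated scan

print(original, rewritten, original // rewritten)
# 2.7 billion rows versus 120 thousand, a factor of 22500
```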
I do not know if any optimizer can flatten your original query to any
significant degree. Some optimizers may arrive at my fixed-up query
because they are capable of doing a hash table lookup on the result
of the innermost correlated sub-query. SQLite does not do that, and
without that capability I do not think there is a relational database
query optimizer on the planet that can help you.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users