Re: [sqlite] Slow query, with correlated sub-sub-query

2017-07-07 Thread E . Pasma


Keith, this definitely explains the observed time as it is relative to  
count(a)*count (ab)**2, thus non-linear.

And a correlated sub-query is generally recalculated for each row.
But I do not agree with everything.
In my example it is correlated to the outermost query, and not to the  
sub-query in which it occurs.
Theoretically the optimizer can take this into account and only  
recalculate for each row in the outermost query. And if I'm not  
mistaken Postgress does so. Below is a version modified for pgsql that  
runs fast no matter the number of rows.
Thanks for the suggested change, where the minimum size is computed is  
a sub-query (not sub-sub) and joined to the other sub-query. This is  
so elegant. I still need to compare the timing to David's version and  
use the fastest.


/* sudo -u postgres psql < issue2p.sql */
drop table if exists a
;
drop table if exists ab
;
create table a (a int, primary key (a))
;
create table ab (a int, b int, size int, primary key (a,b))
;
insert into a
with recursive i as (select 1 as i union all select i+1 from i where  
i<3)

select i from i
;
insert into ab
with recursive i as (select 1 as i union all select i+1 from i where  
i<1)

select a, i as b, (a+i)%10 as size from a, i
;
select  a,
(
select  count(*)
fromab
where   a=a.a
and size=(select min(size) from ab where a=a.a)
)
froma
;

Keith Medcalf wrote:
Well of course.  You are aware that a correlated subquery means "for  
each candidate result execute the query"?


So as you have formulated the query it means:

for each row in a
compute the result count which
 for each ab candidate row
 calculate whether it is the minimum

which means that the you have requested that the same result be  
computed many times over.  You have requested exampination of  
count(a) * count(ab) * count(ab) rows.


Instead you should be computing the min(size) for each group of a  
once, and using that value in the correlated subquery


select a.a,
   (
   select count(*)
 from ab
where a == a.a
  and size == b.size
   ) as acount
  from a,
   (
   select a,
  min(size) as size
 from ab
 group by a
) as b
where a.a == b.a;

This will result in scanning count(ab) + count(a) * count(ab) rows.   
Which is significantly less.  On my computer it reduces the  
execution time of the original query you posited from 400 ticks to  
less than 1 tick (ie, from 500 ms to <8 ms)


I do not know if any optimizer can flatten you original query to any  
significant degree.  Some optimizers may arrive at my fixed up query  
because they are capable of doing a hash table lookup on the result  
of the innermost correlate.  SQLite does not do that, and without  
that capability I do not think there is a relational database query  
optimizer on the planet that can help you.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow query, with correlated sub-sub-query

2017-07-07 Thread Keith Medcalf

Well of course.  You are aware that a correlated subquery means "for each 
candidate result execute the query"?

So as you have formulated the query it means:

for each row in a
compute the result count which
 for each ab candidate row
 calculate whether it is the minimum

which means that the you have requested that the same result be computed many 
times over.  You have requested exampination of count(a) * count(ab) * 
count(ab) rows. 

Instead you should be computing the min(size) for each group of a once, and 
using that value in the correlated subquery

select a.a,
   (
   select count(*)
 from ab
where a == a.a
  and size == b.size
   ) as acount
  from a,
   (
   select a,
  min(size) as size
 from ab
 group by a
) as b
where a.a == b.a;

This will result in scanning count(ab) + count(a) * count(ab) rows.  Which is 
significantly less.  On my computer it reduces the execution time of the 
original query you posited from 400 ticks to less than 1 tick (ie, from 500 ms 
to <8 ms)

I do not know if any optimizer can flatten you original query to any 
significant degree.  Some optimizers may arrive at my fixed up query because 
they are capable of doing a hash table lookup on the result of the innermost 
correlate.  SQLite does not do that, and without that capability I do not think 
there is a relational database query optimizer on the planet that can help you.

-- 
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of E.Pasma
> Sent: Friday, 7 July, 2017 07:47
> To: SQLite mailing list
> Subject: [sqlite] Slow query, with correlated sub-sub-query
> 
> Hello, below is a theoretical query that becomes slow when the number
> of rows increases. What it does is:
> - scan input cases in table a
> - for each input case:
> -- determine the smallest value of attribute size of elements in table
> ab
> -- count the number of elements having this smallest size
> With 3 rows in table a and 3*1000 in ab this takes already several
> seconds.
> I'm not so much interested in an alternative solution, though
> interesting, and merely want to show an inefficient construction. That
> is a sub-sub-query correlated directly to the main query.
> Thanks, E. Pasma
> 
> .version
> SQLite 3.19.3 2017-06-08 14:26:17 ...
> 
> create table a (a, primary key (a))
> ;
> create table ab (a, b, size, primary key (a,b))
> ;
> insert into a
> with i as (select 1 as i union all select i+1 from i where i<3)
> select i from i
> ;
> insert into ab
> with i as (select 1 as i union all select i+1 from i where i<1000)
> select a, i as b, random()%10 as size from a, i
> ;
> .eqp on
> .timer on
> select  a,
> (
> select  count(*)
> fromab
> where   a=a.a
> and size=(select min(size) from ab where a=a.a)
> )
> froma
> ;
> --EQP-- 0,0,0,SCAN TABLE a
> --EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 1
> --EQP-- 1,0,0,SEARCH TABLE ab USING INDEX sqlite_autoindex_ab_1 (a=?)
> --EQP-- 1,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 2
> --EQP-- 2,0,0,SEARCH TABLE ab USING INDEX sqlite_autoindex_ab_1 (a=?)
> 1|56
> 2|53
> 3|49
> Run Time: real 2.678 user 2.597794 sys 0.008801
> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow query, with correlated sub-sub-query

2017-07-07 Thread David Raymond
Also of note is that when you make an index on ab (size), your original query, 
unchanged, becomes about 3 times faster than my modification. I'm not sure on 
what it looks at to decide if a temporary autoindex is worth it, but in this 
case it would have been.

When you move back to your more complex version I don't know if an expression 
index would be practical though, you'll have to play around.


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of E.Pasma
Sent: Friday, July 07, 2017 1:12 PM
To: SQLite mailing list
Subject: [sqlite] Slow query, with correlated sub-sub-query

Thanks David for the alernative solution, that is 500 times faster or  
any times faster depending on the number of rows. I hope this does not  
derive attention from my point that the original construction is not  
very well delt with.
It was only a theoretical query derived from a more complex one. The  
atrribute "size" is in this complex query yet an other sub-query,  
counting rows in yet another table. I need to see if the alternative  
solution is still faster there.
E. Pasma
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow query, with correlated sub-sub-query

2017-07-07 Thread David Raymond
I acknowledge you said you weren't so much interested in an alternative 
solution, but...

How about something like

select a, min(size) as minSize, recCount
from
  (select a, size, count(*) as recCount
  from a inner join ab
  using (a)
  group by a, size)
group by a;

The inner one will group by a and size, then the outer group by with the min() 
will pick the minimum and use that line to populate the bare column of recCount.

With 10,000 here's your original:

sqlite> select a, (select count(*) from ab where a = a.a and size = (select 
min(size) from ab where a = a.a)) from a;
--EQP-- 0,0,0,SCAN TABLE a
--EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 1
--EQP-- 1,0,0,SEARCH TABLE ab USING INDEX sqlite_autoindex_ab_1 (a=?)
--EQP-- 1,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 2
--EQP-- 2,0,0,SEARCH TABLE ab USING INDEX sqlite_autoindex_ab_1 (a=?)
a|(select count(*) from ab where a = a.a and size = (select min(size) from ab 
where a = a.a))
1|522
2|486
3|500
Memory Used: 975336 (max 1508448) bytes
Number of Outstanding Allocations:   270 (max 326)
Number of Pcache Overflow Bytes: 850880 (max 986272) bytes
Number of Scratch Overflow Bytes:0 (max 12472) bytes
Largest Allocation:  524288 bytes
Largest Pcache Allocation:   4256 bytes
Largest Scratch Allocation:  12472 bytes
Lookaside Slots Used:35 (max 100)
Successful lookaside attempts:   71296
Lookaside failures due to size:  19
Lookaside failures due to OOM:   119
Pager Heap Usage:844920 bytes
Page cache hits: 2030205
Page cache misses:   0
Page cache writes:   0
Schema Heap Usage:   1472 bytes
Statement Heap/Lookaside Usage:  32400 bytes
Fullscan Steps:  2
Sort Operations: 0
Autoindex Inserts:   0
Virtual Machine Steps:   1800501558
Run Time: real 39.031 user 38.906649 sys 0.015600


And the alternative:
sqlite> select a, min(size) as minSize, recCount from (select a, size, count(*) 
as recCount from a inner join ab using (a) group by a, size) group by a;
--EQP-- 1,0,0,SCAN TABLE a USING COVERING INDEX sqlite_autoindex_a_1
--EQP-- 1,1,1,SEARCH TABLE ab USING INDEX sqlite_autoindex_ab_1 (a=?)
--EQP-- 1,0,0,USE TEMP B-TREE FOR GROUP BY
--EQP-- 0,0,0,SCAN SUBQUERY 1
--EQP-- 0,0,0,USE TEMP B-TREE FOR GROUP BY
a|minSize|recCount
1|-9|522
2|-9|486
3|-9|500
Memory Used: 984136 (max 1513008) bytes
Number of Outstanding Allocations:   280 (max 332)
Number of Pcache Overflow Bytes: 855136 (max 986272) bytes
Number of Scratch Overflow Bytes:0 (max 12472) bytes
Largest Allocation:  524288 bytes
Largest Pcache Allocation:   4256 bytes
Largest Scratch Allocation:  12472 bytes
Lookaside Slots Used:55 (max 100)
Successful lookaside attempts:   102118
Lookaside failures due to size:  26
Lookaside failures due to OOM:   258
Pager Heap Usage:849164 bytes
Page cache hits: 199
Page cache misses:   0
Page cache writes:   0
Schema Heap Usage:   1736 bytes
Statement Heap/Lookaside Usage:  58280 bytes
Fullscan Steps:  2
Sort Operations: 2
Autoindex Inserts:   0
Virtual Machine Steps:   511684
Run Time: real 0.063 user 0.015600 sys 0.00


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of E.Pasma
Sent: Friday, July 07, 2017 9:47 AM
To: SQLite mailing list
Subject: [sqlite] Slow query, with correlated sub-sub-query

Hello, below is a theoretical query that becomes slow when the number  
of rows increases. What it does is:
- scan input cases in table a
- for each input case:
-- determine the smallest value of attribute size of elements in table  
ab
-- count the number of elements having this smallest size
With 3 rows in table a and 3*1000 in ab this takes already several  
seconds.
I'm not so much interested in an alternative solution, though  
interesting, and merely want to show an inefficient construction. That  
is a sub-sub-query correlated directly to the main query.
Thanks, E. Pasma

.version
SQLite 3.19.3 2017-06-08 14:26:17 ...

create table a (a, primary key (a))
;
create table ab (a, b, size, primary key (a,b))
;
insert into a
with i as (select 1 as i union all select i+1 from i where i<3)
select i from i
;
insert into ab
with i as (select 1 as i union all select i+1 from i where i<1000)
select a, i as b, random()%10 as size from a, i
;
.eqp on
.timer on
select  a,
(
select  count(*)
fromab
where   a=a.a
and size=(select min(size) from ab where a=a.a)
)
froma
;
--EQP-- 0,0,0,SCAN TABLE a
--EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 1