>Hi;
>
>I am using FB 2.5.3. in a fast environment (win8 64bit - QuadCore 3.4 Ghz - 8 
>GB ram).
>
>If I run this query it goes very very fast and returns 84 records:
>
>Preparing query: select distinct "PROC" from "GEST" where ("GEST"."FSAL" 
>between '20080801' and '20080812') order by "PROC"
>Prepare time: 0.007s
>Field #01: GEST.PROC Alias:PROC Type:STRING(10)
>PLAN SORT ((GEST NATURAL))
>
>But if I run this one, it will return 84 records but it needs 10 minutes !!!:
>Preparing query: select "PROC"."PROC" from "PROC" where ("PROC"."PROC" in (
>select distinct "PROC" from "GEST" where ("GEST"."FSAL" between '20080801' and 
>'20080812'))) order by "PROC"
>Prepare time: 0.003s
>Field #01: PROC.PROC Alias:PROC Type:STR ING(10)
>PLAN SORT ((GEST NATURAL))
>PLAN (PROC ORDER PROC_PK)
>
>Executing...
>Done.
>-1258819480 fetches, 0 marks, 28351438 reads, 0 writes.
>0 inserts, 0 updates, 0 deletes, 11439 index, 962254304 seq.
>Delta memory: 25308 bytes.
>Total execution time: 0:10:53 (hh:mm:ss)
>Script execution finished.
>
>PROC is a table with 11400 rows. Even if the query optimizer takes the worse 
>path (analizing one by one) it seems that 10 minutes is too much ... 
>In my opinion, both may run very quick. Why is the second one taking that 
>long?.

Well, your query does tell Firebird to find the DISTINCT possible values of 
PROC in a subset of GEST 11400 times. Sure, an ideal optimizer would have been 
able to see that your subselect wasn't correlated (not referencing the outer 
select) and that you didn't really bother about whether the subselect returned 
distinct records or not. Newer versions of Firebird do try to transform 
IN(<SUBSELECT>) into EXISTS(SELECT...), but I guess that your DISTINCT makes it 
give up. I think someone once told (probably Ann) that one of the early 
InterBase betas (or maybe it was even before that) had an optimizer that always 
chose the quickest way to return a result set. It's only problem was that it 
could use days to find which way was the quickest!

There are several fair ways to reach the result you want:

1) The standard way to write your type of query
select p.PROC from PROC p
where exists(select * from GEST g where g.PROC = p.PROC and g.FSAL between 
'20080801' and '20080812')
order by PROC

2) Quicker than option 1, but the result could differ a bit and you may want to 
add DISTINCT
select p.PROC
from PROC p
join GEST g on p.PROC = g.PROC
where g.FSAL between '20080801' and '20080812'

3) Logically similar to your original query in that you treat things like two 
statements and put them together. A nice way to do things if you construct your 
SQL dynamically.
with tmp(PROC) as
(select distinct PROC from GEST where (FSAL between '20080801' and '20080812')
select p.PROC from PROC p
join tmp g on p.PROC = t.PROC

Myself, I sometimes use IN (<constant values>) (I even used it earlier today), 
but I stopped using IN(<subselect>) after 1 or 2 attempts 15 years ago. I've 
never missed it since.

By the way, your original query can benefit from an index for GEST.FSAL, but 
cannot use an index for PROC.PROC. My second and third suggestion above could 
use indexes for GEST.FSAL and PROC.PROC, whereas my first suggestion only could 
use indexes for GEST.FSAL and GEST.PROC.

HTH,
Set

Reply via email to