Den 17.05.2016 12:20, skrev agharta [email protected] [firebird-support]:
> Hi all, this is my first post and my english is bad, please be patient :-)
>
> I've a problem with a full outer join, let me explain.
>
> 2 tables (100000 rows each), 2 cte, 2 full outer joins, 100 rows output.
> No indexes. Execution took about 40s..... in postgres (i've do a equal
> test comparision)...270ms
>
> I've tested it on FB 2.1.7, 2.5.5 and 3.0.0, OS is virtual 64 bit
> Windows server 2012 8GB ram, Classic Server, Dialect 3, no firebird.conf
> optimization/modify, no ssd, no raid, pagesize 16mb
>
> The test query may be stupid, but it explains exacty the performace problem.
>
> Create table script:
>
> TABLE 1:
> CREATE SEQUENCE GEN_NEW_TABLE_ID;
>
> CREATE TABLE NEW_TABLE (
>       ID      INTEGER NOT NULL,
>       FIELD1  INTEGER,
>       FIELD2  INTEGER,
>       FIELD3  INTEGER,
>       FIELD4  INTEGER,
>       FILED5  INTEGER
> );
>
> ALTER TABLE NEW_TABLE ADD CONSTRAINT PK_NEW_TABLE PRIMARY KEY (ID);
>
> CREATE OR ALTER TRIGGER NEW_TABLE_BI FOR NEW_TABLE
> ACTIVE BEFORE INSERT POSITION 0
> as
> begin
>       new.id = gen_id(gen_new_table_id,1);  --or  use next value for
> end
>
> TABLE 2:
> CREATE SEQUENCE GEN_NEW_TABLE2_ID;
>
> CREATE TABLE NEW_TABLE2 (
>       ID      INTEGER NOT NULL,
>       FIELD1  INTEGER,
>       FIELD2  INTEGER,
>       FIELD3  INTEGER,
>       FIELD4  INTEGER,
>       FILED5  INTEGER
> );
>
> ALTER TABLE NEW_TABLE2 ADD CONSTRAINT PK_NEW_TABLE2 PRIMARY KEY (ID);
>
> CREATE OR ALTER TRIGGER NEW_TABLE2_BI FOR NEW_TABLE2
> ACTIVE BEFORE INSERT POSITION 0
> as
> begin
>       new.id = gen_id(gen_new_table2_id,1);  --or  use next value for
> end
>
> Stupid bulk insert 100000 records in each table (same records in each
> table):
>
> execute block
> as
> declare variable i int =0;
> declare variable x int =0;
> begin
>     while (x < 100) do
>     begin
>        i =0;
>         while (i < 1000) do
>         begin
>           insert into NEW_TABLE values (:i, :i, :i, :i, :i, :i);
>           insert into NEW_TABLE2 values (:i, :i, :i, :i, :i, :i);
>           i = i + 1;
>         end
>         x = x + 1;
>     end
> end
>
> Ok, now the query:
>
> with
> T2CTE as (select first 10 * from new_table2 where field1=1),
> T1CTE as (select first 100 * from new_table where field1=1)
>
> select t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4
> from T1CTE t1
> full outer join T2CTE t2 on (t2.field1 = t1.field1 and t2.field2 =
> t1.field2 and t2.field3 = t1.field3 and t2.field4 = t1.field4 )
> full outer join T1CTE t1x on (t1x.field1 = t1.field1 and t1x.field2 =
> t1.field2 and t1x.field3 = t1.field3 and t1x.field4 = t1.field4)
> group by t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4
>
> Yes, it's stupid but shows me the problem (IBEXPERT output):
>
> Plan
> PLAN SORT (JOIN (T1X NEW_TABLE NATURAL, JOIN (T2 NEW_TABLE2 NATURAL, T1
> NEW_TABLE NATURAL)))
>
> Prepare time = 16ms
> Execute time = 40s 750ms
> Avg fetch time = 1.455,36 ms
> Current memory = 137.242.916
> Max memory = 157.081.992
> Memory buffers = 8.192
> Reads from disk to cache = 0
> Writes from cache to disk = 0
> Fetches from cache = 222.991.689
>
> Again, viewing performance reads summary, it says:
>
> Non indexed reads:
> NEW_TABLE: 110092224
> NEW_TABLE2: 929402
>
> 110092224? 929402? WHY? CTE EXPLICITY SAYS 100 and 10!
>
> I've repeat the test with postgresql (same machine, postgres 9.5, same
> standard installation and same table data) and average execution time is
> about 270ms. It shows me the effective cte scan:
>
> HashAggregate  (cost=2088.17..2089.16 rows=99 width=20)
>     Group Key: t1.id, t1.field1, t1.field2, t1.field3, t1.field4
>     CTE t2cte
>       ->  Limit  (cost=0.00..190.61 rows=10 width=24)
>             ->  Seq Scan on new_table2  (cost=0.00..1887.00 rows=99 width=24)
>                   Filter: (field1 = 1)
>     CTE t1cte
>       ->  Limit  (cost=0.00..1887.00 rows=99 width=24)
>             ->  Seq Scan on new_table  (cost=0.00..1887.00 rows=99 width=24)
>                   Filter: (field1 = 1)
>     ->  Hash Full Join  (cost=4.36..9.33 rows=99 width=20)
>           Hash Cond: ((t1.field1 = t1x.field1) AND (t1.field2 =
> t1x.field2) AND (t1.field3 = t1x.field3) AND (t1.field4 = t1x.field4))
>           ->  Hash Full Join  (cost=0.40..3.87 rows=99 width=20)
>                 Hash Cond: ((t1.field1 = t2.field1) AND (t1.field2 =
> t2.field2) AND (t1.field3 = t2.field3) AND (t1.field4 = t2.field4))
>                 ->  CTE Scan on t1cte t1  (cost=0.00..1.98 rows=99 width=20)
>                 ->  Hash  (cost=0.20..0.20 rows=10 width=16)
>                       ->  CTE Scan on t2cte t2  (cost=0.00..0.20 rows=10
> width=16)
>           ->  Hash  (cost=1.98..1.98 rows=99 width=16)
>                 ->  CTE Scan on t1cte t1x  (cost=0.00..1.98 rows=99 width=16)
>
> Seems (i'm not a master of known universe in firebird) that the firebird
> performs a full table scan and joins mutch more data instead of the CTE
> results.
>
> Any ideas about why fb take too long time to execute the query (and
> scans so mutch rows)? I've omitted some CTE sintax to force fb to use it?
>
> The question is not about the query optimization (full outer join can be
> omitted, i know), but about why fb performs many reads insetad of 100 +
> 10 cte reads.
>
> Please correct me because i'm sure that i've done something erroneusly.
>
> Best regards,
>
> Agharta
Hi Agharta, thanks for a perfect description of your problem!

Does the performance change if you add another CTE and change the query 
like below?

with
T2CTE as (select first 10 * from new_table2 where field1=1),
T1CTE as (select first 100 * from new_table where field1=1),
T3CTE as (select * from T1CTE union select * from T2CTE)

select t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4
from T3CTE t3
left join T1CTE t1
        on t3.field1 = t1.field1 and t3.field2 = t1.field2 and t3.field3 = 
t1.field3 and t3.field4 = t1.field4
left join T2CTE t2
        on t3.field1 = t2.field1 and t3.field2 = t2.field2 and t3.field3 = 
t2.field3 and t3.field4 = t2.field4
left join T1CTE t1x
        on t3.field1 = t1x.field1 and t3.field2 = t1x.field2 and t3.field3 = 
t1x.field3 and t3.field4 = t1x.field4
group by t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4

I think the result should be identical (though I am slightly uncertain 
if there are say three identical rows in both T1CTE and T2CTE), and it 
wouldn't surprise me if it was quicker (I think I once did something 
similar myself when a query took too long, though I don't remember which 
Firebird version I then used). If it works, then I would call it a 
workaround, and agree that Firebird is not great for FULL OUTER JOIN on 
larger tables.

HTH,
Set

Reply via email to