[firebird-support] Re: performance of subselect with group by

Svein Erling Tysvær [email protected] [firebird-support] Thu, 18 Dec 2014 14:35:22 -0800

>Hello,
>
>  thanks Set, makes my test unnecessary!
>
>  But I'm thinking a little bit about my original problem.
>
>  Would it make sense to add a tracker entry for optimization of
>  subselects without reference to outer query?
>
>  I think that they should get evaluated and transformed to something
>  like a "or" connected list of simple compares.
>
>  So
>  delete from test where Id in (
> select min(t.Id) FROM test t
> group by t.reference, t.key
> having count(*) > 1
> )
>
>  could be transformed by the engine to
>
>  delete from test where Id = 5 or Id = 6 or Id = 89
>
>  in case of
>
>  select min(t.Id) FROM test t
>  group by t.reference, t.key
>  having count(*) > 1
>
>  would return 5,6,89
>
>  I've found several similar questions in my mail archive so I think
>  I'm not the only one with such a problem.
>
>  A common use case would be to remove duplicate entries for
>  example to create a unique index.
>
>  Any comments?


My guess is that it is too early to add this to Firebird. Sure, there are 
situations where a non-correlated subselect would be benefitial, but how would 
Firebird know when? In your case (since you're deleting from the same table as 
you're selecting from), I agree that this could be simple. However, normally 
the select would be against a different table and then I think things are not 
as simple. Assuming we were talking about two different tables: I don't know 
how Firebird translates a query like yours into EXISTS, but theoretically it 
could be something like:

delete from DeleteFromTable t
where exists(select * from test t2
             left join test t3 
                    on t2.reference = t3.reference
                   and t2.key = t3.key
                   and t2.id > t3.id
             where t.id = t2.id
               and t3.id is null)

If DeleteFromTable was a smallish table, and test was large and sensible 
indexes were in place, then my rewritten query would be quicker if your 'min 
subquery' returned a lot of rows. On the other hand, if your subquery returned 
few rows, then executing the subselect like you suggest would indeed be 
quicker. If you add a WHERE clause to your subselect, it is even thinkable 
that, say, WHERE MyField = 1 would be quicker using your solution, whereas 
WHERE MyField = 2 would be quicker using mine. And the only way for Firebird to 
know, would be to have histograms. Histograms was removed from things to be 
implemented in version 3 (I think it was announced during the conference, at 
least I'm certain that I've heard this), I hope they will appear in Firebird 4. 
I would think that once histograms are in place, cases like yours are amongst 
those that can be considered for optimization (although I don't know anything 
about the source of Firebird).

You are definitely not the only one with this kind of problem, I would 
appreciate it myself if it was possible for deletes and updates to use indexes 
similar to what is possible with selects (e.g. allowing for JOIN). However, my 
knowledge of SQL and Firebird is not good enough to formulate how to do this in 
a sensible way. Also, I think that as users of Firebird it is possible to 
rewrite our slow queries to more performant ones in most cases, so to me this 
problem is not all too big.

I have never needed expression indexes myself, but have you looked into writing 
one to see if this can make your query execute (considerably) faster?

Sorry for not agreeing with you in that this is a useful optimization just yet, 
if you're lucky people knowing something about the source code of Firebird will 
disagree with me.

Set

[firebird-support] Re: performance of subselect with group by

Reply via email to