Re: [firebird-support] How does the optimizer of Firebird determine join order?

Ann Harrison aharri...@ibphoenix.com [firebird-support] Wed, 14 May 2014 18:35:30 -0700

On Tue, May 13, 2014 at 10:37 PM, Zhan Li zhanl...@gmail.com[firebird-support]
<firebird-support@yahoogroups.com> wrote:


>
>
> Hi All,
>
> I've been studying the optimizer of firebird for a while, and I was
> wondering how the optimizer chooses join order? As most query optimizers
> determine join order via a dynamic programming algorithm pioneered by IBM's
> System R database project, does the optimizer of firebird use similar
> strategy?
>

I can't speak to the details of the current optimizer, but the original
design of the optimizer was developed concurrently with IBM's System R, so
neither derived from the other.  Firebird has a cost-based optimizer which
seeks to minimize the number rows returned as the intermediate product.
 The optimizer focuses on inner joins.  Outer joins have an implicit order
- the side that is preserved (e.g. the first named term in a left outer
join) must precede the optional side.  Yes, if you have something like:

from customers c,
           left outer join addresses a on (a.cust_id = c.cust_id)
           left outer join invoices i on (i.cust_id = c.cust_id)

you would necessarily look up customers first and could chose to look at
addresses or invoices in either order.  But that has very little to do with
the overall cost.

These examples are simple - the actual cases are joins of many tables.

Before it begins optimization, Firebird distributes equalities.

from orders o
      inner join customers c on (o.cust_id = c.cust_id)
where c.cust_id = 12345

is transformed into:

from orders o
      inner join customers c on (o.cust_id = c.cust_id)
where c.cust_id = 12345 and o.cust_id = 12435

That allows the optimizer to choose between starting with customers (a
single valued lookup) and starting with orders (potentially multi-valued,
but possibly desirable, depending on what else was included in the query.


The first design of the optimizer considered all possible orders for inner
joins, estimating the cost based on the selectivity of the index and the
nature of the conjunct.  An equality comparison where one side is
identified by a comparison between a constant and a unique key and the
other on an equality between that unique key and a unique key in the second
table requires looking at two records (and a few index entries).  For
example:

from orders o
      inner join customers c on (o.cust_id = c.cust_id)
where o.order_id = 12345

Each order has only one value for cust_id, so the optimizer chooses to look
up the order by its unique key, then use the cust_id in the order to look
up the customer by its unique key.

from orders o
      inner join customers c on (o.cust_id = c.cust_id)
where c.cust_state = 'RI'

Here the optimizer considers the selectivity for the index on "state =
'RI'" and compares it with the cardinalities of orders and customers. If
the company doesn't do much business, but all of it is in Rhode Island, the
cost may be lower if it reads all the orders, does a single row look-up on
customers, and throws out those that aren't in Rhode Island.

Assuming that the index selectivity was reasonably correct, and the index
values were distributed evenly (i.e. no cases where 90% of the entries had
one value and the other 10% were random), eventually that algorithm got a
good join order.  At the time, Oracle used semantic ordering:  tables were
evaluated in the order they appeared in the query.  However, the full cross
product of joins of more than
 9 or so, the full cross product lead to problems, "the optimization took
twelve hours.  Running the query too .05 milliseconds."  Good answer, bad
process.

Interbase (as it was then) adopted a strategy of weighting results.  The
first ordering it considered was analyzed in full and the full cost of the
intermediate product was saved.  The subsequent orderings compared
themselves with the first as each table was introduced and if at any step
the cost was greater than the first, the ordering was discarded.

I think there was a subsequent reorganization that considers the longest
possible chain of equalities first.

A goal, at least for Interbase and early versions of Firebird, was to
emphasize getting good results for good queries and not chasing silly
queries - like returning no results from the compiler if one of the
conditions was "and 1 = 2".

Good luck,

Ann

> f
>

Re: [firebird-support] How does the optimizer of Firebird determine join order?

Reply via email to