Re: Underflow issue with big IN clauses

Julian Hyde Mon, 11 Jan 2016 11:09:13 -0800

There are a couple of reasons why we convert to an inline table. An inline 
table is can often be represented as an array or a list (it depends on the 
engine we are generating for) and can then be loaded into the "build" side of a 
hash table for a hash join. This is beneficial if the number of values in the 
IN clause is large (in the thousands, say) but the other side is still larger. 
The table is populated once, and membership in the IN clause is a probe into 
the hash table (which basically is just computing a hash function and looking 
into an array). Much more efficient than evaluating thousands of branches in a 
large "OR" expression


The other reason is that as expression size increases, many of the operations 
that occur during query planning become more expensive. We probably have 
operations that grow with O(n log n) or O(n ^ 2) with the expression size. 
Moving these expressions into an inline table keeps the query planning process 
tractable.

Julian

> On Jan 11, 2016, at 7:54 AM, Frankie Bollaert <[email protected]> wrote:
> 
> Hi all,
> 
> Having worked on a bug related to IN clauses, I believe I found an issue
> which has not been reported yet.  When executing a query with more then 19
> id's in the IN clause, an underflow exception is thrown.
> 
> I created a Jira ticket for this:
> https://issues.apache.org/jira/browse/CALCITE-1051
> 
> 
> But besides this, I was wondering about the reason why a large IN clause is
> converted into a join to an inline table.  (This happens in the
> SqlToRelConverter).
> 
> More specifically how does this optimise the query execution?  How would a
> storage engine be able to handle a join to an inline table more
> efficiently?  Is this not a storage engine specific optimisation?
> 
> 
> Thank you !
> Frankie

Re: Underflow issue with big IN clauses

Reply via email to