> As a temporary work-around, can I re-use a PreparedStatement?

If you mean re-execute a PreparedStatement, yes. Its main purpose is to pay 
the cost of preparation only once but execute many times (possibly with 
different parameter values).

But each execution will run the same query with the same plan.
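To make that concrete, here is a minimal, self-contained sketch of the reuse pattern: key the expensive prepare step by the SQL string so planning runs once per distinct query. The costly phase (what connection.prepareStatement(sql) triggers in Calcite) is simulated by a counter so the sketch runs on its own; StatementCacheSketch, prepareOnce, and expensivePrepare are illustrative names, not Calcite API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch: pay the prepare cost once per distinct SQL string. */
public class StatementCacheSketch {
  // Cache keyed by SQL text; in real code the value would be a PreparedStatement.
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger prepareCalls = new AtomicInteger();

  // Stand-in for the costly phase (SqlToRelConverter + planner optimization).
  private String expensivePrepare(String sql) {
    prepareCalls.incrementAndGet();
    return "plan:" + sql;
  }

  /** Prepares on first use only; later calls re-use the cached result. */
  public String prepareOnce(String sql) {
    return cache.computeIfAbsent(sql, this::expensivePrepare);
  }

  /** How many times the expensive prepare step actually ran. */
  public int prepareCount() {
    return prepareCalls.get();
  }
}
```

Re-executing a cached JDBC PreparedStatement with different parameter values then amounts to ps.setInt(1, ...) followed by ps.executeQuery(). Note that a real cache also has to handle eviction, and that a PreparedStatement is not safe to execute from two threads at once.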

Julian


> On Apr 30, 2019, at 11:43 AM, Andrei Sereda <[email protected]> wrote:
> 
>> Consider using Hep planner rather than Volcano planner.
> Will check with Hep. Working on isolated unit test.
> 
>> If you reduce the number of columns (to say 10), does the time reduce
> significantly? That might be a clue that there is a performance bug
> somewhere.
> The number of columns seems to be correlated with the performance penalty:
> - for 10 columns:  120 ms (raw) vs 150 ms (calcite)
> - for 50 columns:  260 ms (raw) vs 740 ms (calcite)
> 
>> Are these numbers on the first query, or after the system has warmed up?
> After. Following warm-up, I run 100 queries sequentially.
> 
> As a temporary work-around, can I re-use a PreparedStatement?
> 
> On Tue, Apr 30, 2019 at 2:06 PM Julian Hyde <[email protected]> wrote:
> 
>> Consider using Hep planner rather than Volcano planner. (Measure the
>> number of rule firings. Is it higher than you think is necessary, given the
>> complexity of the query?)
>> 
>> If you reduce the number of columns (to say 10), does the time reduce
>> significantly? That might be a clue that there is a performance bug
>> somewhere.
>> 
>> Are these numbers on the first query, or after the system has warmed up?
>> 
>> Julian
>> 
>> 
>>> On Apr 30, 2019, at 9:41 AM, Andrei Sereda <[email protected]> wrote:
>>> 
>>> Hello,
>>> 
>>> One of our applications uses Calcite as a translation layer between SQL
>>> and the destination data source (mongo, elastic, etc.). The queries are
>>> fairly simple and similar to the one below:
>>> 
>>> select col1, col2, agg3(col3), agg4(col4), ..., aggN(colN) from table
>>> where id in (1, 2, 3) group by col1, col2
>>> 
>>> The only complexity is that the number of columns can be fairly large
>>> (up to 150), but otherwise it is a standard aggregation with some simple
>>> predicates (no joins). The number of rows is small, usually less than 1k.
>>> 
>>> We have observed that the overhead for such queries is 2x-3x (95th
>>> percentile) compared to executing the produced queries directly on the
>>> data source (e.g. a mongo / elastic query). The difference is on the
>>> order of hundreds of milliseconds: 200ms (direct) vs 600ms (calcite).
>>> Unfortunately such latency is noticeable in the UI.
>>> 
>>> Originally I thought it had to do with compilation time (janino), but
>>> profiling showed that most of the time is spent in the following methods:
>>> 
>>>  1. .prepare.Prepare.optimize() (VolcanoPlanner)
>>>  2. .sql2rel.SqlToRelConverter.convertQuery()
>>> 
>>> What can be done to avoid such overhead?
>>> 
>>>  1. Have the avatica / calcite connection cache the result of
>>>  connection.prepareStatement(), so the same optimization is not done
>>>  twice? Manually re-running the same PreparedStatement helps.
>>>  2. Use the interpreter?
>>>  3. Manually re-use a PreparedStatement (e.g. from a Cache<String,
>>>  PreparedStatement>)? This introduces other complexities, like executing
>>>  the same query in parallel.
>>>  4. Minimize the number of rules?
>>>  5. Cache the logical plan (RelNode)?
>>>  6. Anything else?
>>> 
>>> Many thanks in advance.
>>> 
>>> Andrei.
>> 
>> 
