Re: calcite overhead for simple queries (optimization and conversion phase)

Andrei Sereda Tue, 30 Apr 2019 11:43:40 -0700

> Consider using Hep planner rather than Volcano planner.
I will check with Hep. I'm working on an isolated unit test.


> If you reduce the number of columns (to say 10), does the time reduce
> significantly? That might be a clue that there is a performance bug
> somewhere.
The number of columns does seem to correlate with the performance penalty:
- 10 columns: 120 ms (raw) vs 150 ms (Calcite)
- 50 columns: 260 ms (raw) vs 740 ms (Calcite)
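This small sketch works through the arithmetic of the two measurements above. The overhead grows from 30 ms at 10 columns (1.25x) to 480 ms at 50 columns (about 2.85x), which is roughly 11 ms per added column. With only two data points, assuming a straight-line relationship is itself a guess. A third point (for example 100 or 150 columns) would show whether the growth is linear or worse, and that bears directly on the "performance bug" question. The class and method names are illustrative only.

```java
/** Works through the overhead numbers reported for 10 and 50 columns. */
public class ColumnOverhead {
    // Measurements from the thread, in milliseconds.
    static final int[] COLUMNS = {10, 50};
    static final int[] RAW_MS = {120, 260};
    static final int[] CALCITE_MS = {150, 740};

    /** Absolute Calcite overhead (ms) for the i-th measurement. */
    static int overheadMs(int i) {
        return CALCITE_MS[i] - RAW_MS[i];
    }

    /** Calcite latency divided by raw latency for the i-th measurement. */
    static double ratio(int i) {
        return (double) CALCITE_MS[i] / RAW_MS[i];
    }

    /** Extra overhead per added column between the two measurements (assumes linear growth). */
    static double overheadPerExtraColumnMs() {
        return (double) (overheadMs(1) - overheadMs(0)) / (COLUMNS[1] - COLUMNS[0]);
    }

    public static void main(String[] args) {
        for (int i = 0; i < COLUMNS.length; i++) {
            System.out.printf("%d columns: overhead %d ms (%.2fx)%n",
                    COLUMNS[i], overheadMs(i), ratio(i));
        }
        System.out.printf("~%.2f ms extra overhead per added column%n",
                overheadPerExtraColumnMs());
    }
}
```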

> Are these numbers on the first query, or after the system has warmed up?
After. Following warm-up, I run 100 queries sequentially.
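Below is a minimal JDK-only sketch of the kind of harness described here: warm up, run 100 queries sequentially, then report latency percentiles such as the 95th mentioned in the original message. The workload is a placeholder `Runnable`. The class name, the warm-up count, and the nearest-rank percentile definition are all assumptions, not details of the actual test setup.

```java
import java.util.Arrays;

public class LatencyBench {
    /** Nearest-rank percentile (0 < p <= 100) of the given samples. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // 1-based nearest rank; multiply before dividing to avoid rounding drift.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Runs query `warmup` times untimed, then `runs` times sequentially; returns latencies in ms. */
    static long[] measure(Runnable query, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            query.run(); // lets the JIT and any caches settle before timing
        }
        long[] latenciesMs = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            latenciesMs[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return latenciesMs;
    }

    public static void main(String[] args) {
        // Placeholder workload: substitute the JDBC call being measured
        // (prepare/execute the SQL and drain the ResultSet).
        Runnable query = () -> { };
        long[] ms = measure(query, 20, 100);
        System.out.println("p50=" + percentile(ms, 50) + " ms, p95=" + percentile(ms, 95) + " ms");
    }
}
```

Timing the "prepare" and "execute" steps separately inside the `Runnable` would show how much of the gap comes from planning (`optimize()` / `convertQuery()`) and how much comes from execution.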

As a temporary workaround, can I re-use the PreparedStatement?
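One way to re-use prepared statements without sharing them across threads is a per-thread cache keyed by SQL text. A `PreparedStatement` carries mutable state (bound parameters, the current `ResultSet`), so sharing one between concurrently executing threads is generally unsafe. This is a sketch under assumptions. `PerThreadStatementCache` is a hypothetical helper, not an Avatica or Calcite API. Each thread is assumed to use its own connection. Closing the cached statements is left to the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Avatica or Calcite): caches one prepared
 * object per (thread, SQL text), so a statement is re-used by the thread that
 * created it but never handed to another thread.
 */
public class PerThreadStatementCache<S> {
    private final Function<String, S> prepare;
    // Each thread sees its own map, created lazily on first access.
    private final ThreadLocal<Map<String, S>> perThread =
            ThreadLocal.withInitial(HashMap::new);

    public PerThreadStatementCache(Function<String, S> prepare) {
        this.prepare = prepare;
    }

    /** Returns this thread's cached statement for sql, preparing it on first use. */
    public S get(String sql) {
        return perThread.get().computeIfAbsent(sql, prepare);
    }

    /** Drops this thread's entries; the caller is responsible for closing them first. */
    public void clearCurrentThread() {
        perThread.remove();
    }
}
```

With JDBC, the loader would wrap `connection.prepareStatement(sql)` and rethrow `SQLException` as an unchecked exception, because `Function` cannot throw checked exceptions. Whether this actually avoids re-running `Prepare.optimize()` depends on the driver keeping the plan with the statement, which matches the observation that manually re-running the same PreparedStatement helps.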

On Tue, Apr 30, 2019 at 2:06 PM Julian Hyde <[email protected]> wrote:

> Consider using Hep planner rather than Volcano planner. (Measure the
> number of rule firings. Is it higher than you think is necessary, given the
> complexity of the query?)
>
> If you reduce the number of columns (to say 10), does the time reduce
> significantly? That might be a clue that there is a performance bug
> somewhere.
>
> Are these numbers on the first query, or after the system has warmed up?
>
> Julian
>
>
> > On Apr 30, 2019, at 9:41 AM, Andrei Sereda <[email protected]> wrote:
> >
> > Hello,
> >
> > One of our applications uses Calcite as a translation layer between SQL and
> > the destination source (Mongo, Elastic, etc.). The queries are fairly simple
> > and similar to the one below:
> >
> > select col1, col2, agg3(col3), agg4(col4), ..., aggN(colN) from table
> > where id in (1, 2, 3) group by col1, col2
> >
> > The only complication is that the number of columns can be fairly large (up
> > to 150); otherwise it is a standard aggregation with some simple predicates
> > (no joins). The number of rows is small, usually fewer than 1k.
> >
> > We have observed that the overhead for such queries is 2x-3x (95th
> > percentile) compared to executing the generated queries directly on the
> > data source (e.g. the Mongo / Elastic query). The difference is on the
> > order of hundreds of milliseconds: 200 ms (direct) vs 600 ms (Calcite).
> > Unfortunately, such latency is noticeable in the UI.
> >
> > Originally I thought it had to do with compilation time (Janino), but
> > profiling showed that most of the time is spent in the following methods:
> >
> >   1. .prepare.Prepare.optimize() (VolcanoPlanner)
> >   2. .sql2rel.SqlToRelConverter.convertQuery()
> >
> > What can be done to avoid this overhead?
> >
> >   1. Have the Avatica / Calcite connection cache
> >   connection.prepareStatement() so the same optimization is not done twice?
> >   Manually re-running the same PreparedStatement helps.
> >   2. Use the interpreter?
> >   3. Manually re-use the PreparedStatement (e.g. from a Cache<String,
> >   PreparedStatement>)? This introduces other complexities, like executing
> >   the same query in parallel.
> >   4. Minimize the number of rules?
> >   5. Cache the logical plan (RelNode)?
> >   6. Anything else?
> >
> > Many thanks in advance.
> >
> > Andrei.
>
>
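Options 3 and 5 in the original message both amount to a cache keyed by SQL text. A bounded LRU keeps memory from growing with the number of distinct queries. The `Cache<String, PreparedStatement>` in option 3 presumably refers to a library cache such as Guava's. The sketch below uses only `java.util.LinkedHashMap` to show the same idea. `SqlLruCache` is a hypothetical name. It is only appropriate for values that are safe to share between threads, and the loader runs under the lock, so concurrent misses are serialized.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded, thread-safe LRU cache keyed by SQL text (not a Calcite API). */
public class SqlLruCache<V> {
    private final Map<String, V> entries;
    private final Function<String, V> loader;

    public SqlLruCache(int maxEntries, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder=true orders entries from least to most recently accessed,
        // so removeEldestEntry evicts the LRU entry once the cap is exceeded.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value for sql, computing it with the loader on a miss. */
    public synchronized V get(String sql) {
        // get() also updates access order, which is a structural change, hence synchronized.
        V value = entries.get(sql);
        if (value == null) {
            value = loader.apply(sql);
            entries.put(sql, value);
        }
        return value;
    }

    /** Number of entries currently cached. */
    public synchronized int entryCount() {
        return entries.size();
    }
}
```

This thread does not establish whether a `RelNode` or a prepared result can safely be shared and executed concurrently. Verify that before caching either one this way. If it cannot be shared, a per-thread cache is the safer variant.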
