Hello,

One of our applications uses Calcite as a translation layer between SQL and the destination data source (Mongo, Elastic, etc.). The queries are fairly simple and similar to the one below:
  select col1, col2, agg3(col3), agg4(col4), ..., aggN(colN)
  from table
  where id in (1, 2, 3)
  group by col1, col2

The only complexity is that the number of columns can be fairly large (up to 150), but otherwise it is a standard aggregation with some simple predicates (no joins). The number of rows is small, usually less than 1k.

We have observed that the overhead for such queries is 2x-3x (95th percentile) compared to executing the produced queries directly on the data source (e.g. a Mongo / Elastic query). The difference is on the order of hundreds of milliseconds: 200ms (direct) vs 600ms (Calcite). Unfortunately such latency is noticeable in the UI.

Originally I thought it had to do with compilation time (Janino), but profiling showed that most of the time is spent in the following methods:

1. .prepare.Prepare.optimize() (VolcanoPlanner)
2. .sql2rel.SqlToRelConverter.convertQuery()

What can be done to avoid such overhead?

1. Have the Avatica / Calcite connection cache connection.prepareStatement() so the same optimization is not done twice? Manually re-running the same PreparedStatement helps.
2. Use the interpreter?
3. Manually re-use a PreparedStatement (e.g. from a Cache<String, PreparedStatement>)? This introduces other complexities, like executing the same query in parallel.
4. Minimize the number of rules?
5. Cache the logical plan (RelNode)?
6. Anything else?

Many thanks in advance.
Andrei.
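P.S. For concreteness, the caching in option 3 could be sketched roughly as below. This is a minimal sketch, not Calcite API: the class name is made up, and P stands for whatever prepared artifact gets reused (a PreparedStatement, a RelNode plan, etc.), with the loader doing the one-time SqlToRelConverter / VolcanoPlanner work:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache keyed by SQL text, so the expensive preparation
// (parse, convert, optimize) runs once per distinct query string.
class PreparedCache<P> {
    private final Map<String, P> cache = new ConcurrentHashMap<>();
    private final Function<String, P> loader;

    PreparedCache(Function<String, P> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs the loader at most once per key, even when
    // the same query arrives on several threads at the same time.
    P get(String sql) {
        return cache.computeIfAbsent(sql, loader);
    }
}
```

Note this only de-duplicates the preparation step; executing a single cached PreparedStatement from multiple threads concurrently is still unsafe in JDBC, so execution itself would need per-thread statements or external locking.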
