Hello,

One of our applications uses Calcite as a translation layer between SQL and the destination data source (Mongo, Elastic, etc.). The queries are fairly simple and similar to the one below:
  select col1, col2, agg3(col3), agg4(col4), ..., aggN(colN)
  from table
  where id in (1, 2, 3)
  group by col1, col2

The only complexity is that the number of columns can be fairly large (up to 150), but otherwise it is a standard aggregation with some simple predicates (no joins). The number of rows is small, usually less than 1k.

We have observed that the overhead for such queries is 2x-3x (95th percentile) compared to executing the produced queries directly on the data source (e.g. a Mongo / Elastic query). The difference is on the order of hundreds of milliseconds: 200ms (direct) vs 600ms (Calcite). Unfortunately such latency is noticeable in the UI.

Originally I thought it had to do with compilation time (Janino), but profiling showed that most of the time is spent in the following methods:

1. .prepare.Prepare.optimize() (VolcanoPlanner)
2. .sql2rel.SqlToRelConverter.convertQuery()

What can be done to avoid such overhead?

1. Have the Avatica / Calcite connection cache connection.prepareStatement() so the same optimization is not done twice? Manually re-running the same PreparedStatement helps.
2. Use the interpreter?
3. Manually re-use a PreparedStatement (e.g. from a Cache<String, PreparedStatement>)? This introduces other complexities, like executing the same query in parallel.
4. Minimize the number of rules?
5. Cache the logical plan (RelNode)?
6. Anything else?

Many thanks in advance.
Andrei.
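P.S. For concreteness, the caching in option 3 could be sketched roughly as below. This is a minimal sketch, not Calcite API: the class name is made up, and P stands for whatever prepared artifact gets reused (a PreparedStatement, a RelNode plan, etc.), with the loader doing the one-time SqlToRelConverter / VolcanoPlanner work:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache keyed by SQL text, so the expensive preparation
// (parse, convert, optimize) runs once per distinct query string.
class PreparedCache<P> {
    private final Map<String, P> cache = new ConcurrentHashMap<>();
    private final Function<String, P> loader;

    PreparedCache(Function<String, P> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs the loader at most once per key, even when
    // the same query arrives on several threads at the same time.
    P get(String sql) {
        return cache.computeIfAbsent(sql, loader);
    }
}
```

Note this only de-duplicates the preparation step; executing a single cached PreparedStatement from multiple threads concurrently is still unsafe in JDBC, so execution itself would need per-thread statements or external locking.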
