> Consider using Hep planner rather than Volcano planner. Will check with Hep. Working on isolated unit test.
> If you reduce the number of columns (to say 10), does the time reduce significantly? That might be a clue that there is a performance bug somewhere. Number of columns seems to be correlated with performance penalty: - for 10 columns: 120 (raw) vs 150ms (calcite) - for 50 columns: 260 (raw) vs 740ms (calcite) > Are these numbers on the first query, or after the system has warmed up? After. Following warmup I'm running 100 queries sequentially. As temporary work-around can I re-use PreparedStatement ? On Tue, Apr 30, 2019 at 2:06 PM Julian Hyde <[email protected]> wrote: > Consider using Hep planner rather than Volcano planner. (Measure the > number of rule firings. Is it higher than you think is necessary, given the > complexity of the query?) > > If you reduce the number of columns (to say 10), does the time reduce > significantly? That might be a clue that there is a performance bug > somewhere. > > Are these numbers on the first query, or after the system has warmed up? > > Julian > > > > On Apr 30, 2019, at 9:41 AM, Andrei Sereda <[email protected]> wrote: > > > > Hello, > > > > One of our applications uses Calcite as translation layer between SQL and > > destination source (mongo, elastic, etc.). The queries are fairly simple > > and similar to the one below: > > > > select col1, col2, agg3(col3), agg4(col4), ..., aggN(colN) from table > > where id in (1, 2, 3) group by col1, col2 > > > > The only complexity is that number of columns can be fairly large (up to > > 150) but otherwise it is a standard aggregation with some simple > predicates > > (no joins). Number of rows is small and usually is less than 1k. > > > > We have observed that overhead for such queries is 2x-3x (95th > percentile) > > compared to executing produced queries directly on the data-source (eg. > > mongo / elastic query). Difference is in the order of 100ms: 200ms > (direct) > > vs 600ms (calcite). Unfortunately such latency is noticeable in UI. > > > > Originally I thought it has to do with compilation time (janino) but > > profiling showed that most of time is spent in the following methods: > > > > 1. .preprare.Prepare.optimize() (VolcanoPanner) > > 2. .sql2rel.SqlToRelConverter.convertQuery() > > > > What can be done to avoid such overhead ? > > > > 1. Have avatica / calcite connection cache > connection.prepareStatement() > > so same optimization is not done twice ? Manually re-running same > > PreparedStatement helps. > > 2. Use interpreter ? > > 3. Manually re-use PreparedQuery (eg. from Cache<String, > > PreparedStatement>) ? This introduces other complexities like executing > > same query in parallel. > > 4. Minimize number of Rules ? > > 5. Cache logical plan (RelNode) ? > > 6. Anything else ? > > > > Many thanks in advance. > > > > Andrei. > >
