Hi Vladimir,

Thank you for your response; that's very helpful. We are actually using Calcite in the first case you described: as a parsing and federation engine. Our data is spread over different data engines with different formats, e.g. CSV files, MySQL, Druid and Elasticsearch. We will definitely push queries down to the underlying systems as much as possible. We are using the adapters from the Calcite project and are creating our own for the data sources that are not supported yet.
To understand the performance impact, this is what I am looking for:

1. For queries that can be pushed down completely, what overhead does going through Calcite add, including parse time and optimization time?
2. For queries that have to fall back to an in-memory join using Calcite's built-in Enumerable convention, what does performance look like for a given set of inputs, assuming enough memory?
3. For large joins that cannot be done on one host, what is the common practice? Using the Spark adapter?
4. Is there any self-protection to reject queries that require too many local resources?

Thanks,
-JD

> On Jun 8, 2017, at 2:21 PM, Vladimir Sitnikov wrote:
>
> Hello,
>
>> Has anyone done any benchmark to evaluate Calcite's performance impact?
>> Or is there any documentation regarding performance concerns?
>
> Well, the performance depends on your use case.
> As far as I understand, here are the typical features:
> 1) Given a query, Calcite would try to push all the tables/predicates to
> the downstream executor (i.e. DB).
> 2) If there are joins between different data stores, Calcite would
> still push down as many filters as possible, yet perform the join in memory.
> 3) Calcite has no idea which indices are available at the storage level,
> so don't expect it to generate plans like "for each row from
> datastore1, go and fetch a relevant row from datastore2". In 100% of the
> cases it would be "hashjoin(full fetch datastore1, full fetch datastore2)".
>
>
> If you are going to use Calcite as a proxy (that is, Calcite would
> parse the query and just send the whole query downstream), then you might be
> interested in the JMH-based benchmarks.
> Here they are:
> https://github.com/apache/calcite/blob/master/ubenchmark/src/main/java/org/apache/calcite/benchmarks/StatementTest.java
> Feel free to add more benchmarks there.
>
>
> PS. Index support is doable (one can fetch the sets of indexes from the
> downstream datastores); however, it is not done yet.
>
> Vladimir
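To make the "hashjoin(full fetch datastore1, full fetch datastore2)" plan from point 3 concrete, here is a minimal sketch in plain Java with no Calcite dependency. It is not Calcite's EnumerableHashJoin; the class name `HashJoinSketch`, the `hashJoin` method, and the `maxBuildRows` guard (a hypothetical form of the self-protection asked about in question 4, not a Calcite feature) are all assumptions for illustration. Both inputs are assumed to be already fully fetched from their stores.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of "hashjoin(full fetch datastore1, full fetch datastore2)".
 * Not Calcite's EnumerableHashJoin; plain Java, no Calcite dependency.
 */
public class HashJoinSketch {

    /**
     * Inner equi-join of two fully fetched inputs. The right input is the build side
     * and is held entirely in memory; maxBuildRows is a hypothetical guard that
     * rejects the join once the build side grows past the limit.
     */
    public static <L, R, K> List<Map.Entry<L, R>> hashJoin(
            List<L> left, Function<L, K> leftKey,
            List<R> right, Function<R, K> rightKey,
            long maxBuildRows) {
        // Build phase: hash every row of the right input by its join key.
        Map<K, List<R>> table = new HashMap<>();
        long built = 0;
        for (R r : right) {
            if (++built > maxBuildRows) {
                throw new IllegalStateException("build side exceeds " + maxBuildRows + " rows");
            }
            K key = rightKey.apply(r);
            if (key == null) {
                continue; // SQL semantics: NULL never equals anything, so it can never match
            }
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
        }
        // Probe phase: scan the left input once and look up matching right rows.
        List<Map.Entry<L, R>> out = new ArrayList<>();
        for (L l : left) {
            K key = leftKey.apply(l);
            List<R> matches = key == null ? null : table.get(key);
            if (matches == null) {
                continue;
            }
            for (R r : matches) {
                out.add(new AbstractMap.SimpleImmutableEntry<>(l, r));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical rows: {id, name} from one store and {userId, event} from another.
        List<String[]> users = List.of(new String[] {"1", "alice"}, new String[] {"2", "bob"});
        List<String[]> events = List.of(
                new String[] {"1", "login"}, new String[] {"1", "logout"}, new String[] {"3", "login"});
        for (Map.Entry<String[], String[]> row :
                hashJoin(users, u -> u[0], events, e -> e[0], 1_000_000)) {
            System.out.println(row.getKey()[1] + " " + row.getValue()[1]);
        }
    }
}
```

With the sample rows, `main` prints "alice login" and "alice logout"; "bob" and the event for user 3 have no match. Because the build side is materialized in full, memory use grows with the larger of the two fetched inputs if it is chosen as the build side, which is why the cost of this plan depends so heavily on how much filtering can be pushed down first.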
