Sounds like a very interesting issue.

I’m evaluating Calcite’s JDBC adapter over PostgreSQL with TPC-DS queries, 
and the queries run 2 to 10 times slower through Calcite than natively through 
psql. So an overall improvement of Avatica, including JDBC latency, would 
benefit Calcite. In this case, though, query processing itself may be the 
issue, according to the following comment on the JDBC adapter in Calcite’s 
tutorial (https://calcite.apache.org/docs/tutorial.html):

Current limitations: The JDBC adapter currently only pushes down table scan 
operations; all other processing (filtering, joins, aggregations and so forth) 
occurs within Calcite. Our goal is to push down as much processing as possible 
to the source system, translating syntax, data types and built-in functions as 
we go. If a Calcite query is based on tables from a single JDBC database, in 
principle the whole query should go to that database. If tables are from 
multiple JDBC sources, or a mixture of JDBC and non-JDBC, Calcite will use the 
most efficient distributed query approach that it can.

Thank you,
Seung-Hwan


On Aug 23, 2018, at 3:45 PM, Julian Hyde <jh...@apache.org> wrote:

This is a paper in VLDB 2018, “Don’t Hold My Data Hostage – A Case For Client 
Protocol Redesign” by Mark Raasveldt and Hannes Mühleisen [1]. It claims that 
database client protocols (inside ODBC and JDBC drivers) are very inefficient, 
and has a compelling example where commercial drivers are 10x to 68x slower 
than netcat.

One of the goals of Avatica is to do better. How are we doing? Are there any 
ideas in the paper we could adopt? Would a closer partnership with Apache Arrow 
help us achieve those goals?

Julian

[1] https://hannes.muehleisen.org/p852-muehleisen.pdf
