It sounds as if Calcite is a very good fit. Data virtualization (which is how 
I’d describe your use case) was one of the main goals I had in mind when 
creating Calcite.

On the question of whether Calcite is a framework or engine, let’s look at how 
some systems use Calcite:

* Hive uses Calcite’s query planning framework, but not its SQL parser. It has 
its own engine.

* Drill goes a bit further, and uses both the parser/validator and the planning 
framework. It has its own engine.

* Kylin goes further still, and uses the parser/validator, planning framework, 
and also Calcite’s engine for anything its own engine cannot do.

* Phoenix does much the same as Kylin, but also uses Avatica for JDBC.

The “engine” is the “enumerable convention”: the ability to generate code for 
each relational operator that is, basically, a Java iterator, plus 
implementations of the SQL built-in operators. It doesn’t scale beyond a single 
JVM, but is nevertheless useful for handling whatever the underlying engine 
cannot do.
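To give a flavor of what “each relational operator is, basically, a Java iterator” means, here is a hand-written sketch of a filter operator in that style. The class and method names are illustrative only; they are not Calcite’s actual generated code, which is produced at runtime from linq4j expression trees.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Sketch of an enumerable-style operator: a filter wraps a child iterator of
// rows (Object[]) and yields only the rows that pass a predicate, much as the
// generated code for a WHERE clause does. Names are illustrative, not Calcite's.
public final class EnumerableSketch {

  static Iterator<Object[]> filter(Iterator<Object[]> child,
                                   Predicate<Object[]> predicate) {
    return new Iterator<Object[]>() {
      private Object[] lookahead;   // next matching row, or null when exhausted
      private boolean ready;        // whether lookahead is up to date

      private void advance() {
        while (child.hasNext()) {
          Object[] row = child.next();
          if (predicate.test(row)) {
            lookahead = row;
            ready = true;
            return;
          }
        }
        lookahead = null;
        ready = true;
      }

      @Override public boolean hasNext() {
        if (!ready) {
          advance();
        }
        return lookahead != null;
      }

      @Override public Object[] next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ready = false;              // consume the lookahead row
        return lookahead;
      }
    };
  }
}
```

A plan is then a stack of such iterators (scan, filter, project, join, …), each pulling rows from its child.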

DML has not been a focus (most of our applications are analytics, so some other 
system is writing the data), but it fits into the architecture just fine; in 
fact, Phoenix is doing a lot of DML work. We have basic support for INSERT, 
which we could strengthen, and we could add support for UPDATE, MERGE and 
DELETE.
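To illustrate how INSERT fits the architecture: an adapter exposes a modifiable backing collection (in Calcite the hook is the ModifiableTable interface and its getModifiableCollection() method), and the planner turns an INSERT into a table-modify operator that appends rows to that collection. The sketch below is plain Java, not Calcite’s real classes, and the names other than the two just mentioned are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Plain-Java sketch of an adapter table that supports INSERT. In Calcite proper,
// a table implements ModifiableTable and the engine executes INSERT by adding
// rows to the collection returned by getModifiableCollection(). Everything here
// is illustrative, not Calcite's actual code.
public final class ModifiableTableSketch {
  private final List<Object[]> rows = new ArrayList<>();

  // Analogue of ModifiableTable.getModifiableCollection(): the collection the
  // engine mutates to execute DML.
  public Collection<Object[]> getModifiableCollection() {
    return rows;
  }

  // Roughly what an executed "INSERT INTO t VALUES ..." boils down to.
  public long insert(List<Object[]> newRows) {
    rows.addAll(newRows);
    return newRows.size();   // the update count reported back through JDBC
  }
}
```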

Josh, can you weigh in on options for scaling out Avatica?

Julian



> On Mar 17, 2016, at 2:32 PM, Maxime Jattiot <[email protected]> wrote:
> 
> Hello everyone,
> 
> Your project seems very interesting, but I am a bit lost in the ocean of 
> possibilities. Let me explain:
> 
> We are currently developing an application with a microservices architecture. 
> Those services are written in Python, but a few are in Java or Go.
> Of course, a lot of them need to persist data, and we are using a CQRS 
> approach. The idea is that our microservices will use two kinds of databases: 
> one for ACID/OLTP and another one for OLAP.
> 
> Our main issue is that we want to be agnostic of the underlying databases, 
> because we will install our application into different clients’ environments. 
> These clients could have clusters of Mongo, Cassandra or Couchbase for OLTP, 
> or Spark for OLAP.
> 
> So our first idea was to stick to the SQL query language with ODBC drivers to 
> stay agnostic, and find a way to translate our SQL queries into whatever query 
> languages such databases accept, while keeping good performance.
> 
> Your framework seems to provide both a SQL translator and an optimizer. 
> However, it raised a few questions:
> - It seems you don’t have support for INSERT, UPDATE and DELETE, but when 
> digging I see that Apache Hive is using your framework and they now support 
> INSERT, UPDATE and DELETE. How come? Are they using only part of your 
> framework?
> - Also, you said you are a framework, but can you work as an engine that is 
> queried through ODBC/JDBC? Is that the purpose of Avatica? If so, can we 
> cluster it to handle many clients’ queries?
> 
> As you might sense, I am a bit lost and wondering what we should do for our 
> use case. Any insights are welcome, such as: what would you do if you were us?
> 
> Thank you very much,
> 
> Kind regards,
> 
> Maxime JATTIOT
> 