Re: Quicksql

Julian Hyde Tue, 10 Dec 2019 13:34:37 -0800

The goal (or rather my goal) for the interpreter is to replace
Enumerable as the quick, easy default convention.


Enumerable is efficient but not that efficient (compared to engines
that work on off-heap data representing batches of records). And
because it generates Java bytecode, there is a certain latency to
getting a query prepared and ready to run.

It basically implements the old Volcano query evaluation model. It is
single-threaded (because all work happens as a result of a call to
'next()' on the root node) and cannot handle branching data-flow
graphs (DAGs).
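
To make the pull-based model concrete, here is a minimal Java sketch. It
is not Calcite's Enumerable code; the names VolcanoSketch, Operator, scan,
filter and drain are hypothetical and chosen only for illustration. All
work happens on the caller's thread, triggered by next() on the root.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical names throughout; this is not Calcite's Enumerable API.
final class VolcanoSketch {
  /** A pull-based operator: each call returns one row, or null at end of input. */
  interface Operator<T> {
    T next();
  }

  /** Leaf operator that scans an in-memory list (rows must be non-null). */
  static <T> Operator<T> scan(List<T> rows) {
    Iterator<T> it = rows.iterator();
    return () -> it.hasNext() ? it.next() : null;
  }

  /** Filter pulls from its single input until a row passes the condition. */
  static <T> Operator<T> filter(Operator<T> input, Predicate<T> condition) {
    return () -> {
      for (T row = input.next(); row != null; row = input.next()) {
        if (condition.test(row)) {
          return row;
        }
      }
      return null;
    };
  }

  /** The consumer drives the whole tree by repeatedly calling next() on the root. */
  static <T> List<T> drain(Operator<T> root) {
    List<T> out = new ArrayList<>();
    for (T row = root.next(); row != null; row = root.next()) {
      out.add(row);
    }
    return out;
  }
}
```

In this sketch each operator is pulled by exactly one consumer, so an
operator whose output feeds two consumers has no natural place in the tree.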

The Interpreter uses a co-routine model (reading from queues,
writing to queues, and yielding when there is no work to be done) and
therefore could be more efficient than enumerable in a single-node
multi-core system. Also, there is little start-up time, which is
important for small queries.
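
A rough Java sketch of the queue-based idea follows. It is an assumption
about the general shape, not the Interpreter's actual implementation, and
the names QueueSketch, Node, step, scan, filter, run and demo are
hypothetical. Each node does a small amount of work per step and returns
false (yields) when it has nothing to read. The scheduler here is a
single-threaded round-robin loop; a multi-threaded one would need
thread-safe queues.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical names throughout; this is not the Calcite Interpreter's API.
final class QueueSketch {
  /** A cooperative operator: step() does a bounded amount of work, then returns. */
  interface Node {
    /** Returns true if this step made progress, false if there was no work. */
    boolean step();
  }

  /** Moves rows from a source list into an output queue, one row per step. */
  static <T> Node scan(List<T> rows, Queue<T> out) {
    Iterator<T> it = rows.iterator();
    return () -> {
      if (!it.hasNext()) {
        return false; // source exhausted
      }
      out.add(it.next());
      return true;
    };
  }

  /** Reads one row from its input queue and forwards it if it passes the condition. */
  static <T> Node filter(Queue<T> in, Predicate<T> condition, Queue<T> out) {
    return () -> {
      T row = in.poll();
      if (row == null) {
        return false; // nothing to read: yield
      }
      if (condition.test(row)) {
        out.add(row);
      }
      return true;
    };
  }

  /** Round-robin scheduler: keeps stepping every node until none makes progress. */
  static void run(List<Node> nodes) {
    boolean progress = true;
    while (progress) {
      progress = false;
      for (Node node : nodes) {
        progress |= node.step(); // non-short-circuit: every node gets a turn
      }
    }
  }

  /** Wires scan -> filter through queues and returns the filtered rows. */
  static List<Integer> demo() {
    Queue<Integer> scanned = new ArrayDeque<>();
    Queue<Integer> filtered = new ArrayDeque<>();
    run(Arrays.asList(
        scan(Arrays.asList(1, 2, 3, 4), scanned),
        filter(scanned, x -> x % 2 == 0, filtered)));
    return new ArrayList<>(filtered);
  }
}
```

Nothing is compiled at query time in this sketch: the nodes are wired
together and stepped directly.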

I would love to add another built-in convention that uses Arrow as
data format and generates co-routines for each operator. Those
co-routines could be deployed in a parallel and/or distributed data
engine.

Julian

On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
<zolyfar...@yahoo.com.invalid> wrote:
>
> What is the ultimate goal of the Calcite Interpreter?
>
> To provide some context, I have been playing around with Calcite + REST (see
> https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest for
> details of my experiments).
>
>
> —Z
>
> > On Dec 9, 2019, at 9:05 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> > Yes, virtualization is one of Calcite’s goals. In fact, when I created 
> > Calcite I was thinking about virtualization + in-memory materialized views. 
> > Not only the Spark convention but any of the “engine” conventions (Drill, 
> > Flink, Beam, Enumerable) could be used to create a virtual query engine.
> >
> > See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite):
> > https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> >
> > Julian
> >
> >
> >
> >> On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana <mgelb...@apache.org> wrote:
> >>
> >> I recently contacted one of the active contributors asking about the
> >> purpose of the project and here's his reply:
> >>
> > >>> From my understanding, Quicksql is a data virtualization platform. It can
> > >>> query multiple data sources together and in a distributed way. Say you
> > >>> write a SQL query that joins a MySQL table with an Elasticsearch table.
> > >>> Quicksql can recognize that, and then generate Spark code, in which it
> > >>> will fetch the MySQL/ES data as temporary tables separately, and then
> > >>> join them in Spark. The execution is in Spark so it is totally
> > >>> distributed. The user doesn't need to be aware of where the table is from.
> >>>
> >>
> > >> I understand that Calcite's Spark convention attempts to achieve the
> > >> same goal, but it isn't fully implemented yet.
> >>
> >>
> >> On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde <jh...@apache.org> wrote:
> >>
> >>> Anyone know anything about Quicksql? It seems to be quite a popular
> >>> project, and they have an internal fork of Calcite.
> >>>
> > >>> https://github.com/Qihoo360/
> > >>>
> > >>> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
> >>>
> >>> Julian
> >>>
> >>>
> >
>
