Re: Quicksql

Alessandro Solimando Thu, 12 Dec 2019 02:00:38 -0800

Adapters would still be needed for data sources that do not support SQL; I
think that is what Juan Pan was asking about.


On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan <[email protected]> wrote:

> Nope, it doesn't use any adapters. It just submits partial SQL queries to
> different engines.
>
> If the query contains tables from a single source, e.g.
> select count(*) from hive_table1, hive_table2 where a=b;
> then the whole query will be submitted to hive.
>
> Otherwise, e.g.
> select distinct a,b from hive_table union select distinct a,b from
> mysql_table;
>
> The following query will be submitted to Spark and executed by Spark:
> select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;
>
> spark_tmp_table1: select distinct a,b from hive_table
> spark_tmp_table2: select distinct a,b from mysql_table
>
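
The routing rule quoted above can be sketched roughly as follows. This is
only an illustration, not Quicksql's code: the Branch type, the plan_union
function, and the source labels are hypothetical names, and only a UNION of
per-source branches is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    source: str   # hypothetical engine label, e.g. "hive" or "mysql"
    sql: str      # the part of the query that touches only this source
    columns: str  # columns the rewritten query selects from the temp table


def plan_union(branches):
    """Return (engine, sql, temp_tables) for a UNION of per-source branches."""
    sources = {b.source for b in branches}
    if len(sources) == 1:
        # Single source: submit the whole query to that engine unchanged.
        sql = " union ".join(b.sql for b in branches)
        return sources.pop(), sql, {}
    # Several sources: each branch becomes a Spark temp table, and the
    # query is rewritten as SQL text over those temp tables.
    temp_tables = {}
    parts = []
    for i, b in enumerate(branches, start=1):
        name = f"spark_tmp_table{i}"
        temp_tables[name] = b.sql
        parts.append(f"select {b.columns} from {name}")
    return "spark", " union ".join(parts), temp_tables
```

Calling plan_union with one "hive" branch and one "mysql" branch returns
"spark" as the engine, together with the rewritten union and the two temp
table definitions from the example.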
> On 2019/12/11 04:27:07, "Juan Pan" <[email protected]> wrote:
> > Hi Haisheng,
> >
> >
> > > The queries on different data sources will then be registered as temp
> > > Spark tables (with filters or joins pushed in), and the whole query is
> > > rewritten as SQL text over these temp tables and submitted to Spark.
> >
> >
> > Does this mean QuickSQL also needs adapters to execute queries on
> > different data sources?
> >
> >
> > > Yes, virtualization is one of Calcite’s goals. In fact, when I created
> > > Calcite I was thinking about virtualization + in-memory materialized views.
> > > Not only the Spark convention but any of the “engine” conventions (Drill,
> > > Flink, Beam, Enumerable) could be used to create a virtual query engine.
> >
> >
> > Basically, I like and agree with Julian’s statement. It is a great idea,
> > and I personally hope Calcite moves in that direction.
> >
> >
> > Give my best wishes to the Calcite community.
> >
> >
> > Thanks,
> > Trista
> >
> >
> >  Juan Pan
> >
> >
> > [email protected]
> > Juan Pan(Trista), Apache ShardingSphere
> >
> >
> > On 12/11/2019 10:53, Haisheng Yuan <[email protected]> wrote:
> > As far as I know, users still need to register tables from other data
> > sources before querying them. QuickSQL uses Calcite for parsing queries and
> > optimizing logical expressions with several transformation rules. The queries
> > on different data sources will then be registered as temp Spark tables (with
> > filters or joins pushed in), and the whole query is rewritten as SQL text over
> > these temp tables and submitted to Spark.
> >
> > - Haisheng
> >
> > ------------------------------------------------------------------
> > From: Rui Wang <[email protected]>
> > Date: 2019-12-11 06:24:45
> > To: <[email protected]>
> > Subject: Re: Quicksql
> >
> > The co-routine model sounds like a good fit for streaming cases.
> >
> > I was thinking about how the Enumerable interface should work with streaming
> > cases, but now I should also check the Interpreter.
> >
> >
> > -Rui
> >
> > On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde <[email protected]> wrote:
> >
> > The goal (or rather my goal) for the interpreter is to replace
> > Enumerable as the quick, easy default convention.
> >
> > Enumerable is efficient but not that efficient (compared to engines
> > that work on off-heap data representing batches of records). And
> > because it generates Java bytecode there is a certain latency to
> > getting a query prepared and ready to run.
> >
> > It basically implements the old Volcano query evaluation model. It is
> > single-threaded (because all work happens as a result of a call to
> > 'next()' on the root node) and cannot handle branching data-flow
> > graphs (DAGs).
> >
> > The Interpreter uses a co-routine model (reading from queues,
> > writing to queues, and yielding when there is no work to be done) and
> > therefore could be more efficient than Enumerable in a single-node
> > multi-core system. Also, there is little start-up time, which is
> > important for small queries.
> >
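
To make the contrast above concrete, here is a minimal sketch of the two
evaluation styles. It is not Calcite code: Scan, Filter, scan_task,
filter_task, and run_coroutines are hypothetical names, and the scheduler
is a single-threaded round-robin loop that only shows the control flow
(queues plus yielding), not real multi-core execution.

```python
from collections import deque

# Volcano-style pull model: all work happens as a result of calling
# next() on the root operator, which pulls from its child in turn.
# None marks end of input, so this sketch assumes rows are never None.
class Scan:
    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)


class Filter:
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


def run_volcano(root):
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


# Queue-based co-routine model: each operator reads from a queue, writes
# to a queue, and yields when it has no work to do.
END = object()  # sentinel marking end of stream


def scan_task(rows, out_q):
    for row in rows:
        out_q.append(row)
        yield  # give other operators a turn
    out_q.append(END)


def filter_task(in_q, out_q, predicate):
    while True:
        if not in_q:
            yield  # no input available right now
            continue
        row = in_q.popleft()
        if row is END:
            out_q.append(END)
            return
        if predicate(row):
            out_q.append(row)


def run_coroutines(tasks):
    """Round-robin scheduler: step each task until all have finished."""
    pending = deque(tasks)
    while pending:
        task = pending.popleft()
        try:
            next(task)
            pending.append(task)
        except StopIteration:
            pass  # task finished; drop it
```

Both pipelines produce the same rows for a scan followed by a filter. The
difference is that the queue-based operators are decoupled from one another,
so a scheduler could in principle run them on different cores or feed one
output queue to several consumers.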
> > I would love to add another built-in convention that uses Arrow as
> > data format and generates co-routines for each operator. Those
> > co-routines could be deployed in a parallel and/or distributed data
> > engine.
> >
> > Julian
> >
> > On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
> > <[email protected]> wrote:
> >
> > What is the ultimate goal of the Calcite Interpreter?
> >
> > To provide some context, I have been playing around with Calcite + REST
> > (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest for
> > details of my experiments).
> >
> >
> > —Z
> >
> > On Dec 9, 2019, at 9:05 PM, Julian Hyde <[email protected]> wrote:
> >
> > Yes, virtualization is one of Calcite’s goals. In fact, when I created
> > Calcite I was thinking about virtualization + in-memory materialized views.
> > Not only the Spark convention but any of the “engine” conventions (Drill,
> > Flink, Beam, Enumerable) could be used to create a virtual query engine.
> >
> > See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite):
> > https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> >
> > Julian
> >
> >
> >
> > On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana <[email protected]> wrote:
> >
> > I recently contacted one of the active contributors asking about the
> > purpose of the project and here's his reply:
> >
> > From my understanding, Quicksql is a data virtualization platform. It can
> > query multiple data sources together and in a distributed way. Say you
> > write a SQL query that joins a MySQL table with an Elasticsearch table.
> > Quicksql can recognize that and then generate Spark code, in which it
> > fetches the MySQL/ES data as separate temporary tables and then joins them
> > in Spark. The execution happens in Spark, so it is fully distributed. The
> > user doesn't need to be aware of where each table comes from.
> >
> >
> > I understand that Calcite's Spark convention attempts to achieve the
> > same goal, but it isn't fully implemented yet.
> >
> >
> > On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde <[email protected]> wrote:
> >
> > Anyone know anything about Quicksql? It seems to be quite a popular
> > project, and they have an internal fork of Calcite.
> >
> > https://github.com/Qihoo360/
> >
> > https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
> >
> >
> > Julian
> >
> >
> >
> >
> >
> >
> >
>
