Thank you for your introduction, Siyuan.

Quicksql is an interesting project and full of potential. I'd like to learn
more about it.


Best,
Chunwei


On Tue, Mar 3, 2020 at 3:26 AM Siyuan Liu <geekli...@gmail.com> wrote:

> Hi, everyone:
>
> Glad to see a lot of old friends here. Quicksql is a project born in early
> 2019. It was designed to solve the problem of long and complex workflows in
> the big data field, where there are many data sources, many compute engines,
> and many kinds of syntax. The core idea is `Connect All Data Sources with
> One Extra Parsing Cost`.
>
> Because the project involves parsing standard SQL, we chose Calcite, the
> parsing engine with the best SQL compatibility. Thanks to the excellent
> architecture and toolkits provided by Calcite, Quicksql extends it with
> richer logical plan definitions that can describe both single-source and
> multi-source queries. For a single data source, an end-to-end connection to
> the source is established directly. For multiple data sources, the logical
> plan is split and pushed down, and the result is finally translated into
> code for a compute engine with distributed computing capabilities (such as
> Spark or Flink), which merges the data.
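The single-source versus multi-source routing described above can be sketched in Java. This is a minimal illustration, assuming a catalog that maps each table name to its data source; `QueryRouter`, `Plan`, and the catalog map are hypothetical names, not Quicksql's API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: choose an execution strategy from the set of data
// sources a query touches. Not Quicksql's real classes.
class QueryRouter {
    enum Plan { DIRECT_TO_SOURCE, SPLIT_AND_MERGE_IN_ENGINE }

    // tableToSource maps a table name (e.g. "hive_table1") to its source (e.g. "hive").
    static Plan route(List<String> tables, Map<String, String> tableToSource) {
        Set<String> sources = new LinkedHashSet<>();
        for (String table : tables) {
            String source = tableToSource.get(table);
            if (source == null) {
                throw new IllegalArgumentException("Unregistered table: " + table);
            }
            sources.add(source);
        }
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("Query references no tables");
        }
        // One source: send the whole query to it. Several sources: split the
        // plan, push the pieces down, and merge the results in Spark or Flink.
        return sources.size() == 1 ? Plan.DIRECT_TO_SOURCE : Plan.SPLIT_AND_MERGE_IN_ENGINE;
    }

    public static void main(String[] args) {
        Map<String, String> catalog = Map.of(
            "hive_table1", "hive", "hive_table2", "hive", "mysql_table", "mysql");
        System.out.println(route(List.of("hive_table1", "hive_table2"), catalog)); // DIRECT_TO_SOURCE
        System.out.println(route(List.of("hive_table1", "mysql_table"), catalog)); // SPLIT_AND_MERGE_IN_ENGINE
    }
}
```

The decision only depends on the number of distinct sources, which is why registering every table with its source up front matters.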
>
> Based on this design, Quicksql makes extensive use of Calcite's Adapter,
> Dialect, and UDF capabilities to provide syntax compatibility for various
> data sources and compute engines, and it also uses Avatica for the JDBC
> protocol. We are very grateful for the excellent work of the Calcite
> community.
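As a rough illustration of why per-source dialects are needed, the sketch below models only identifier quoting. `DialectSketch` and its `Dialect` interface are hypothetical; Calcite's own `SqlDialect` class covers far more (functions, paging syntax, type names).

```java
import java.util.Map;

// Hypothetical sketch of per-source dialects. Only identifier quoting is
// modeled; this is not Calcite's SqlDialect API.
class DialectSketch {
    interface Dialect {
        String quoteIdentifier(String name);
    }

    // Wraps a name in the given quote character, doubling any embedded quote.
    static Dialect quotingWith(char q) {
        String qs = String.valueOf(q);
        return name -> qs + name.replace(qs, qs + qs) + qs;
    }

    static final Map<String, Dialect> DIALECTS = Map.of(
        "mysql", quotingWith('`'),
        "hive", quotingWith('`'),
        "postgresql", quotingWith('"'));

    // Builds the same logical query in the syntax of the chosen source.
    static String selectColumn(String source, String column, String table) {
        Dialect d = DIALECTS.get(source);
        if (d == null) {
            throw new IllegalArgumentException("Unknown source: " + source);
        }
        return "SELECT " + d.quoteIdentifier(column) + " FROM " + d.quoteIdentifier(table);
    }

    public static void main(String[] args) {
        System.out.println(selectColumn("mysql", "a", "mysql_table"));   // SELECT `a` FROM `mysql_table`
        System.out.println(selectColumn("postgresql", "a", "pg_table")); // SELECT "a" FROM "pg_table"
    }
}
```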
>
> At the beginning of the project, we were unsure which application areas
> Quicksql should target. After a year of polishing, Quicksql has been
> successfully applied in two areas:
> 1. Interactive query engine: provides interactive big data queries and BI
> analysis with standard SQL syntax, with response times ranging from seconds
> to minutes.
> 2. ETL compute engine: SQL-based ETL across multiple data sources, which can
> use SQL's optimization capabilities for data cleaning, transformation, joins,
> etc.
> In the future, we will also focus on dynamic engine selection, so that each
> SQL query can run on the most suitable engine, such as Hive, Spark, or Presto.
>
> Looking forward to working with the Calcite community on some interesting
> things and exploring the unlimited possibilities of SQL.
>
> Siyuan Liu
>
> On Mon, Mar 2, 2020 at 3:45 PM Francis Du <fran...@francisdu.com> wrote:
>
> > Hi everyone:
> >
> > Allow me to introduce my good friend Siyuan Liu, who is the leader of
> > the Quicksql project.
> >
> > I have CC'd him and asked him to introduce the project to us. Here is the
> > documentation link for Quicksql [1].
> >
> > [1] https://quicksql.readthedocs.io/en/latest/
> >
> > Regards,
> > Francis
> >
> > On Mon, Dec 23, 2019 at 11:44 AM Juan Pan <panj...@apache.org> wrote:
> >
> >> Thanks Gelbana,
> >>
> >>
> >> I really appreciate your explanation, which sheds some light for me on
> >> exploring Calcite. :)
> >>
> >>
> >> Best wishes,
> >> Trista
> >>
> >>
> >>  Juan Pan (Trista)
> >>
> >> Senior DBA & PPMC of Apache ShardingSphere(Incubating)
> >> E-mail: panj...@apache.org
> >>
> >>
> >>
> >>
> >> On 12/22/2019 05:58, Muhammad Gelbana <m.gelb...@gmail.com> wrote:
> >> I am curious how to join the tables from different data sources.
> >> Based on Calcite's concept of conventions, the Join operator and its
> >> input operators should all have the same convention. If they don't, each
> >> input convention that differs from the Join operator's convention has to
> >> register a converter rule. This rule should produce an operator that only
> >> converts from that convention to the Join operator's convention.
> >>
> >> This way the Join operator will be able to handle the data obtained from
> >> its input operators because it understands the data structure.
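A toy model of the convention matching described here may help. It does not use Calcite's `RelNode` or `ConverterRule` classes; `ConventionSketch`, the string-keyed converter registry, and the `JDBC`/`SPARK` labels are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Calcite's API: a Join only consumes inputs in its
// own convention, so each mismatched input is wrapped in a registered converter.
class ConventionSketch {
    record Node(String name, String convention, List<Node> inputs) {
        Node(String name, String convention) { this(name, convention, List.of()); }
    }

    // "from->to" keys for which a converter has been registered.
    static final Map<String, String> CONVERTERS = new HashMap<>();

    static void registerConverter(String from, String to) {
        CONVERTERS.put(from + "->" + to, "Convert(" + from + "->" + to + ")");
    }

    static Node join(String convention, Node left, Node right) {
        return new Node("Join", convention, List.of(adapt(left, convention), adapt(right, convention)));
    }

    // Keeps a matching input as-is; otherwise inserts the registered converter.
    static Node adapt(Node input, String target) {
        if (input.convention().equals(target)) return input;
        String converter = CONVERTERS.get(input.convention() + "->" + target);
        if (converter == null) {
            throw new IllegalStateException("No converter from " + input.convention() + " to " + target);
        }
        return new Node(converter, target, List.of(input));
    }

    public static void main(String[] args) {
        registerConverter("JDBC", "SPARK");
        Node j = join("SPARK", new Node("Scan(mysql_table)", "JDBC"), new Node("Scan(hive_table)", "SPARK"));
        for (Node in : j.inputs()) System.out.println(in.name() + " [" + in.convention() + "]");
    }
}
```

After conversion both inputs of the Join report the `SPARK` convention, which is the property that lets the Join understand the data it receives.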
> >>
> >> Thanks,
> >> Gelbana
> >>
> >>
> >> On Wed, Dec 18, 2019 at 5:08 AM Juan Pan <panj...@apache.org> wrote:
> >>
> >> Some updates.
> >>
> >>
> >> Recently I took a look at their docs and source code, and found that this
> >> project uses Calcite's SQL parsing and relational algebra to get a query
> >> plan, and then translates it into Spark SQL to join data from different
> >> data sources, or into the corresponding query for a single data source.
> >>
> >>
> >> Although it copies many classes from Calcite, the idea of QuickSQL seems
> >> interesting, and the code is succinct.
> >>
> >>
> >> Best,
> >> Trista
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 12/13/2019 17:16, Juan Pan <panj...@apache.org> wrote:
> >> Yes, indeed.
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 12/12/2019 18:00, Alessandro Solimando
> >> <alessandro.solima...@gmail.com> wrote:
> >> Adapters must be needed for data sources that don't support SQL; I think
> >> this is what Juan Pan was asking about.
> >>
> >> On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan <hy...@apache.org> wrote:
> >>
> >> Nope, it doesn't use any adapters. It just submits partial SQL queries to
> >> different engines.
> >>
> >> If a query contains tables from a single source, e.g.
> >> select count(*) from hive_table1, hive_table2 where a=b;
> >> then the whole query will be submitted to Hive.
> >>
> >> Otherwise, e.g.
> >> select distinct a,b from hive_table union select distinct a,b from
> >> mysql_table;
> >>
> >> The following query will be submitted to Spark and executed by Spark:
> >> select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;
> >>
> >> spark_tmp_table1: select distinct a,b from hive_table
> >> spark_tmp_table2: select distinct a,b from mysql_table
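The UNION rewrite in this example can be sketched as follows. `UnionSplitter` and its records are hypothetical, and the sketch assumes the planner has already identified the per-source branches and their shared column list; only the temp table names and SQL strings come from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Quicksql code: each branch of a cross-source UNION
// is pushed down to its source and registered as a Spark temp table; only the
// outer UNION over those temp tables runs in Spark.
class UnionSplitter {
    record Branch(String source, String sql) {}

    record SplitPlan(Map<String, Branch> tempTables, String sparkSql) {}

    static SplitPlan split(List<Branch> branches, String columns) {
        Map<String, Branch> temps = new LinkedHashMap<>();
        List<String> outer = new ArrayList<>();
        int i = 1;
        for (Branch b : branches) {
            String temp = "spark_tmp_table" + i++;
            temps.put(temp, b); // executed on b.source(); result becomes a temp table
            outer.add("select " + columns + " from " + temp);
        }
        return new SplitPlan(temps, String.join(" union ", outer));
    }

    public static void main(String[] args) {
        SplitPlan plan = split(List.of(
            new Branch("hive", "select distinct a,b from hive_table"),
            new Branch("mysql", "select distinct a,b from mysql_table")), "a,b");
        plan.tempTables().forEach((t, b) -> System.out.println(t + " <- [" + b.source() + "] " + b.sql()));
        // Spark: select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2
        System.out.println("Spark: " + plan.sparkSql());
    }
}
```

Real queries need a general plan-to-SQL step rather than string concatenation; the sketch only shows the shape of the resulting work split.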
> >>
> >> On 2019/12/11 04:27:07, "Juan Pan" <panj...@apache.org> wrote:
> >> Hi Haisheng,
> >>
> >>
> >> The query on each data source is then registered as a temp Spark table
> >> (with filters or joins pushed in), and the whole query is rewritten as
> >> SQL text over these temp tables and submitted to Spark.
> >>
> >>
> >> Does this mean QuickSQL also needs adapters to execute queries on
> >> different data sources?
> >>
> >>
> >> Yes, virtualization is one of Calcite’s goals. In fact, when I created
> >> Calcite I was thinking about virtualization + in-memory materialized
> >> views. Not only the Spark convention but any of the “engine” conventions
> >> (Drill, Flink, Beam, Enumerable) could be used to create a virtual query
> >> engine.
> >>
> >>
> >> Basically, I like and agree with Julian’s statement. It is a great idea,
> >> and I personally hope Calcite moves in that direction.
> >>
> >>
> >> My best wishes to the Calcite community.
> >>
> >>
> >> Thanks,
> >> Trista
> >>
> >>
> >> Juan Pan
> >>
> >>
> >> panj...@apache.org
> >> Juan Pan(Trista), Apache ShardingSphere
> >>
> >>
> >> On 12/11/2019 10:53, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
> >> As far as I know, users still need to register tables from other data
> >> sources before querying them. QuickSQL uses Calcite for parsing queries
> >> and optimizing logical expressions with several transformation rules.
> >> The query on each data source is then registered as a temp Spark table
> >> (with filters or joins pushed in), and the whole query is rewritten as
> >> SQL text over these temp tables and submitted to Spark.
> >>
> >> - Haisheng
> >>
> >> ------------------------------------------------------------------
> >> From: Rui Wang <amaliu...@apache.org>
> >> Date: 2019-12-11 06:24:45
> >> To: <dev@calcite.apache.org>
> >> Subject: Re: Quicksql
> >>
> >> The co-routine model sounds like a good fit for streaming cases.
> >>
> >> I was thinking about how the Enumerable interface should work with
> >> streaming cases, but now I should also check the Interpreter.
> >>
> >>
> >> -Rui
> >>
> >> On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde <jh...@apache.org> wrote:
> >>
> >> The goal (or rather my goal) for the interpreter is to replace
> >> Enumerable as the quick, easy default convention.
> >>
> >> Enumerable is efficient but not that efficient (compared to engines
> >> that work on off-heap data representing batches of records). And
> >> because it generates Java bytecode, there is a certain latency to
> >> getting a query prepared and ready to run.
> >>
> >> It basically implements the old Volcano query evaluation model. It is
> >> single-threaded (because all work happens as a result of a call to
> >> 'next()' on the root node) and cannot handle branching data-flow
> >> graphs (DAGs).
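For readers unfamiliar with the Volcano (iterator) model, here is a minimal pull-based pipeline in Java. It is illustrative only, with hypothetical names, and is not how Calcite's Enumerable code is actually generated.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Minimal Volcano-style sketch: each operator pulls rows from its child by
// calling next(), and all work runs on the thread that pulls from the root.
class VolcanoSketch {
    static Iterator<Integer> scan(List<Integer> rows) {
        return rows.iterator();
    }

    static Iterator<Integer> filter(Iterator<Integer> input, Predicate<Integer> p) {
        return new Iterator<>() {
            private Integer pending; // next matching row, fetched lazily

            public boolean hasNext() {
                while (pending == null && input.hasNext()) {
                    Integer row = input.next(); // pull one row from the child
                    if (p.test(row)) pending = row;
                }
                return pending != null;
            }

            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                Integer row = pending;
                pending = null;
                return row;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> root = filter(scan(List.of(1, 2, 3, 4, 5, 6)), x -> x % 2 == 0);
        while (root.hasNext()) System.out.println(root.next()); // prints 2, 4, 6 on separate lines
    }
}
```

Nothing happens until the root is pulled, and each child has exactly one consumer, which is why this model is single-threaded and does not naturally handle DAG-shaped plans.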
> >>
> >> The Interpreter uses a co-routine model (reading from queues,
> >> writing to queues, and yielding when there is no work to be done) and
> >> therefore could be more efficient than Enumerable in a single-node
> >> multi-core system. Also, there is little start-up time, which is
> >> important for small queries.
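A co-routine style pipeline can be sketched with queues and a cooperative scheduler. This is an assumed, simplified design, not the Calcite Interpreter's actual classes: each hypothetical `Step` moves a bounded batch from its input queue to its output queue and returns as soon as it has no work, so the scheduler can interleave operators.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical co-routine sketch: operators read from queues, write to
// queues, and yield (return) when there is nothing to do.
class CoroutineSketch {
    interface Step {
        boolean run(); // returns true if it did any work
    }

    // Moves up to batchSize rows from in to out, applying f; yields early when in is empty.
    static Step map(Queue<Integer> in, Queue<Integer> out, Function<Integer, Integer> f, int batchSize) {
        return () -> {
            int n = 0;
            while (n < batchSize && !in.isEmpty()) {
                out.add(f.apply(in.poll()));
                n++;
            }
            return n > 0;
        };
    }

    // Round-robin scheduler: keeps running steps until none of them makes progress.
    static void runAll(List<Step> steps) {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Step s : steps) progress |= s.run();
        }
    }

    public static void main(String[] args) {
        Queue<Integer> source = new ArrayDeque<>(List.of(1, 2, 3, 4, 5));
        Queue<Integer> mid = new ArrayDeque<>();
        Queue<Integer> sink = new ArrayDeque<>();
        runAll(List.of(
            map(source, mid, x -> x * 10, 2),
            map(mid, sink, x -> x + 1, 2)));
        System.out.println(new ArrayList<>(sink)); // [11, 21, 31, 41, 51]
    }
}
```

Because operators communicate only through queues, the same steps could in principle be handed to several worker threads, or one operator could write to two output queues to feed a branching plan.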
> >>
> >> I would love to add another built-in convention that uses Arrow as
> >> data format and generates co-routines for each operator. Those
> >> co-routines could be deployed in a parallel and/or distributed data
> >> engine.
> >>
> >> Julian
> >>
> >> On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
> >> <zolyfar...@yahoo.com.invalid> wrote:
> >>
> >> What is the ultimate goal of the Calcite Interpreter?
> >>
> >> To provide some context, I have been playing around with Calcite + REST
> >> (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest
> >> for details of my experiments).
> >>
> >>
> >> —Z
> >>
> >> On Dec 9, 2019, at 9:05 PM, Julian Hyde <jh...@apache.org> wrote:
> >>
> >> Yes, virtualization is one of Calcite’s goals. In fact, when I created
> >> Calcite I was thinking about virtualization + in-memory materialized
> >> views. Not only the Spark convention but any of the “engine” conventions
> >> (Drill, Flink, Beam, Enumerable) could be used to create a virtual query
> >> engine.
> >>
> >> See, e.g., a talk I gave in 2013 about Optiq (the precursor to Calcite):
> >>
> >> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> >>
> >> Julian
> >>
> >>
> >>
> >> On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana <mgelb...@apache.org>
> >> wrote:
> >>
> >> I recently contacted one of the active contributors asking about the
> >> purpose of the project and here's his reply:
> >>
> >> From my understanding, Quicksql is a data virtualization platform. It
> >> can query multiple data sources together and in a distributed way. Say
> >> you write SQL that joins a MySQL table with an Elasticsearch table.
> >> Quicksql can recognize that and then generate Spark code, in which it
> >> fetches the MySQL and ES data separately as temporary tables and then
> >> joins them in Spark. The execution happens in Spark, so it is fully
> >> distributed. The user doesn't need to be aware of where each table
> >> comes from.
> >>
> >>
> >> I understand that Calcite's Spark convention attempts to achieve the
> >> same goal, but it isn't fully implemented yet.
> >>
> >>
> >> On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde <jh...@apache.org> wrote:
> >>
> >> Anyone know anything about Quicksql? It seems to be quite a popular
> >> project, and they have an internal fork of Calcite.
> >>
> >> https://github.com/Qihoo360/
> >>
> >> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
> >>
> >>
> >> Julian