Thank you for your introduction, Siyuan. Quicksql is an interesting project and full of potential. I'd like to learn more about it.
Best,
Chunwei

On Tue, Mar 3, 2020 at 3:26 AM Siyuan Liu <geekli...@gmail.com> wrote:
> Hi, everyone:
>
> Glad to see a lot of old friends here. Quicksql is a project born in early
> 2019. It was designed to solve the problem of long and complex workflows in
> the big data field, where there are many data sources, many compute
> engines, and many types of syntax. The core idea is `Connect All Data
> Sources with One Extra Parsing Cost`.
>
> Because it involves standard SQL parsing, we finally chose Calcite, which
> has the best SQL compatibility, as the parsing engine. Thanks to the
> excellent architecture and toolkits provided by Calcite, Quicksql has
> extended it with richer logical plan definitions, so that both
> single-source and multi-source queries can be described. For a single data
> source, the query is sent end to end directly to that source; for multiple
> data sources, the logical plan is divided and pushed down, and the rest is
> finally translated into code for a compute engine with distributed
> computing capabilities (such as Spark or Flink), which merges the data.
>
> Based on this design, Quicksql makes extensive use of Calcite's Adapter /
> Dialect / UDF capabilities to provide syntax compatibility for various data
> sources and compute engines, and also uses Avatica as its JDBC protocol. We
> are very grateful for the excellent work provided by the Calcite community.
>
> At the beginning of the project, we were unsure which application areas
> Quicksql should target. After one year of polishing, Quicksql has been
> successfully applied in two areas:
> 1. Interactive Query Engine: provides interactive big data queries and BI
> analysis with standard SQL syntax, with response times from seconds to
> minutes.
> 2. ETL Compute Engine: SQL-based ETL across multiple data sources, which can
> use SQL's optimization capabilities for data cleaning / transformation /
> join, etc.
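The single-source versus multi-source split Siyuan describes can be illustrated with a small routing sketch. This is a hypothetical illustration and not Quicksql's code. The `CATALOG` mapping, the `Route` type, and `route_query` are invented names. The sketch also assumes the set of tables a query references is already known; in practice that would come from the plan Calcite produces.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical catalog: which data source holds each registered table.
CATALOG = {
    "hive_table1": "hive",
    "hive_table2": "hive",
    "mysql_table": "mysql",
}


@dataclass(frozen=True)
class Route:
    engine: str        # where the final query runs
    pushed_down: bool  # True if the whole query goes to one source unchanged


def route_query(tables: Iterable[str]) -> Route:
    """Choose an execution target from the data sources a query touches."""
    sources = {CATALOG[t] for t in tables}
    if len(sources) == 1:
        # Single source: send the whole query to that source as-is.
        return Route(engine=sources.pop(), pushed_down=True)
    # Several sources: split the plan, push parts down, merge in Spark.
    return Route(engine="spark", pushed_down=False)


print(route_query(["hive_table1", "hive_table2"]))
# Route(engine='hive', pushed_down=True)
print(route_query(["hive_table1", "mysql_table"]))
# Route(engine='spark', pushed_down=False)
```

Hard-coding Spark as the merge engine is a simplification. The message names Flink as another option, and the "dynamic engine selection" mentioned later in the thread would replace that fixed choice.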
>
> In the future, we will also focus on dynamic engine selection, so that each
> engine, such as Hive, Spark, or Presto, runs the SQL it is best suited for.
>
> Looking forward to working with the Calcite community to do some
> interesting things and explore the unlimited possibilities of SQL.
>
> Siyuan Liu
>
> On Mon, Mar 2, 2020 at 3:45 PM Francis Du <fran...@francisdu.com> wrote:
>
> > Hi everyone:
> >
> > Allow me to introduce my good friend Siyuan Liu, who is the leader of
> > the Quicksql project.
> >
> > I have CC'd him and asked him to introduce the project to us. Here is the
> > documentation link for Quicksql [1].
> >
> > [1]. https://quicksql.readthedocs.io/en/latest/
> >
> > Regards,
> > Francis
> >
> > On Mon, Dec 23, 2019 at 11:44 AM Juan Pan <panj...@apache.org> wrote:
> >
> >> Thanks Gelbana,
> >>
> >> I really appreciate your explanation, which sheds some light on
> >> exploring Calcite for me. :)
> >>
> >> Best wishes,
> >> Trista
> >>
> >> Juan Pan (Trista)
> >>
> >> Senior DBA & PPMC of Apache ShardingSphere(Incubating)
> >> E-mail: panj...@apache.org
> >>
> >> On 12/22/2019 05:58, Muhammad Gelbana <m.gelb...@gmail.com> wrote:
> >> I am curious how to join the tables from different datasources.
> >> Based on Calcite's conventions concept, the Join operator and its input
> >> operators should all have the same convention. If they don't, each input
> >> whose convention differs from the Join operator's convention will need a
> >> registered converter rule. This rule should produce an operator that only
> >> converts from that convention to the Join operator's convention.
> >>
> >> This way the Join operator will be able to handle the data obtained from
> >> its input operators, because it understands the data structure.
> >>
> >> Thanks,
> >> Gelbana
> >>
> >> On Wed, Dec 18, 2019 at 5:08 AM Juan Pan <panj...@apache.org> wrote:
> >>
> >> Some updates.
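Gelbana's point about conventions can be modelled with a toy plan tree. This is a hedged sketch that does not use Calcite's API. In Calcite the real pieces are `RelNode`, `Convention`, and `ConverterRule`, and they operate inside the planner. Here the `Node` class, the string convention names, and the `CONVERTERS` registry are all hypothetical. The sketch shows a join that requires every input to share its own convention. Any input in a different convention is wrapped in a converter looked up by its (from, to) conventions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy plan node; 'convention' names the engine/format its output is in."""
    name: str
    convention: str
    inputs: list = field(default_factory=list)

# Hypothetical converter registry: (from_convention, to_convention) -> name.
CONVERTERS = {
    ("JDBC", "SPARK"): "JdbcToSpark",
    ("ELASTICSEARCH", "SPARK"): "EsToSpark",
}

def convert(node: Node, target: str) -> Node:
    """Return node unchanged if already in target, else wrap it in a converter."""
    if node.convention == target:
        return node
    rule = CONVERTERS.get((node.convention, target))
    if rule is None:
        raise ValueError(f"no converter from {node.convention} to {target}")
    # The converter only changes the convention of its single input.
    return Node(rule, target, [node])

def make_join(left: Node, right: Node, convention: str) -> Node:
    """A join whose inputs must all share the join's own convention."""
    return Node("Join", convention,
                [convert(left, convention), convert(right, convention)])

join = make_join(Node("mysql_scan", "JDBC"), Node("es_scan", "ELASTICSEARCH"), "SPARK")
print([child.name for child in join.inputs])  # ['JdbcToSpark', 'EsToSpark']
```

If no converter is registered for a mismatched input, the sketch fails loudly. Calcite's planner, by contrast, would simply find no valid plan in that convention.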
> >>
> >> Recently I took a look at their docs and source code, and found that this
> >> project uses Calcite's SQL parser and relational algebra to get a query
> >> plan, and then translates it either to Spark SQL for joining different
> >> data sources, or to the corresponding query for a single data source.
> >>
> >> Although it copies many classes from Calcite, the idea of QuickSQL seems
> >> interesting, and the code is succinct.
> >>
> >> Best,
> >> Trista
> >>
> >> On 12/13/2019 17:16, Juan Pan <panj...@apache.org> wrote:
> >> Yes, indeed.
> >>
> >> On 12/12/2019 18:00, Alessandro Solimando <alessandro.solima...@gmail.com>
> >> wrote:
> >> Adapters must be needed for data sources that do not support SQL; I think
> >> this is what Juan Pan was asking about.
> >>
> >> On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan <hy...@apache.org> wrote:
> >>
> >> Nope, it doesn't use any adapters. It just submits partial SQL queries to
> >> different engines.
> >>
> >> If a query contains tables from a single source, e.g.
> >> select count(*) from hive_table1, hive_table2 where a=b;
> >> then the whole query will be submitted to Hive.
> >>
> >> Otherwise, e.g.
> >> select distinct a,b from hive_table union select distinct a,b from
> >> mysql_table;
> >>
> >> The following query will be submitted to Spark and executed by Spark:
> >> select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;
> >>
> >> spark_tmp_table1: select distinct a,b from hive_table
> >> spark_tmp_table2: select distinct a,b from mysql_table
> >>
> >> On 2019/12/11 04:27:07, "Juan Pan" <panj...@apache.org> wrote:
> >> Hi Haisheng,
> >>
> >> > The query on different data source will then be registered as temp
> >> > spark tables (with filter or join pushed in), the whole query is
> >> > rewritten as SQL text over these temp tables and submitted to Spark.
> >>
> >> Does this mean QuickSQL also needs adapters to execute queries on
> >> different data sources?
> >>
> >> > Yes, virtualization is one of Calcite’s goals. In fact, when I created
> >> > Calcite I was thinking about virtualization + in-memory materialized
> >> > views. Not only the Spark convention but any of the “engine”
> >> > conventions (Drill, Flink, Beam, Enumerable) could be used to create a
> >> > virtual query engine.
> >>
> >> Basically, I like and agree with Julian’s statement. It is a great idea,
> >> and I personally hope Calcite moves in that direction.
> >>
> >> Give my best wishes to the Calcite community.
> >>
> >> Thanks,
> >> Trista
> >>
> >> On 12/11/2019 10:53, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
> >> As far as I know, users still need to register tables from other data
> >> sources before querying them. QuickSQL uses Calcite for parsing queries
> >> and optimizing logical expressions with several transformation rules.
> >> The query on each data source will then be registered as a temp Spark
> >> table (with filter or join pushed in), and the whole query is rewritten
> >> as SQL text over these temp tables and submitted to Spark.
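Haisheng's multi-source example can be reproduced as a plain string rewrite. This sketch assumes only what his message shows. Each UNION branch is pushed to its own source and registered as a temp table, and the outer query unions those temp tables. `rewrite_union` and its arguments are hypothetical names. The real system works on Calcite plans rather than on SQL strings.

```python
def rewrite_union(branches, columns):
    """Split a multi-source UNION into per-source temp tables plus an outer query.

    branches: list of (source, sql) pairs, one per UNION input.
    columns:  column names projected by every branch.
    Returns (temp_tables, outer_sql), where temp_tables maps each temp-table
    name to the (source, sql) it should be filled from.
    """
    temp_tables = {}
    for i, (source, sql) in enumerate(branches, start=1):
        # Each branch runs on its own source; its result becomes a temp table.
        temp_tables[f"spark_tmp_table{i}"] = (source, sql)
    cols = ",".join(columns)
    # The outer query references only temp tables, so Spark can run it alone.
    outer_sql = " union ".join(f"select {cols} from {name}" for name in temp_tables)
    return temp_tables, outer_sql + ";"


temps, outer = rewrite_union(
    [("hive", "select distinct a,b from hive_table"),
     ("mysql", "select distinct a,b from mysql_table")],
    ["a", "b"],
)
print(outer)
# select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;
```

Keeping `distinct` inside each branch mirrors the example: each source deduplicates its own rows before anything crosses the network, and the outer `union` removes duplicates across sources.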
> >>
> >> - Haisheng
> >>
> >> ------------------------------------------------------------------
> >> From: Rui Wang <amaliu...@apache.org>
> >> Date: Dec 11, 2019 06:24:45
> >> To: <dev@calcite.apache.org>
> >> Subject: Re: Quicksql
> >>
> >> The co-routine model sounds like a good fit for streaming cases.
> >>
> >> I was thinking about how the Enumerable interface should work with
> >> streaming cases, but now I should also look at the Interpreter.
> >>
> >> -Rui
> >>
> >> On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde <jh...@apache.org> wrote:
> >>
> >> The goal (or rather my goal) for the interpreter is to replace
> >> Enumerable as the quick, easy default convention.
> >>
> >> Enumerable is efficient but not that efficient (compared to engines
> >> that work on off-heap data representing batches of records). And
> >> because it generates Java bytecode, there is a certain latency to
> >> getting a query prepared and ready to run.
> >>
> >> It basically implements the old Volcano query evaluation model. It is
> >> single-threaded (because all work happens as a result of a call to
> >> 'next()' on the root node) and cannot handle branching data-flow
> >> graphs (DAGs).
> >>
> >> The Interpreter uses a co-routine model (reading from queues, writing
> >> to queues, and yielding when there is no work to be done) and therefore
> >> could be more efficient than Enumerable in a single-node multi-core
> >> system. Also, there is little start-up time, which is important for
> >> small queries.
> >>
> >> I would love to add another built-in convention that uses Arrow as its
> >> data format and generates co-routines for each operator. Those
> >> co-routines could be deployed in a parallel and/or distributed data
> >> engine.
> >>
> >> Julian
> >>
> >> On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
> >> <zolyfar...@yahoo.com.invalid> wrote:
> >>
> >> What is the ultimate goal of the Calcite Interpreter?
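Julian's contrast between the Enumerable (Volcano) model and the Interpreter's co-routine model can be illustrated with a toy filter pipeline. This is a hedged, single-threaded Python sketch, not Calcite's Java implementation. `Scan`, `Filter`, `scan_op`, `filter_op`, and `run_coroutines` are hypothetical names. In the pull version, the root's `next()` call drives all work. In the co-routine version, each operator reads from and writes to queues and yields to a scheduler when it has nothing to do. The sketch does not model multiple cores.

```python
from collections import deque

# --- Volcano (pull) model: the root drives everything via next() ---
class Scan:
    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)  # None signals end of input


class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def next(self):
        # Keep pulling from the child until a row passes or input ends.
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None


def run_volcano(root):
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


# --- Co-routine model: operators exchange rows through queues ---
DONE = object()  # end-of-stream marker placed on a queue


def scan_op(rows, out_q):
    for r in rows:
        out_q.append(r)
        yield  # hand control back to the scheduler after each row
    out_q.append(DONE)


def filter_op(pred, in_q, out_q):
    while True:
        if not in_q:
            yield  # no work available: yield to the scheduler
            continue
        row = in_q.popleft()
        if row is DONE:
            out_q.append(DONE)
            return
        if pred(row):
            out_q.append(row)


def run_coroutines(ops):
    """Round-robin scheduler: resume each operator until all have finished."""
    active = list(ops)
    while active:
        for op in list(active):
            try:
                next(op)
            except StopIteration:
                active.remove(op)


def is_even(r):
    return r % 2 == 0


rows = [1, 2, 3, 4, 5, 6]
print(run_volcano(Filter(Scan(rows), is_even)))  # [2, 4, 6]

q1, q2 = deque(), deque()
run_coroutines([scan_op(rows, q1), filter_op(is_even, q1, q2)])
print([r for r in q2 if r is not DONE])  # [2, 4, 6]
```

Because the co-routine operators communicate only through queues, one output queue could in principle feed several consumers, or the operators could be spread across workers. The pull model's single chain of `next()` calls does not allow this.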
> >>
> >> To provide some context, I have been playing around with Calcite + REST
> >> (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest
> >> for details of my experiments).
> >>
> >> -Z
> >>
> >> On Dec 9, 2019, at 9:05 PM, Julian Hyde <jh...@apache.org> wrote:
> >>
> >> Yes, virtualization is one of Calcite’s goals. In fact, when I created
> >> Calcite I was thinking about virtualization + in-memory materialized
> >> views. Not only the Spark convention but any of the “engine” conventions
> >> (Drill, Flink, Beam, Enumerable) could be used to create a virtual query
> >> engine.
> >>
> >> See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite):
> >> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> >>
> >> Julian
> >>
> >> On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana <mgelb...@apache.org> wrote:
> >>
> >> I recently contacted one of the active contributors asking about the
> >> purpose of the project, and here's his reply:
> >>
> >> > From my understanding, Quicksql is a data virtualization platform. It
> >> > can query multiple data sources together and in a distributed way.
> >> > Say, you can write a SQL query that joins a MySQL table with an
> >> > Elasticsearch table. Quicksql can recognize that and generate Spark
> >> > code, in which it fetches the MySQL/ES data separately as temporary
> >> > tables and then joins them in Spark. The execution is in Spark, so it
> >> > is totally distributed. The user doesn't need to be aware of where the
> >> > table is from.
> >>
> >> I understand that Calcite's Spark convention attempts to achieve the
> >> same goal, but it isn't fully implemented yet.
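The contributor's description boils down to a join between two independently fetched inputs: fetch the MySQL and Elasticsearch data separately as temporary tables, then join them in Spark. The sketch below uses an in-memory hash join as a stand-in for Spark's distributed join. The rows and column names are made up for illustration and do not come from Quicksql.

```python
from collections import defaultdict

# Hypothetical rows standing in for the two temporary tables.
mysql_users = [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
]
es_clicks = [
    {"user_id": 1, "clicks": 10},
    {"user_id": 1, "clicks": 3},
    {"user_id": 3, "clicks": 7},
]


def hash_join(build, probe, key):
    """Inner equi-join: hash the build side on key, then probe it row by row."""
    buckets = defaultdict(list)
    for row in build:
        buckets[row[key]].append(row)
    joined = []
    for row in probe:
        # .get avoids inserting empty buckets for keys with no match.
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined


for row in hash_join(mysql_users, es_clicks, "user_id"):
    print(row)
# {'user_id': 1, 'name': 'alice', 'clicks': 10}
# {'user_id': 1, 'name': 'alice', 'clicks': 3}
```

A distributed engine would typically also partition both inputs by the join key so that each worker joins only its share. This single-process version shows only the per-partition logic.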
> >>
> >> On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde <jh...@apache.org> wrote:
> >>
> >> Does anyone know anything about Quicksql? It seems to be quite a popular
> >> project, and they have an internal fork of Calcite.
> >>
> >> https://github.com/Qihoo360/
> >>
> >> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
> >>
> >> Julian