[
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833649#comment-16833649
]
Lai Zhou edited comment on CALCITE-2741 at 5/8/19 2:43 AM:
-----------------------------------------------------------
[~zabetak], I also think it is not exactly an adapter. My initial goal was to
build a real-time, high-performance in-memory SQL engine on top of Calcite that
supports Hive SQL dialects.
I tried the JDBC interface first, but I ran into some issues:
# custom config issue: For every JDBC connection, we need to put the data of the
current session into the schema, which means the current schema is bound to the
current session.
So a static SchemaFactory can't work for this; we need to introduce DDL
functions like those in the calcite-server module. The SqlDdlNodes in the
calcite-server module populate the table through the FrameworkConfig API.
When we execute a SQL statement like
{code:java}
create table t1 as select * from t2 where t2.id>100{code}
the populate method is invoked; see
[SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221].
We need to customize the FrameworkConfig here, including the OperatorTable,
SqlConformance, and other custom configs. By the way, the FrameworkConfig
should be built with all the configs from the current CalcitePrepare.Context
rather than only the rootSchema; that was a bug.
Also, the config options of CalcitePrepare.Context are just a subset of
FrameworkConfig's; most of the time we need to use the FrameworkConfig API
directly to build a new SQL engine.
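The kind of customized FrameworkConfig described above can be sketched with
Calcite's public builder. This is a minimal illustration, assuming Calcite is
on the classpath; the standard operator table and default conformance stand in
for the Hive-specific ones a real engine would supply:
{code:java}
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.validate.SqlConformanceEnum;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;

public class SessionConfigSketch {
  public static FrameworkConfig build(SchemaPlus sessionSchema) {
    return Frameworks.newConfigBuilder()
        // per-session schema instead of a static SchemaFactory
        .defaultSchema(sessionSchema)
        // a Hive-aware operator table would be plugged in here
        .operatorTable(SqlStdOperatorTable.instance())
        // pick the conformance closest to the target dialect
        .parserConfig(SqlParser.configBuilder()
            .setConformance(SqlConformanceEnum.DEFAULT)
            .build())
        .build();
  }
}
{code}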
When we execute a SQL query like
{code:java}
select * from t2 where t2.id>100
{code}
CalcitePrepareImpl handles this SQL flow and does a similar thing, but some
configs are hard-coded, such as the RexExecutor and Programs.
When implementing an EnumerableRel, the RelImplementor might also need to be
customized; see the example
[HiveEnumerableRelImplementor.java|https://github.com/51nb/marble/blob/master/marble-table-hive/src/main/java/org/apache/calcite/adapter/hive/HiveEnumerableRelImplementor.java].
Currently the JDBC interface doesn't provide a way to customize these configs,
so we proposed a new Table API, inspired by Apache Flink, to simplify the usage
of Calcite when building a new SQL engine.
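The shape of such a Flink-style Table API can be illustrated with a toy facade.
Everything below is a hypothetical sketch, not marble's actual signatures; a
real TableEnv would parse, plan, and execute the SQL through Calcite instead of
echoing rows back:
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy facade showing the Table API shape: register per-session data
// directly, then run queries against it, with no static SchemaFactory.
public class ToyTableEnv {
  private final Map<String, List<Map<String, Object>>> tables = new HashMap<>();

  // register in-memory session data under a table name
  public void registerTable(String name, List<Map<String, Object>> rows) {
    tables.put(name, rows);
  }

  // a real sqlQuery would compile and execute SQL via Calcite;
  // this stub only resolves a bare table name to its rows
  public List<Map<String, Object>> sqlQuery(String tableName) {
    return tables.get(tableName);
  }
}
{code}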
2. cache issue: It's not easy to cache the whole SQL plan if we use the JDBC
interface to handle a query, due to its multi-phase processing flow, but it is
very easy to do with the Table API; see
[TableEnv.java#L412|https://github.com/51nb/marble/blob/master/marble-table/src/main/java/org/apache/calcite/table/TableEnv.java#L412].
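The whole-plan caching idea can be sketched independently of Calcite: key the
compiled artifact by the SQL text and reuse it on later executions. The names
here are illustrative, not marble's actual API, and a String stands in for a
compiled executable plan:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class PlanCache {
  // maps SQL text to its compiled plan (a String stand-in here)
  private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger compilations = new AtomicInteger();

  // compile stands in for the whole parse/validate/optimize/codegen pipeline;
  // it runs at most once per distinct SQL string
  public String getOrCompile(String sql, Function<String, String> compile) {
    return cache.computeIfAbsent(sql, s -> {
      compilations.incrementAndGet();
      return compile.apply(s);
    });
  }

  public int compilationCount() {
    return compilations.get();
  }
}
{code}
Repeated queries with identical SQL then skip the multi-phase compilation
entirely, which is the performance win the Table API targets.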
Summary:
The proposed Table API makes it easy to configure the SQL engine and cache the
whole SQL plan to improve query performance.
> Add operator table with Hive-specific built-in functions
> --------------------------------------------------------
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
>
> I wrote a Hive adapter for Calcite to support Hive SQL, including
> UDF, UDAF, UDTF, and some SqlSpecialOperators.
> What do you think of supporting a direct implementation of Hive SQL like this?
> I think it will be valuable when someone wants to migrate their Hive ETL jobs
> to a real-time scenario.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)