[
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833649#comment-16833649
]
Lai Zhou edited comment on CALCITE-2741 at 5/8/19 2:43 AM:
-----------------------------------------------------------
[~zabetak], I also think it is not exactly an adapter. My initial goal was to
build a real-time, high-performance in-memory SQL engine on top of Calcite that
supports Hive SQL dialects.
I tried the JDBC interface first, but I ran into some issues:
# custom config issue: For every JDBC connection, we need to put the data of the
current session into the schema, which means the current schema is bound to the
current session.
So a static SchemaFactory can't work for this; we need to introduce DDL
functions like those in the calcite-server module. The SqlDdlNodes in the
calcite-server module populate the table through the FrameworkConfig API.
When we execute a SQL statement like
{code:java}
create table t1 as select * from t2 where t2.id>100{code}
the populate method is invoked; see
[SqlDdlNodes.java#L221|https://github.com/apache/calcite/blob/0d504d20d47542e8d461982512ae0e7a94e4d6cb/server/src/main/java/org/apache/calcite/sql/ddl/SqlDdlNodes.java#L221].
We need to customize the FrameworkConfig here, including the OperatorTable,
SqlConformance, and other custom configs. By the way, the FrameworkConfig
should be built with all the configs from the current CalcitePrepare.Context
rather than only the rootSchema; that was a bug.
Also, the config options of CalcitePrepare.Context are just a subset of
FrameworkConfig's; most of the time we need to use the FrameworkConfig API
directly to build a new SQL engine.
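The kind of customized FrameworkConfig described above can be sketched with
Calcite's public builder. This is a minimal illustration, assuming Calcite is
on the classpath; the standard operator table and default conformance stand in
for the Hive-specific ones a real engine would supply:
{code:java}
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.validate.SqlConformanceEnum;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;

public class SessionConfigSketch {
  public static FrameworkConfig build(SchemaPlus sessionSchema) {
    return Frameworks.newConfigBuilder()
        // per-session schema instead of a static SchemaFactory
        .defaultSchema(sessionSchema)
        // a Hive-aware operator table would be plugged in here
        .operatorTable(SqlStdOperatorTable.instance())
        // pick the conformance closest to the target dialect
        .parserConfig(SqlParser.configBuilder()
            .setConformance(SqlConformanceEnum.DEFAULT)
            .build())
        .build();
  }
}
{code}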
When we execute a SQL query like
{code:java}
select * from t2 where t2.id>100
{code}
CalcitePrepareImpl handles this SQL flow and does a similar thing, but some
configs are hard-coded, such as the RexExecutor and Programs.
When implementing an EnumerableRel, the RelImplementor might also need to be
customized; see the example
[HiveEnumerableRelImplementor.java|https://github.com/51nb/marble/blob/master/marble-table-hive/src/main/java/org/apache/calcite/adapter/hive/HiveEnumerableRelImplementor.java].
Currently the JDBC interface doesn't provide a way to customize these configs,
so we proposed a new Table API, inspired by Apache Flink, to simplify the usage
of Calcite when building a new SQL engine.
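The shape of such a Flink-style Table API can be illustrated with a toy facade.
Everything below is a hypothetical sketch, not marble's actual signatures; a
real TableEnv would parse, plan, and execute the SQL through Calcite instead of
echoing rows back:
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy facade showing the Table API shape: register per-session data
// directly, then run queries against it, with no static SchemaFactory.
public class ToyTableEnv {
  private final Map<String, List<Map<String, Object>>> tables = new HashMap<>();

  // register in-memory session data under a table name
  public void registerTable(String name, List<Map<String, Object>> rows) {
    tables.put(name, rows);
  }

  // a real sqlQuery would compile and execute SQL via Calcite;
  // this stub only resolves a bare table name to its rows
  public List<Map<String, Object>> sqlQuery(String tableName) {
    return tables.get(tableName);
  }
}
{code}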
2. cache issue: It's not easy to cache the whole SQL plan if we use the JDBC
interface to handle a query, due to its multi-phase processing flow, but it is
very easy to do with the Table API; see
[TableEnv.java#L412|https://github.com/51nb/marble/blob/master/marble-table/src/main/java/org/apache/calcite/table/TableEnv.java#L412].
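The whole-plan caching idea can be sketched independently of Calcite: key the
compiled artifact by the SQL text and reuse it on later executions. The names
here are illustrative, not marble's actual API, and a String stands in for a
compiled executable plan:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class PlanCache {
  // maps SQL text to its compiled plan (a String stand-in here)
  private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger compilations = new AtomicInteger();

  // compile stands in for the whole parse/validate/optimize/codegen pipeline;
  // it runs at most once per distinct SQL string
  public String getOrCompile(String sql, Function<String, String> compile) {
    return cache.computeIfAbsent(sql, s -> {
      compilations.incrementAndGet();
      return compile.apply(s);
    });
  }

  public int compilationCount() {
    return compilations.get();
  }
}
{code}
Repeated queries with identical SQL then skip the multi-phase compilation
entirely, which is the performance win the Table API targets.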
Summary:
The proposed Table API makes it easy to configure the SQL engine and cache the
whole SQL plan to improve query performance.
> Add operator table with Hive-specific built-in functions
> --------------------------------------------------------
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.19.0
> Reporter: Lai Zhou
> Priority: Minor
>
> I wrote a Hive adapter for Calcite to support Hive SQL, including
> UDF, UDAF, UDTF, and some SqlSpecialOperators.
> What do you think of supporting a direct implementation of Hive SQL like this?
> I think it will be valuable when someone wants to migrate their Hive ETL jobs
> to a real-time scenario.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)