[
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722744#comment-16722744
]
Lai Zhou edited comment on CALCITE-2741 at 12/17/18 7:31 AM:
-------------------------------------------------------------
[~julianhyde], I have seen what you did for the Oracle functions, but we need
to do more to support Hive operators.
Here is what I did:
1. Add a 'fun=hive' config and register all operators in a
HiveSqlOperatorTable, so that a SqlCall can be converted dynamically to
a RexCall that holds a Hive operator instance. I think it is better not to
reuse the operator instances of SqlStdOperatorTable, because the result type
of a Hive operator is not fixed in advance: it depends on the data types of
its input parameters. We use a HiveOperatorWrapper to resolve the Hive
GenericUDF dynamically: when deriving the type of a SqlCall, it creates an
instance of the GenericUDF and calls its initialize method to get the correct
result type.
So we need to add a SQL type mapping from Java to Hive.
2. Define Implementors for Hive operators. We had to modify RexImplTable to
define an Implementor for each Hive operator, because we want to reuse the
enumerable implementation as far as possible. It would be reasonable to
provide an extension hook for user-defined Implementors; maybe I just have
not found the right way.
3. To improve the execution performance of an EnumerableRel, we inject some
final fields into the generated class 'Baz' that hold the GenericUDF
instances, to avoid creating new instances repeatedly.
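Roughly, the idea behind points 1 and 3 can be sketched in plain Java as
below. This is only an illustration under assumptions, not the actual patch:
it uses neither the Calcite nor the Hive API, type names are plain strings,
and MiniUdf, PlusUdf, deriveType and GeneratedRowFunction are hypothetical
names. MiniUdf merely imitates the initialize-then-evaluate life cycle of a
Hive GenericUDF.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

final class HiveUdfSketch {

    /** Hypothetical stand-in for a Hive GenericUDF: the result type is known only after initialize(). */
    interface MiniUdf {
        String initialize(List<String> argTypes); // returns the result type name
        Object evaluate(List<Object> args);
    }

    /** Toy "plus" whose result type follows its argument types. */
    static final class PlusUdf implements MiniUdf {
        private boolean useDouble;

        @Override public String initialize(List<String> argTypes) {
            useDouble = argTypes.contains("DOUBLE");
            return useDouble ? "DOUBLE" : "BIGINT";
        }

        @Override public Object evaluate(List<Object> args) {
            Number a = (Number) args.get(0);
            Number b = (Number) args.get(1);
            if (useDouble) {
                return a.doubleValue() + b.doubleValue();
            }
            return a.longValue() + b.longValue();
        }
    }

    /** Point 1: derive the call's type by creating a UDF instance and initializing it. */
    static String deriveType(Supplier<MiniUdf> factory, List<String> argTypes) {
        return factory.get().initialize(argTypes);
    }

    /** Point 3: stand-in for the generated class; the UDF lives in a final field. */
    static final class GeneratedRowFunction {
        private final MiniUdf udf; // created and initialized once, reused for every row

        GeneratedRowFunction(Supplier<MiniUdf> factory, List<String> argTypes) {
            this.udf = factory.get();
            this.udf.initialize(argTypes);
        }

        Object apply(Object a, Object b) {
            return udf.evaluate(Arrays.asList(a, b));
        }
    }

    public static void main(String[] args) {
        System.out.println(deriveType(PlusUdf::new, Arrays.asList("INT", "INT")));
        System.out.println(deriveType(PlusUdf::new, Arrays.asList("INT", "DOUBLE")));

        GeneratedRowFunction f =
            new GeneratedRowFunction(PlusUdf::new, Arrays.asList("INT", "DOUBLE"));
        System.out.println(f.apply(1, 2.5));
    }
}
```

The same operator yields a different result type for different argument
types, which is why a single shared operator instance with a fixed return
type does not fit, and the row function never re-creates the UDF per row.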
Besides, we made a small modification to the parser to support some special
operators, such as rlike and regexp (where c rlike '...').
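For reference, the behaviour such an operator has to provide can be sketched
as follows. This assumes Hive's documented semantics for RLIKE (the pattern
is a Java regular expression, any substring may match, and a NULL operand
gives NULL); RlikeSketch and rlike are hypothetical names and this is not
the parser change itself.

```java
import java.util.regex.Pattern;

final class RlikeSketch {

    /** Returns null if either operand is null, else whether any substring of value matches regex. */
    static Boolean rlike(String value, String regex) {
        if (value == null || regex == null) {
            return null;
        }
        // find() rather than matches(): the pattern need not cover the whole string
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(rlike("foobar", "^foo"));
        System.out.println(rlike("foobar", "^bar"));
    }
}
```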
All of the above modifications were made on Calcite 1.17.0.
Now, can you give me some suggestions on the right way to get all these
things done?
> Add operator table with Hive-specific built-in functions
> --------------------------------------------------------
>
> Key: CALCITE-2741
> URL: https://issues.apache.org/jira/browse/CALCITE-2741
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Reporter: Lai Zhou
> Assignee: Julian Hyde
> Priority: Minor
>
> [~julianhyde],
> I extended Calcite's native enumerable implementation to support Hive SQL,
> including UDFs, UDAFs and all the SqlSpecialOperators, inspired by Apache
> Drill.
> I modified the parser and the type system, and bridged the Hive operators.
> What do you think of supporting a direct implementation of Hive SQL like this?
> I think it would be valuable when someone wants to migrate their Hive ETL
> jobs to a real-time scenario.