[jira] [Comment Edited] (CALCITE-2741) Add operator table with Hive-specific built-in functions

Lai Zhou (JIRA) Sun, 16 Dec 2018 23:32:37 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722744#comment-16722744
 ]


Lai Zhou edited comment on CALCITE-2741 at 12/17/18 7:31 AM:
-------------------------------------------------------------

[~julianhyde], I have seen what you did for the Oracle functions, but we need
to do more to support Hive operators.

Here is what I did:

1. Add a 'fun=hive' config and register all operators in a
HiveSqlOperatorTable, so that a SqlCall can be converted dynamically to a
RexCall that holds a Hive operator instance. I think it is better not to
reuse the operator instances of SqlStdOperatorTable, because the result type
of a Hive operator is not fixed: it depends on the data types of its input
parameters. We use a HiveOperatorWrapper to resolve the Hive GenericUDF
dynamically: when deriveType is called for a SqlCall, it creates an instance
of the GenericUDF and calls its initialize method to get the correct result
type.

So we need to add a SQL type mapping from Java to Hive.

2. Define Implementors for Hive operators. We had to modify RexImplTable to
define the Implementors for Hive operators, because we want to reuse the
enumerable implementation as far as possible. It would be reasonable to
provide an extension hook for user-defined Implementors; maybe one exists and
I have just not found the right way.

3. To improve the execution performance of an EnumerableRel, we inject some
final fields into the generated class 'Baz'. These fields hold the GenericUDF
instances, so that new instances are not created repeatedly.
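
As a rough illustration of points 1 and 3 above, here is a minimal sketch in
plain Java. It does not use the real Calcite or Hive classes: SketchUdf,
SketchPlus, SketchOperatorWrapper and SketchGeneratedRow are hypothetical
stand-ins for GenericUDF, a concrete Hive function, HiveOperatorWrapper and the
generated class 'Baz', and the type names are plain strings chosen for the
example. It only shows the idea of deriving the result type by initializing a
fresh function instance, and of holding one initialized instance in a final
field.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a Hive GenericUDF (not the real Hive API). */
interface SketchUdf {
  /** Inspects the argument types and returns the result type. */
  String initialize(List<String> argTypes);

  /** Evaluates the function for one row. */
  Object evaluate(List<Object> args);
}

/** Toy "plus" whose result type depends on its argument types. */
final class SketchPlus implements SketchUdf {
  private boolean useDouble;

  @Override public String initialize(List<String> argTypes) {
    useDouble = argTypes.contains("DOUBLE");
    return useDouble ? "DOUBLE" : "BIGINT";
  }

  @Override public Object evaluate(List<Object> args) {
    Number a = (Number) args.get(0);
    Number b = (Number) args.get(1);
    if (useDouble) {
      return a.doubleValue() + b.doubleValue();
    }
    return a.longValue() + b.longValue();
  }
}

/** Plays the role of the operator wrapper: the type is derived per call. */
final class SketchOperatorWrapper {
  private final Supplier<SketchUdf> factory;

  SketchOperatorWrapper(Supplier<SketchUdf> factory) {
    this.factory = factory;
  }

  /** Creates a fresh instance and asks it for the result type. */
  String deriveType(List<String> argTypes) {
    return factory.get().initialize(argTypes);
  }

  /** Creates an instance that is already initialized for these types. */
  SketchUdf newInitializedUdf(List<String> argTypes) {
    SketchUdf udf = factory.get();
    udf.initialize(argTypes);
    return udf;
  }
}

/** Plays the role of the generated class: the instance lives in a final field. */
final class SketchGeneratedRow {
  private final SketchUdf plus;

  SketchGeneratedRow(SketchUdf plus) {
    this.plus = plus;
  }

  /** Called once per row; reuses the same function instance every time. */
  Object apply(Object a, Object b) {
    return plus.evaluate(Arrays.asList(a, b));
  }
}
```

With this shape, the same operator reports a different type for (INT, INT)
than for (INT, DOUBLE), and the per-row code path never constructs a new
function object.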

Besides, we made a small modification to the parser to support some special
operators, such as rlike and regexp (where c rlike '...').
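
The parser change itself is not shown here. For reference, the sketch below
only illustrates the runtime behaviour such an operator would need, assuming
the semantics described in the Hive language manual: the result is NULL if
either operand is NULL, and otherwise true if any substring of the value
matches the Java regular expression. The class and method names are
hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper sketching rlike/regexp semantics; not Calcite or Hive code. */
final class RlikeSketch {
  private RlikeSketch() {}

  /** Returns null for a null operand, else whether any substring matches. */
  static Boolean rlike(String value, String regex) {
    if (value == null || regex == null) {
      return null;
    }
    // find() looks for a match anywhere in the input, unlike matches().
    return Pattern.compile(regex).matcher(value).find();
  }

  public static void main(String[] args) {
    System.out.println(rlike("foobar", "oba"));   // prints true
    System.out.println(rlike("foobar", "^bar"));  // prints false
    System.out.println(rlike(null, "oba"));       // prints null
  }
}
```

A real implementation would probably cache the compiled Pattern when the
pattern is a literal, rather than compiling it for every row.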

All of the above modifications were made on Calcite 1.17.0.

Now, can you give me some suggestions? What is the right way to get all these
things done?

 



> Add operator table with Hive-specific built-in functions
> --------------------------------------------------------
>
>                 Key: CALCITE-2741
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2741
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Lai Zhou
>            Assignee: Julian Hyde
>            Priority: Minor
>
> [~julianhyde],
> I extended the native enumerable implementation of Calcite to support Hive
> SQL, including UDFs, UDAFs and all the SqlSpecialOperators, inspired by
> Apache Drill.
> I modified the parser and the type system, and bridged the Hive operators.
> What do you think of supporting a direct implementation of Hive SQL like this?
> I think it would be valuable when someone wants to migrate Hive ETL jobs
> to real-time scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
