[ 
https://issues.apache.org/jira/browse/PHOENIX-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375110#comment-14375110
 ] 

Rajeshbabu Chintaguntla edited comment on PHOENIX-538 at 3/22/15 7:00 PM:
--------------------------------------------------------------------------

[~jamestaylor]
I am working on this and it is almost complete. I have a few questions; could 
you please clarify?
1) bq. Since HBase supports loading jars dynamically, this would not be too 
difficult.
To load the jars dynamically we can use HBase's DynamicClassLoader, configuring 
hbase.dynamic.jars.dir with the path to the jars in HDFS.
However, this introduces a hadoop jar dependency on the Phoenix side, because 
the file-system-related classes in hadoop-common and hadoop-hdfs must be on the 
classpath. We should check whether there is any way to avoid this dependency. 
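For illustration, here is a minimal sketch of the idea using the JDK's 
URLClassLoader as a stand-in for HBase's DynamicClassLoader (the class and 
method names below are hypothetical, not Phoenix code):
{noformat}
import java.net.URL;
import java.net.URLClassLoader;

public class UdfLoaderSketch {
    // Resolve a UDF implementation class by name from a set of jar URLs.
    // HBase's DynamicClassLoader works similarly, but first copies jars from
    // hbase.dynamic.jars.dir (typically an HDFS path) into a local cache --
    // that copy step is what pulls in hadoop-common/hadoop-hdfs classes.
    public static Class<?> loadUdfClass(URL[] jarUrls, String className)
            throws ClassNotFoundException {
        URLClassLoader loader = new URLClassLoader(jarUrls,
                UdfLoaderSketch.class.getClassLoader());
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        // With no extra jars, resolution falls back to the parent loader.
        Class<?> c = loadUdfClass(new URL[0], "java.lang.String");
        System.out.println(c.getName());
    }
}
{noformat}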
2) You mentioned that the CREATE FUNCTION syntax should be as below:
{noformat}
CREATE FUNCTION mdHash(anytype)
RETURNS binary(16)
LOCATION 'hdfs://path-to-my-jar' 'com.me.MDHashFunction'
{noformat}

I think it's better to use the same syntax as Hive, for consistency. What do 
you say?
{noformat}
CREATE [TEMPORARY] FUNCTION function_name AS class_name
  [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
{noformat}
We could also support ADD/DELETE JAR statements, so that jars can be added to 
the hbase.dynamic.jars.dir path independently, before the function is created; 
the simplified form of CREATE FUNCTION would then suffice. This can be done as 
a later improvement. Suggestions?
{noformat}
ADD JAR[S] <filepath1> [<filepath2>]*
LIST JAR[S] [<filepath1> <filepath2> ..]
DELETE JAR[S] [<filepath1> <filepath2> ..] 
{noformat}
{noformat}
CREATE [TEMPORARY] FUNCTION function_name AS class_name
{noformat}

3) Regarding function arguments, I think it's fine to support Phoenix data 
types as the argument types, right?
Ex: CREATE FUNCTION MY_REVERSE(VARCHAR, CHAR) RETURNS VARCHAR
Or do we need to support Java data types? (That would add complexity, since 
Java types would have to be mapped to Phoenix types.) 
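To make the concern concrete, here is a hypothetical sketch of the mapping that 
Java-typed signatures would require (the type lists are illustrative, not 
Phoenix's actual type system):
{noformat}
import java.util.HashMap;
import java.util.Map;

public class TypeMappingSketch {
    // Forward mapping: Phoenix SQL type name -> Java representation.
    static final Map<String, Class<?>> PHOENIX_TO_JAVA = new HashMap<>();
    static {
        PHOENIX_TO_JAVA.put("VARCHAR", String.class);
        PHOENIX_TO_JAVA.put("CHAR", String.class); // same Java class as VARCHAR
        PHOENIX_TO_JAVA.put("INTEGER", Integer.class);
        PHOENIX_TO_JAVA.put("BIGINT", Long.class);
    }

    // The reverse direction is ambiguous: String could mean VARCHAR or CHAR,
    // so a Java-typed CREATE FUNCTION cannot recover the SQL type uniquely.
    static long phoenixTypesFor(Class<?> javaType) {
        return PHOENIX_TO_JAVA.values().stream()
                .filter(c -> c.equals(javaType)).count();
    }

    public static void main(String[] args) {
        System.out.println(phoenixTypesFor(String.class)); // 2 candidates
    }
}
{noformat}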

4) Do we need to allow details such as NOT NULL, constant, max length, 
precision, and scale to be specified for arguments? Currently we provide these 
through annotations in the built-in function classes, and they help validate 
whether a function call is proper. If we don't support this, the UDF developer 
has to take care of the allowed arguments and these details, the same as for 
the built-in functions.
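As a rough illustration of the annotation-based approach (the annotation and 
its fields below are hypothetical, loosely modeled on the ones used by the 
built-in function classes):
{noformat}
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class ArgumentAnnotationSketch {
    // Hypothetical per-argument metadata, similar in spirit to the
    // annotations on Phoenix built-in function classes.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Arg {
        String[] allowedTypes();
        boolean isConstant() default false;
        int maxLength() default -1;
    }

    // Example declaration site carrying validation metadata for one argument.
    @Arg(allowedTypes = {"VARCHAR", "CHAR"}, maxLength = 255)
    static String firstArg;

    public static void main(String[] args) throws Exception {
        // Read the metadata back reflectively, as a validator would.
        Arg meta = ArgumentAnnotationSketch.class
                .getDeclaredField("firstArg").getAnnotation(Arg.class);
        System.out.println(meta.allowedTypes().length); // 2
        System.out.println(meta.maxLength());           // 255
    }
}
{noformat}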



> Support UDFs
> ------------
>
>                 Key: PHOENIX-538
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-538
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: Rajeshbabu Chintaguntla
>
> Phoenix allows built-in functions to be added (as described 
> [here](http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html))
>  with the restriction that they must be in the phoenix jar. We should improve 
> on this and allow folks to declare new functions through a CREATE FUNCTION 
> command like this:
>       CREATE FUNCTION mdHash(anytype)
>       RETURNS binary(16)
>       LOCATION 'hdfs://path-to-my-jar' 'com.me.MDHashFunction'
> Since HBase supports loading jars dynamically, this would not be too 
> difficult. The function implementation class would be required to extend our 
> ScalarFunction base class. Here's how I could see it being implemented:
> * modify the phoenix grammar to support the new CREATE FUNCTION syntax
> * create a new UDFParseNode class to capture the parse state
> * add a new method to the MetaDataProtocol interface
> * add a new method in ConnectionQueryServices to invoke the MetaDataProtocol 
> method
> * add a new method in MetaDataClient to invoke the ConnectionQueryServices 
> method
> * persist functions in a new "SYSTEM.FUNCTION" table
> * add a new client-side representation to cache functions called PFunction
> * modify ColumnResolver to dynamically resolve a function in the same way we 
> dynamically resolve and load a table
> * create and register a new ExpressionType called UDFExpression
> * at parse time, check for the function name in the built in list first (as 
> is currently done), and if not found in the PFunction cache. If not found 
> there, then use the new UDFExpression as a placeholder and have the 
> ColumnResolver attempt to resolve it at compile time and throw an error if 
> unsuccessful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
