[ 
https://issues.apache.org/jira/browse/PHOENIX-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494793#comment-14494793
 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-538:
-------------------------------------------------

Thank you so much [~jamestaylor] for the review.
bq. One thing that's missing (at least I couldn't find it) was how you're going 
to handle serialization/deserialization for UDFs. Phoenix today relies on 
ExpressionType to do this. Every expression (including built-in functions) is 
listed in this enum. We transfer only the ordinal over to the server and 
then the server uses ExpressionType to construct an empty Expression and 
expression.readFields() to populate it. For UDFs, it seems like you'd need to 
pass over the class name (and jar?) so that the server would know what class to 
instantiate. I think you'd also need the tenant ID to be available from the 
BooleanExpressionFilter which is the root of what would instantiate these on 
the server side, so that you could get the right class loader.
bq. Another thing I couldn't find (but I suspect maybe you haven't done it 
yet?) is the dynamic loading of the jars. For this, given your scoping of 
functions to tenant ID, I think you'd want a different class loader per tenant 
ID, plus one for the global tenant (i.e. when no tenant ID is specified).

I have handled these in the current patch. I added a UDFExpression in which we 
write/read the tenant ID, class name, and jar path along with the expression. 
This information is used to load the classes via the DynamicClassLoader 
supported by HBase. The function jars should be placed in the HDFS directory 
given by the hbase.dynamic.jars.dir configuration value, although a function 
created with a jar in another HDFS location will still be loaded. As you 
suggested, there is a class loader for each tenant ID, and they are cached.
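
As a rough illustration, the serialization and per-tenant class loading could 
look like the sketch below. The field and method names are assumptions for 
this sketch, not necessarily the patch's code; DynamicClassLoader is HBase's 
loader that fetches jars from hbase.dynamic.jars.dir on demand.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.util.DynamicClassLoader;
    import org.apache.hadoop.io.WritableUtils;

    // Sketch only: field names and the cache shape are assumptions.
    public class UDFExpression /* extends ScalarFunction */ {
        // One class loader per tenant ID, plus one for the global tenant
        // (keyed by the empty string), cached for reuse.
        private static final ConcurrentHashMap<String, DynamicClassLoader> LOADERS =
                new ConcurrentHashMap<String, DynamicClassLoader>();

        private String tenantId;  // empty string for the global tenant
        private String className;
        private String jarPath;

        public void write(DataOutput output) throws IOException {
            WritableUtils.writeString(output, tenantId);
            WritableUtils.writeString(output, className);
            WritableUtils.writeString(output, jarPath);
            // ... the wrapped expression's own state follows
        }

        public void readFields(DataInput input) throws IOException {
            tenantId = WritableUtils.readString(input);
            className = WritableUtils.readString(input);
            jarPath = WritableUtils.readString(input);
        }

        // On the server, resolve the UDF class through the tenant's loader.
        private Class<?> loadUdfClass(Configuration conf) throws ClassNotFoundException {
            DynamicClassLoader loader = LOADERS.get(tenantId);
            if (loader == null) {
                // DynamicClassLoader falls back to the jars in the HDFS
                // directory named by hbase.dynamic.jars.dir when a class
                // is not found locally.
                DynamicClassLoader created =
                        new DynamicClassLoader(conf, UDFExpression.class.getClassLoader());
                loader = LOADERS.putIfAbsent(tenantId, created);
                if (loader == null) {
                    loader = created;
                }
            }
            return Class.forName(className, true, loader);
        }
    }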

bq. Instead, I'd collect all UDFParseNodes at parse time by keeping a list 
of these per statement. 
I have not handled this yet, James. Do you mean that after parsing the 
statement we should check for all UDFParseNodes in the statement and keep them 
in a list? Do we need some ParseNode visitor to find the UDFParseNodes? The 
parse nodes can be anywhere (WHERE parse nodes, SELECT parse nodes, HAVING 
parse nodes) depending on the query. Do you have any idea how we can collect 
them all together in a statement?
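
For illustration, James's suggestion might look something like the toy sketch 
below: the node factory records every UDF node as it creates one, so a 
per-statement list exists by the end of parsing with no visitor pass. All 
class names here are illustrative stand-ins, not Phoenix's real API.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class ParseNode {
    }

    class UDFParseNode extends ParseNode {
        final String name;

        UDFParseNode(String name) {
            this.name = name;
        }
    }

    class NodeFactory {
        private static final Set<String> BUILT_INS =
                new HashSet<>(Arrays.asList("COUNT", "SUBSTR", "ROUND"));

        // Collected as nodes are created, so it does not matter which
        // clause (WHERE, SELECT, HAVING, ...) the function appears in.
        private final List<UDFParseNode> udfNodes = new ArrayList<>();

        ParseNode function(String name) {
            if (!BUILT_INS.contains(name.toUpperCase())) {
                UDFParseNode udf = new UDFParseNode(name);
                udfNodes.add(udf);
                return udf;
            }
            return new ParseNode();
        }

        // The compiler can then resolve every collected UDF in one round trip.
        List<UDFParseNode> getUdfNodes() {
            return udfNodes;
        }
    }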

bq. Then, modify MetaDataEndPoint.getTable() to accept a list of these and 
resolve them all (use a SkipScanFilter) when the table is resolved. 
Do you mean the getFunction method? All functions are stored in the 
SYSTEM.FUNCTION table, but getTable is executed on the catalog table only.

bq. Add the same split policy we have on SYSTEM.CATALOG on SYSTEM.FUNCTION to 
ensure that all tables for a given tenant will be in the same region.
I have added this.
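
For context, the idea behind that split policy is sketched below on top of 
HBase's real RegionSplitPolicy hook. The class name and the tenant-prefix 
handling are assumptions for this sketch, not Phoenix's actual 
MetaDataSplitPolicy code.

    import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

    // Truncate the natural split point to the tenant-ID prefix so one
    // tenant's rows are never divided across two regions.
    public class FunctionSplitPolicy extends ConstantSizeRegionSplitPolicy {
        private static final byte SEPARATOR = 0; // assumed: tenantId + \x00 prefix

        @Override
        protected byte[] getSplitPoint() {
            byte[] splitPoint = super.getSplitPoint();
            if (splitPoint == null) {
                return null;
            }
            // Cut the proposed split point back to the end of the tenant-ID
            // prefix, so the split lands on a tenant boundary.
            for (int i = 0; i < splitPoint.length; i++) {
                if (splitPoint[i] == SEPARATOR) {
                    byte[] prefix = new byte[i + 1];
                    System.arraycopy(splitPoint, 0, prefix, 0, i + 1);
                    return prefix;
                }
            }
            return splitPoint;
        }
    }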

bq. Some minor stuff: rename this to resolveFunction():
Done.

bq. Keep all ParseNodes immutable, so no UDFParseNode.setInfo(), but use a copy 
constructor instead.
Done.

bq. An alternative which I've seen before is CREATE [OR REPLACE] instead of the 
IF NOT EXISTS. What do other RDBMS support?
You are correct, James. All other RDBMSs support CREATE [OR REPLACE]. I have 
changed it accordingly. I will implement the replace part while handling alter 
functions (maybe replace itself should be enough).

bq. Other than that, my only concern is the amount of copy/paste code between 
the client/server caching we do for a table and now for a function.
In the latest patch the PTableCache is now shared to hold functions as well, 
and the class has been renamed to PMetaDataCache.
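
As a rough illustration of that direction (the interface and method names 
here are assumptions, not the patch's actual code), a single timestamp-aware 
cache structure can serve both kinds of metadata:

    import java.util.concurrent.ConcurrentHashMap;

    // Toy sketch: one cache shared by tables and functions, instead of two
    // copy/pasted implementations. PTable and PFunction would both
    // implement the common interface. Names are illustrative.
    interface PMetaDataEntity {
        String getName();
        long getTimeStamp();
    }

    class PMetaDataCache {
        private final ConcurrentHashMap<String, PMetaDataEntity> entries =
                new ConcurrentHashMap<String, PMetaDataEntity>();

        void add(PMetaDataEntity entity) {
            entries.put(entity.getName(), entity);
        }

        PMetaDataEntity resolve(String name, long clientTimeStamp) {
            PMetaDataEntity e = entries.get(name);
            // Serve from cache only if the entry is visible at the
            // client's timestamp.
            return (e != null && e.getTimeStamp() <= clientTimeStamp) ? e : null;
        }
    }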
Please review the latest patch. Thanks.


> Support UDFs
> ------------
>
>                 Key: PHOENIX-538
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-538
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: Rajeshbabu Chintaguntla
>             Fix For: 5.0.0, 4.4.0
>
>         Attachments: PHOENIX-538-wip.patch, PHOENIX-538_v1.patch
>
>
> Phoenix allows built-in functions to be added (as described 
> [here](http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html))
>  with the restriction that they must be in the phoenix jar. We should improve 
> on this and allow folks to declare new functions through a CREATE FUNCTION 
> command like this:
>       CREATE FUNCTION mdHash(anytype)
>       RETURNS binary(16)
>       LOCATION 'hdfs://path-to-my-jar' 'com.me.MDHashFunction'
> Since HBase supports loading jars dynamically, this would not be too 
> difficult. The function implementation class would be required to extend our 
> ScalarFunction base class. Here's how I could see it being implemented:
> * modify the phoenix grammar to support the new CREATE FUNCTION syntax
> * create a new UDFParseNode class to capture the parse state
> * add a new method to the MetaDataProtocol interface
> * add a new method in ConnectionQueryServices to invoke the MetaDataProtocol 
> method
> * add a new method in MetaDataClient to invoke the ConnectionQueryServices 
> method
> * persist functions in a new "SYSTEM.FUNCTION" table
> * add a new client-side representation to cache functions called PFunction
> * modify ColumnResolver to dynamically resolve a function in the same way we 
> dynamically resolve and load a table
> * create and register a new ExpressionType called UDFExpression
> * at parse time, check for the function name in the built-in list first (as 
> is currently done), and if not found, check the PFunction cache. If not found 
> there either, use the new UDFExpression as a placeholder and have the 
> ColumnResolver attempt to resolve it at compile time and throw an error if 
> unsuccessful.
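
For illustration, the description's mdHash example might be fleshed out 
roughly as follows. This is a sketch against Phoenix's ScalarFunction 
contract; the MD5 body and the constructor details are illustrative 
assumptions, not a confirmed implementation.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.List;

    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.phoenix.expression.Expression;
    import org.apache.phoenix.expression.function.ScalarFunction;
    import org.apache.phoenix.schema.tuple.Tuple;
    import org.apache.phoenix.schema.types.PBinary;
    import org.apache.phoenix.schema.types.PDataType;

    public class MDHashFunction extends ScalarFunction {
        public static final String NAME = "MDHASH";

        public MDHashFunction() {
        }

        public MDHashFunction(List<Expression> children) {
            super(children);
        }

        @Override
        public String getName() {
            return NAME;
        }

        @Override
        public PDataType getDataType() {
            return PBinary.INSTANCE;
        }

        @Override
        public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
            // Evaluate the single argument; return false if not yet available.
            if (!getChildren().get(0).evaluate(tuple, ptr)) {
                return false;
            }
            try {
                MessageDigest md = MessageDigest.getInstance("MD5");
                md.update(ptr.get(), ptr.getOffset(), ptr.getLength());
                ptr.set(md.digest());  // 16-byte digest, matching binary(16)
                return true;
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
        }
    }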



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
