[ 
https://issues.apache.org/jira/browse/PHOENIX-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajeshbabu Chintaguntla updated PHOENIX-538:
--------------------------------------------
    Attachment: PHOENIX-538_v3.patch

[~jamestaylor]
Here is the full patch addressing all the inputs/comments you provided 
earlier that were missed in the previous patches.
More specifically:
bq.  I'd make sure you have a way of disallowing it, though, too just in case 
folks don't want to allow this functionality.
Handled. This is disabled by default. To use UDFs, users need to set the 
phoenix.functions.allowUserDefinedFunctions property to true.
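
For illustration, here is a minimal sketch of how a client might switch the feature on. Only the property name comes from the patch; the UdfClientConfig class and its method are hypothetical names used for this example.

```java
import java.util.Properties;

// Hypothetical helper for illustration; only the property name is taken from the patch.
final class UdfClientConfig {
    static final String ALLOW_UDFS = "phoenix.functions.allowUserDefinedFunctions";

    // Builds client properties with UDF support switched on (it is off by default).
    static Properties withUdfsEnabled() {
        Properties props = new Properties();
        props.setProperty(ALLOW_UDFS, "true");
        return props;
    }
}
```

The resulting Properties object could then be handed to DriverManager.getConnection(url, props) together with a Phoenix JDBC URL, or the same property could be set in the client-side configuration.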

bq. One thing I'd change is when you resolve functions. For tables, we resolve 
them once when the ColumnResolver is instantiated. For functions, though, you 
resolve them in the visitor which would mean a separate RPC would happen for 
each occurrence of each function during each visitor (which is a lot). Instead, 
I'd collect all UDFParseNodes at parse time by keeping a list of these per 
statement. Then, modify MetaDataEndPoint.getTable() to accept a list of these 
and resolve them all (use a SkipScanFilter) when the table is resolved. Add the 
same split policy we have on SYSTEM.CATALOG on SYSTEM.FUNCTION to ensure that 
all tables for a given tenant will be in the same region.
I have handled this in this patch. The resolver now fetches the list of 
functions all together instead of issuing one request for each function.
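
The idea can be sketched roughly as follows. This is not code from the patch: FakeFunctionCatalog and BatchedFunctionResolver are hypothetical stand-ins, and the request counter only simulates the RPC that a real lookup would make.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the server-side metadata lookup; not a Phoenix API.
final class FakeFunctionCatalog {
    int requests = 0; // counts simulated round trips
    private final Map<String, String> classByName = new LinkedHashMap<>();

    void register(String name, String className) {
        classByName.put(name, className);
    }

    // One simulated round trip returns every requested function that exists.
    Map<String, String> lookupAll(Set<String> names) {
        requests++;
        Map<String, String> found = new LinkedHashMap<>();
        for (String name : names) {
            String className = classByName.get(name);
            if (className != null) {
                found.put(name, className);
            }
        }
        return found;
    }
}

final class BatchedFunctionResolver {
    // Deduplicates the names seen in a statement, then resolves them in one call.
    static Map<String, String> resolve(List<String> occurrences, FakeFunctionCatalog catalog) {
        Set<String> unique = new LinkedHashSet<>(occurrences);
        Map<String, String> resolved = catalog.lookupAll(unique);
        for (String name : unique) {
            if (!resolved.containsKey(name)) {
                throw new IllegalArgumentException("Function undefined: " + name);
            }
        }
        return resolved;
    }
}
```

With this shape, a statement that mentions the same function several times still costs a single lookup, which is the point of the review comment above.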

bq. In PhoenixSQL.g, you can just add a new member variable for a 
List<UDFParseNode>. You'd add to this list where factory.function() throws because 
it can't find the function in a catch block. You'd clear the list in the 
finally block of the oneStatement rule, as this would be the scope of where 
you'd want to collect this list. And you'd pass in this list to the appropriate 
factory method (like factory.select(), factory.delete(), factory.upsert(), 
etc.). 
This is handled in this patch. The UDFs are collected during query parsing 
and passed to the factory methods. I have done this for the 
upsert, delete, select, and createindex methods because these are the queries 
that accept functions.
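
A simplified sketch of that bookkeeping is below. It is not the generated PhoenixSQL parser: UdfCollectingParser, its built-in list, and the plain function-name strings are assumptions made to show the catch/finally pattern described in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical miniature of the parser-side bookkeeping; names are illustrative only.
final class UdfCollectingParser {
    private static final Set<String> BUILT_INS =
            new HashSet<>(Arrays.asList("UPPER", "LOWER", "LENGTH"));

    // Member variable holding the UDFs referenced by the statement being parsed.
    private final List<String> udfNames = new ArrayList<>();

    // Mimics a factory lookup that throws when the name is not a built-in.
    private static void requireBuiltIn(String name) {
        if (!BUILT_INS.contains(name)) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
    }

    // Called for each function reference; unknown names are remembered as UDFs.
    private void function(String name) {
        try {
            requireBuiltIn(name);
        } catch (IllegalArgumentException e) {
            udfNames.add(name);
        }
    }

    // Parses one statement's function calls and returns the UDFs it referenced.
    List<String> oneStatement(List<String> functionCalls) {
        try {
            for (String name : functionCalls) {
                function(name);
            }
            // In the real grammar this copy would be passed to the statement factory.
            return new ArrayList<>(udfNames);
        } finally {
            udfNames.clear(); // the list is scoped to a single statement
        }
    }
}
```

Clearing the list in the finally block keeps UDFs from one statement from leaking into the next, even when parsing fails partway through.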

bq. What about tests where different tenants have the same named UDF with 
different implementations (invoked on the client-side)? Are you loading them 
into separate class loaders?
bq. Will you have tests where a global UDF is used together with a 
tenant-specific UDF, as the class loader stuff may get tricky?
Handled these. Added tests verifying them, and they are working fine.
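
To illustrate the class loader concern, here is a rough sketch of keeping one loader per tenant and jar, so that two tenants can use the same class name with different implementations. It is an assumption-laden example and not necessarily how the patch does it: TenantClassLoaders is a hypothetical name, and plain URLClassLoader with file: URLs is used only to keep the example self-contained.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache of class loaders, keyed by tenant and jar location.
final class TenantClassLoaders {
    private final Map<String, URLClassLoader> loaders = new ConcurrentHashMap<>();

    // Returns the loader for this tenant and jar, creating it on first use.
    // A null tenantId stands for a global (non-tenant) function.
    ClassLoader loaderFor(String tenantId, String jarUrl) {
        String owner = (tenantId == null) ? "<global>" : "tenant:" + tenantId;
        String key = owner + "|" + jarUrl;
        return loaders.computeIfAbsent(key, k -> {
            try {
                URL url = URI.create(jarUrl).toURL();
                return new URLClassLoader(new URL[] { url },
                        TenantClassLoaders.class.getClassLoader());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar URL: " + jarUrl, e);
            }
        });
    }
}
```

Because each tenant gets its own loader, a class loaded for one tenant is a different Class object from the same-named class loaded for another tenant or for a global function.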

Added more tests in this patch.

Please review the latest patch. Thank you very much for the reviews.

> Support UDFs
> ------------
>
>                 Key: PHOENIX-538
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-538
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: James Taylor
>            Assignee: Rajeshbabu Chintaguntla
>             Fix For: 5.0.0, 4.4.0
>
>         Attachments: PHOENIX-538-wip.patch, PHOENIX-538_v1.patch, 
> PHOENIX-538_v2.patch, PHOENIX-538_v3.patch
>
>
> Phoenix allows built-in functions to be added (as described 
> [here](http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html))
>  with the restriction that they must be in the phoenix jar. We should improve 
> on this and allow folks to declare new functions through a CREATE FUNCTION 
> command like this:
>       CREATE FUNCTION mdHash(anytype)
>       RETURNS binary(16)
>       LOCATION 'hdfs://path-to-my-jar' 'com.me.MDHashFunction'
> Since HBase supports loading jars dynamically, this would not be too 
> difficult. The function implementation class would be required to extend our 
> ScalarFunction base class. Here's how I could see it being implemented:
> * modify the phoenix grammar to support the new CREATE FUNCTION syntax
> * create a new UDFParseNode class to capture the parse state
> * add a new method to the MetaDataProtocol interface
> * add a new method in ConnectionQueryServices to invoke the MetaDataProtocol 
> method
> * add a new method in MetaDataClient to invoke the ConnectionQueryServices 
> method
> * persist functions in a new "SYSTEM.FUNCTION" table
> * add a new client-side representation to cache functions called PFunction
> * modify ColumnResolver to dynamically resolve a function in the same way we 
> dynamically resolve and load a table
> * create and register a new ExpressionType called UDFExpression
> * at parse time, check for the function name in the built-in list first (as 
> is currently done), and if not found, check the PFunction cache. If not found 
> there, then use the new UDFExpression as a placeholder and have the 
> ColumnResolver attempt to resolve it at compile time and throw an error if 
> unsuccessful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
