[ https://issues.apache.org/jira/browse/PHOENIX-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494793#comment-14494793 ]
Rajeshbabu Chintaguntla commented on PHOENIX-538:
-------------------------------------------------

Thank you so much [~jamestaylor] for the review.

bq. One thing that's missing (at least I couldn't find it) was how you're going to handle serialization/deserialization for UDFs. Phoenix today relies on ExpressionType to do this. Every expression (including built-in functions) is listed in this ordinal. We transfer only the ordinal over to the server, and then the server uses ExpressionType to construct an empty Expression and expression.readFields() to populate it. For UDFs, it seems like you'd need to pass over the class name (and jar?) so that the server would know what class to instantiate. I think you'd also need the tenant ID to be available from the BooleanExpressionFilter, which is the root of what would instantiate these on the server side, so that you could get the right class loader.

bq. Another thing I couldn't find (but I suspect maybe you haven't done it yet?) is the dynamic loading of the jars. For this, given your scoping of functions to tenant ID, I think you'd want a different class loader per tenant ID, plus one for the global tenant (i.e. when no tenant ID is specified).

I have handled both of these in the current patch. I added a UDFExpression that writes/reads the tenant ID, class name, and jar path along with the expression, and uses that information to load the classes through the DynamicClassLoader supported by HBase. The function jars should be placed in the HDFS directory given by the hbase.dynamic.jars.dir configuration value, but even if we create a function with a jar in some other HDFS location, it will still be loaded. As you suggested, there is a class loader per tenant ID, and the loaders are cached. (See the two sketches below for the serialization format and the per-tenant class loader cache.)

bq. Instead, I'd collect all UDFParseNodes at parse time by keeping a list of these per statement.

This I have not handled, James. You mean that after parsing the statement we need to check for all UDFParseNodes in the statement and keep them in a list? Do we need some ParseNode visitor to find the UDFParseNodes? The parse nodes can be anywhere (in the WHERE, SELECT, or HAVING clauses) depending on the query. Do you have any idea how we can collect them all together for a statement?

bq. Then, modify MetaDataEndPoint.getTable() to accept a list of these and resolve them all (use a SkipScanFilter) when the table is resolved.

You mean the getFunction method? All functions are stored in the SYSTEM.FUNCTION table, but getTable() will be executed on the catalog table only.

bq. Add the same split policy we have on SYSTEM.CATALOG to SYSTEM.FUNCTION to ensure that all functions for a given tenant will be in the same region.

I have added this.

bq. Some minor stuff: rename this to resolveFunction():

Done.

bq. Keep all ParseNodes immutable, so no UDFParseNode.setInfo(), but use a copy constructor instead.

Done.

bq. An alternative which I've seen before is CREATE [OR REPLACE] instead of the IF NOT EXISTS. What do other RDBMS support?

You are correct, James. The other RDBMSs support CREATE [OR REPLACE], and I have changed the syntax accordingly. I will implement the REPLACE part while handling ALTER FUNCTION (maybe REPLACE itself should be enough).

bq. Other than that, my only concern is the amount of copy/paste code between the client/server caching we do for a table and now for a function.

In the current patch the PTableCache is shared to hold functions as well, and the class has been renamed to PMetaDataCache.

Please review the latest patch. Thanks.
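For reference, here is a rough sketch of the serialization side, assuming Phoenix's usual DataInput/DataOutput Expression wire format. The class and field names below (UDFExpressionSketch, tenantId, className, jarPath) and the write order are illustrative assumptions, not copied from the patch:

{code:java}
// Sketch only: names and write order are assumptions, not the patch code.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableUtils;

public class UDFExpressionSketch {
    private String tenantId;   // empty string for the global tenant
    private String className;  // UDF class the server must instantiate
    private String jarPath;    // HDFS path of the jar containing the class

    public void write(DataOutput output) throws IOException {
        WritableUtils.writeString(output, tenantId);
        WritableUtils.writeString(output, className);
        WritableUtils.writeString(output, jarPath);
        // ...then write the child expressions (arguments) the usual way.
    }

    public void readFields(DataInput input) throws IOException {
        tenantId = WritableUtils.readString(input);
        className = WritableUtils.readString(input);
        jarPath = WritableUtils.readString(input);
        // ...then read the children and instantiate the UDF class via the
        // tenant-scoped class loader (see the next sketch).
    }
}
{code}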
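And a minimal sketch of the per-tenant class loader cache, assuming HBase's org.apache.hadoop.hbase.util.DynamicClassLoader (which resolves jars from hbase.dynamic.jars.dir when a class is not on the regular classpath). The cache class and key convention are hypothetical:

{code:java}
// Sketch only: UDFClassLoaderCache and loadUDFClass are hypothetical names.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.util.DynamicClassLoader;

public class UDFClassLoaderCache {
    // One DynamicClassLoader per tenant; the empty key is the global tenant.
    private static final ConcurrentMap<String, DynamicClassLoader> LOADERS =
            new ConcurrentHashMap<String, DynamicClassLoader>();

    public static Class<?> loadUDFClass(Configuration conf, String tenantId,
            String className) throws ClassNotFoundException {
        String key = (tenantId == null) ? "" : tenantId;
        DynamicClassLoader loader = LOADERS.get(key);
        if (loader == null) {
            // DynamicClassLoader pulls jars from hbase.dynamic.jars.dir when
            // the class is not found on the regular classpath.
            DynamicClassLoader newLoader = new DynamicClassLoader(
                    conf, UDFClassLoaderCache.class.getClassLoader());
            loader = LOADERS.putIfAbsent(key, newLoader);
            if (loader == null) {
                loader = newLoader;
            }
        }
        return Class.forName(className, true, loader);
    }
}
{code}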
> Support UDFs
> ------------
>
>          Key: PHOENIX-538
>          URL: https://issues.apache.org/jira/browse/PHOENIX-538
>      Project: Phoenix
>   Issue Type: Task
>     Reporter: James Taylor
>     Assignee: Rajeshbabu Chintaguntla
>      Fix For: 5.0.0, 4.4.0
>
>  Attachments: PHOENIX-538-wip.patch, PHOENIX-538_v1.patch
>
>
> Phoenix allows built-in functions to be added (as described [here](http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html)) with the restriction that they must be in the phoenix jar. We should improve on this and allow folks to declare new functions through a CREATE FUNCTION command like this:
>
> CREATE FUNCTION mdHash(anytype)
> RETURNS binary(16)
> LOCATION 'hdfs://path-to-my-jar' 'com.me.MDHashFunction'
>
> Since HBase supports loading jars dynamically, this would not be too difficult. The function implementation class would be required to extend our ScalarFunction base class (see the sketch after this list). Here's how I could see it being implemented:
> * modify the phoenix grammar to support the new CREATE FUNCTION syntax
> * create a new UDFParseNode class to capture the parse state
> * add a new method to the MetaDataProtocol interface
> * add a new method in ConnectionQueryServices to invoke the MetaDataProtocol method
> * add a new method in MetaDataClient to invoke the ConnectionQueryServices method
> * persist functions in a new "SYSTEM.FUNCTION" table
> * add a new client-side representation to cache functions called PFunction
> * modify ColumnResolver to dynamically resolve a function in the same way we dynamically resolve and load a table
> * create and register a new ExpressionType called UDFExpression
> * at parse time, check for the function name in the built-in list first (as is currently done); if not found there, check the PFunction cache, and if still not found, use the new UDFExpression as a placeholder and have the ColumnResolver attempt to resolve it at compile time and throw an error if unsuccessful
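As a concrete illustration of the CREATE FUNCTION example in the description, here is a hedged sketch of what the user-supplied com.me.MDHashFunction class might look like. The exact ScalarFunction contract (constructors, argument metadata, annotations) depends on the Phoenix version, so the overrides below follow the general built-in function pattern rather than the final UDF API; the use of MD5 is an assumption inferred from "mdHash" returning binary(16):

{code:java}
// Sketch only: follows the built-in ScalarFunction pattern, not the final API.
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.phoenix.expression.Expression;
import org.apache.phoenix.expression.function.ScalarFunction;
import org.apache.phoenix.schema.tuple.Tuple;
import org.apache.phoenix.schema.types.PBinary;
import org.apache.phoenix.schema.types.PDataType;

public class MDHashFunction extends ScalarFunction {
    public static final String NAME = "MDHASH";

    public MDHashFunction() {
    }

    public MDHashFunction(List<Expression> children) {
        super(children);
    }

    @Override
    public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
        // Evaluate the single argument; bail out if it is not yet available.
        if (!getChildren().get(0).evaluate(tuple, ptr)) {
            return false;
        }
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(ptr.copyBytes());
            ptr.set(digest); // 16 bytes, matching RETURNS binary(16)
            return true;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public PDataType getDataType() {
        return PBinary.INSTANCE;
    }

    @Override
    public String getName() {
        return NAME;
    }
}
{code}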