[ https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559423#comment-14559423 ]

Santiago M. Mola commented on SPARK-4867:
-----------------------------------------

Maybe this issue could be split into smaller tasks? A lot of built-in functions can
be removed from the parser quite easily by registering them in the
FunctionRegistry. I am doing this for a number of fixed-arity functions.

I'm using some helper functions that build ExpressionBuilders for Expression
subclasses, for use with the FunctionRegistry. The main helper looks like this:

{code}
  def expression[T <: Expression](arity: Int)(implicit tag: ClassTag[T]): ExpressionBuilder = {
    // Look up the T(Expression, ..., Expression) constructor via reflection.
    val argTypes = Seq.fill(arity)(classOf[Expression])
    val constructor = tag.runtimeClass.getDeclaredConstructor(argTypes: _*)
    (expressions: Seq[Expression]) => {
      if (expressions.size != arity) {
        throw new IllegalArgumentException(
          s"Invalid number of arguments: ${expressions.size} (must be equal to $arity)")
      }
      constructor.newInstance(expressions: _*).asInstanceOf[Expression]
    }
  }
{code}

and can be used like this:

{code}
// Assuming MyFunction's constructor takes two Expression arguments.
functionRegistry.registerFunction("MY_FUNCTION", expression[MyFunction](2))
{code}

If this approach looks like what is needed, I can extend it to support expressions
with a variable number of parameters. Also, with some syntactic sugar we can
provide a function that works this way:

{code}
functionRegistry.registerFunction[MyFunction]
// Registers the builder produced by expression[MyFunction] under the name
// "MY_FUNCTION", using a camelCase -> underscore-separated conversion.
{code}
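The camelCase-to-underscore conversion could be implemented along these lines; this is only a sketch, and the object and method names (`FunctionNaming`, `sqlName`) are assumptions, not existing Spark API:

```scala
// Hypothetical helper: derives a SQL function name from a class's simple name,
// e.g. "MyFunction" -> "MY_FUNCTION".
object FunctionNaming {
  def sqlName(className: String): String =
    // Insert an underscore at each lower-to-upper case boundary, then uppercase.
    className.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toUpperCase
}
```

The sugared registerFunction[MyFunction] could then call something like sqlName(tag.runtimeClass.getSimpleName) to derive the registration name.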

How does this sound?


> UDF clean up
> ------------
>
>                 Key: SPARK-4867
>                 URL: https://issues.apache.org/jira/browse/SPARK-4867
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Priority: Blocker
>
> Right now our support and internal implementation of many functions have a few 
> issues.  Specifically:
>  - UDFs don't know their input types and thus don't do type coercion.
>  - We hard-code a bunch of built-in functions into the parser.  This is bad 
> because in SQL it creates new reserved words for things that aren't actually 
> keywords.  It also means that for each function we need to add support to 
> both SQLContext and HiveContext separately.
> For this JIRA I propose we do the following:
>  - Change the interfaces for registerFunction and ScalaUdf to include types 
> for the input arguments as well as the output type.
>  - Add a rule to analysis that does type coercion for UDFs.
>  - Add a parse rule for functions to SQLParser.
>  - Rewrite all the UDFs that are currently hacked into the various parsers 
> using this new functionality.
> Depending on how big this refactoring becomes we could split parts 1&2 from 
> part 3 above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
