[
https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559423#comment-14559423
]
Santiago M. Mola commented on SPARK-4867:
-----------------------------------------
Maybe this issue can be split into smaller tasks? A lot of built-in functions can
be removed from the parser quite easily by registering them in the
FunctionRegistry; I am doing this for a number of fixed-arity functions.
I'm using some helper functions to create FunctionBuilders for Expressions for
use with the FunctionRegistry. The main helper looks like this:
{code}
def expression[T <: Expression](arity: Int)(implicit tag: ClassTag[T]): ExpressionBuilder = {
  val argTypes = (1 to arity).map(x => classOf[Expression])
  val constructor = tag.runtimeClass.getDeclaredConstructor(argTypes: _*)
  (expressions: Seq[Expression]) => {
    if (expressions.size != arity) {
      throw new IllegalArgumentException(
        s"Invalid number of arguments: ${expressions.size} (must be equal to $arity)")
    }
    constructor.newInstance(expressions: _*).asInstanceOf[Expression]
  }
}
{code}
and can be used like this (here assuming MyFunction takes two expressions):
{code}
functionRegistry.registerFunction("MY_FUNCTION", expression[MyFunction])
{code}
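For clarity, this is roughly the shape of the pieces the helper assumes: the
ExpressionBuilder alias and a registry with register/lookup. The names below
(RegistrySketch, SimpleFunctionRegistry, lookupFunction) are just an illustration
of the idea, not the exact code:
{code}
import scala.collection.mutable
import org.apache.spark.sql.catalyst.expressions.Expression

// Illustrative sketch only; names are not necessarily the real API.
object RegistrySketch {
  // A builder turns the parsed argument list into a concrete Expression.
  type ExpressionBuilder = Seq[Expression] => Expression

  class SimpleFunctionRegistry {
    private val builders = mutable.Map.empty[String, ExpressionBuilder]

    // Called once per built-in function when the registry is set up.
    def registerFunction(name: String, builder: ExpressionBuilder): Unit =
      builders(name.toUpperCase) = builder

    // Called by the parser/analyzer when it encounters FUNC(arg1, ..., argN).
    def lookupFunction(name: String, args: Seq[Expression]): Expression =
      builders.getOrElse(name.toUpperCase,
        sys.error(s"Undefined function: $name"))(args)
  }
}
{code}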
If this approach looks like what is needed, I can extend it to handle expressions
with a variable number of parameters. Also, with some syntactic sugar, we could
provide a function that works this way:
{code}
functionRegistry.registerFunction[MyFunction]
// Registers the builder produced by expression[MyFunction] under the name
// "MY_FUNCTION", using a camel-case -> underscore-separated conversion.
{code}
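To make the naming convention concrete, a possible implementation of that sugar on
top of the expression helper could look like the sketch below; camelToUnderscore
and the explicit arity parameter are my own illustrative choices (inferring the
arity from T's constructor is left out here):
{code}
import scala.reflect.ClassTag
import org.apache.spark.sql.catalyst.expressions.Expression

// Illustrative sketch only. Derives the SQL name from the class name,
// e.g. MyFunction -> MY_FUNCTION.
def camelToUnderscore(name: String): String =
  name.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toUpperCase

// Proposed sugar, reusing the expression[T] helper defined above. Arity is
// passed explicitly here; it could instead be inferred via reflection.
def registerFunction[T <: Expression](arity: Int)(implicit tag: ClassTag[T]): Unit = {
  val sqlName = camelToUnderscore(tag.runtimeClass.getSimpleName)
  functionRegistry.registerFunction(sqlName, expression[T](arity))
}
{code}
So registerFunction[MyFunction](2) would register the builder under MY_FUNCTION
without spelling the name out.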
How does this sound?
> UDF clean up
> ------------
>
> Key: SPARK-4867
> URL: https://issues.apache.org/jira/browse/SPARK-4867
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Michael Armbrust
> Priority: Blocker
>
> Right now our support for and internal implementation of many functions have a
> few issues. Specifically:
> - UDFs don't know their input types and thus don't do type coercion.
> - We hard-code a bunch of built-in functions into the parser. This is bad
> because in SQL it creates new reserved words for things that aren't actually
> keywords. Also, it means that for each function we need to add support to
> both SQLContext and HiveContext separately.
> For this JIRA I propose we do the following:
> - Change the interfaces for registerFunction and ScalaUdf to include types
> for the input arguments as well as the output type.
> - Add a rule to analysis that does type coercion for UDFs.
> - Add a parse rule for functions to SQLParser.
> - Rewrite all the UDFs that are currently hacked into the various parsers
> using this new functionality.
> Depending on how big this refactoring becomes, we could split parts 1 & 2 from
> part 3 above.