[ https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251822#comment-14251822 ]
William Benton commented on SPARK-4867:
---------------------------------------
I think in general it's a great idea to declare or register SQL functions with
type signatures. I also think that the more we can lean on Scala's type system
here, the better. The reason why I didn't make declaring signatures a
requirement for all functions in my PR for SPARK-2863 is that it seems like the
interface gets hairy pretty quickly if it's going to be general enough to work
for the use cases we can reasonably expect.
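To make that concrete, here's a rough, self-contained sketch of what
registering a function along with its argument and return types could look
like. Nothing here is Spark's actual API; SqlType, TypedUdf, and UdfRegistry
are names made up purely for illustration.

// Hypothetical sketch, not Spark's real API. SqlType, TypedUdf and
// UdfRegistry are invented names used to illustrate registering a
// function together with an explicit type signature.

sealed trait SqlType
case object StringType  extends SqlType
case object DoubleType  extends SqlType
case object DecimalType extends SqlType

// A registered function carries the types it expects and the type it
// returns, so an analysis rule can insert casts on the arguments.
final case class TypedUdf(
    name: String,
    inputTypes: Seq[SqlType],
    returnType: SqlType,
    func: Seq[Any] => Any)

object UdfRegistry {
  private var udfs = Map.empty[String, TypedUdf]
  def register(udf: TypedUdf): Unit = udfs += (udf.name -> udf)
  def lookup(name: String): Option[TypedUdf] = udfs.get(name)
}

// Example (e.g. in the REPL): a toy string-length UDF declared as
// (StringType) => DoubleType.
UdfRegistry.register(TypedUdf(
  "strlen",
  Seq(StringType),
  DoubleType,
  args => args.head.asInstanceOf[String].length.toDouble))

That works fine for monomorphic functions, but it breaks down as soon as a
function needs to accept more than one type.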
The simplest example of where things get complicated is numeric types: many
functions in existing systems are polymorphic, taking either Doubles or
Decimals, and then returning the type they took. (In practice, it seems like
Hive in particular doesn't do much that keeps the precision of Decimals intact,
but that's another matter.) So we'd need a type-signature interface that
supports type variables and constraints: the expected signature for addition
could look something like (A, B) => C, with annotations indicating that A is
either Double or Decimal, B is either Double or Decimal, and C is the least
upper bound of A and B. (It's certainly possible to special-case numeric
coercions, but that's sort of a bummer, and this class of problem is one we'd
want to solve for UDFs anyway.)
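As a very rough illustration (again, invented names and a toy two-type
lattice, not a design proposal for Spark's API), a signature language with
type variables, allowed-type constraints, and a least-upper-bound result
might be sketched like this:

// Hypothetical sketch of a signature language with type variables and
// constraints; none of these names come from Spark itself.

sealed trait SqlType
case object DoubleType  extends SqlType
case object DecimalType extends SqlType

sealed trait TypeSpec
case class Concrete(tpe: SqlType)                       extends TypeSpec
case class TypeVar(name: String, allowed: Set[SqlType]) extends TypeSpec
case class Lub(varNames: Seq[String])                   extends TypeSpec // LUB of the named vars

case class Signature(args: Seq[TypeSpec], result: TypeSpec)

// (A, B) => C, where A and B are each Double or Decimal and C = lub(A, B).
val numeric: Set[SqlType] = Set(DoubleType, DecimalType)
val addSignature = Signature(
  args   = Seq(TypeVar("A", numeric), TypeVar("B", numeric)),
  result = Lub(Seq("A", "B")))

// Toy least-upper-bound for this two-type lattice: mixing Double and
// Decimal widens to Double (dropping Decimal precision, as noted above).
def lub(a: SqlType, b: SqlType): SqlType =
  if (a == DoubleType || b == DoubleType) DoubleType else DecimalType

Even this small sketch needs its own notion of type variables, constraint
sets, and a resolution rule, which is what I mean by the interface getting
hairy once it has to be general.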
In any case, I'm definitely interested in working with you on both design and
implementation!
> UDF clean up
> ------------
>
> Key: SPARK-4867
> URL: https://issues.apache.org/jira/browse/SPARK-4867
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Michael Armbrust
> Priority: Blocker
>
> Right now our support for and internal implementation of many functions have
> a few issues. Specifically:
> - UDFs don't know their input types and thus don't do type coercion.
> - We hard-code a bunch of built-in functions into the parser. This is bad
> because in SQL it creates new reserved words for things that aren't actually
> keywords. It also means that for each function we need to add support to
> both SQLContext and HiveContext separately.
> For this JIRA I propose we do the following:
> - Change the interfaces for registerFunction and ScalaUdf to include types
> for the input arguments as well as the output type.
> - Add a rule to analysis that does type coercion for UDFs.
> - Add a parse rule for functions to SQLParser.
> - Rewrite all the UDFs that are currently hacked into the various parsers
> using this new functionality.
> Depending on how big this refactoring becomes, we could split parts 1 & 2 from
> part 3 above.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)