[
https://issues.apache.org/jira/browse/SPARK-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-16954.
----------------------------------
Resolution: Incomplete
> UDFs should allow output type to be specified in terms of the input type
> ------------------------------------------------------------------------
>
> Key: SPARK-16954
> URL: https://issues.apache.org/jira/browse/SPARK-16954
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Simeon Simeonov
> Priority: Major
> Labels: SQL, UDF, bulk-closed, types
>
> Consider an {{array_compact}} UDF that removes {{null}} values from an array.
> There is no easy way to implement this UDF because an explicit return type
> with a {{TypeTag}} is required by code generation.
> The interesting observation here is that the output type of `array_compact`
> is the same as its input type. In general, there is a broad class of UDFs,
> especially collection-oriented ones, whose output types are functions of the
> input types. In our Spark work we have found collection manipulation UDFs to
> be very powerful for cleaning up data and substantially improving
> performance, in particular, avoiding {{explode}} followed by {{groupBy}}. It
> would be nice if Spark made adding these types of UDFs very easy.
> I won't go into possible ways to implement this under the covers as there are
> many options but I do want to point out that it is possible to communicate
> the right type information to Spark without changing the signature for UDF
> registration using placeholder types, e.g.,
> {code}
> sealed trait UDFArgumentAtPosition
> case class ArgPos1 extends UDFArgumentAtPosition
> case class ArgPos2 extends UDFArgumentAtPosition
> // ...
> case class Struct[ArgPos <: UDFArgumentAtPosition](value: Row)
> case class ArrayElement[ArgPos <: UDFArgumentAtPosition, A : TypeTag](value:
> A)
> case class MapKey[ArgPos <: UDFArgumentAtPosition, A : TypeTag](value: A)
> case class MapValue[ArgPos <: UDFArgumentAtPosition, A : TypeTag](value: A)
> // Functions are stubbed just to show compilation succeeds
> def arrayCompact[A : TypeTag](xs: Seq[A]): ArgPos1 = null
> def arraySum[A : Numeric : TypeTag](xs: Seq[A]): ArrayElement[ArgPos1, A] =
> ArrayElement(implicitly[Numeric[A]].zero)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]