This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 61f903f  [SPARK-31331][SQL][DOCS] Document Spark integration with Hive UDFs/UDAFs/UDTFs
61f903f is described below

commit 61f903fa7aa853e6462362aa8d335e3fc192d831
Author: Huaxin Gao <huax...@us.ibm.com>
AuthorDate: Thu Apr 9 13:28:01 2020 -0500

    [SPARK-31331][SQL][DOCS] Document Spark integration with Hive UDFs/UDAFs/UDTFs

    ### What changes were proposed in this pull request?
    Document Spark integration with Hive UDFs/UDAFs/UDTFs

    ### Why are the changes needed?
    To make SQL Reference complete

    ### Does this PR introduce any user-facing change?
    Yes
    <img width="1031" alt="Screen Shot 2020-04-02 at 2 22 42 PM" src="https://user-images.githubusercontent.com/13592258/78301971-cc7cf080-74ee-11ea-93c8-7d4c75213b47.png">

    ### How was this patch tested?
    Manually build and check

    Closes #28104 from huaxingao/hive-udfs.

    Lead-authored-by: Huaxin Gao <huax...@us.ibm.com>
    Co-authored-by: Takeshi Yamamuro <yamam...@apache.org>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 docs/sql-ref-functions-udf-hive.md | 88 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/docs/sql-ref-functions-udf-hive.md b/docs/sql-ref-functions-udf-hive.md
index 8698be7..a87266d 100644
--- a/docs/sql-ref-functions-udf-hive.md
+++ b/docs/sql-ref-functions-udf-hive.md
@@ -19,4 +19,90 @@ license: |
   limitations under the License.
 ---
 
-Integration with Hive UDFs/UDAFs/UDTFs
\ No newline at end of file
+### Description
+
+Spark SQL supports integration of Hive UDFs, UDAFs and UDTFs. Similar to Spark UDFs and UDAFs, Hive UDFs work on a single row as input and generate a single row as output, while Hive UDAFs operate on multiple rows and return a single aggregated row as a result. In addition, Hive also supports UDTFs (User Defined Tabular Functions) that act on one row as input and return multiple rows as output. To use Hive UDFs/UDAFs/UDTFs, the user should register them in Spark, and then use them in Spar [...]
+
+### Examples
+
+Hive has two UDF interfaces: [UDF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDF.java) and [GenericUDF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java).
+An example below uses [GenericUDFAbs](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAbs.java) derived from `GenericUDF`.
+
+{% highlight sql %}
+-- Register `GenericUDFAbs` and use it in Spark SQL.
+-- Note that, if you use your own programmed one, you need to add a JAR containing it
+-- into a classpath,
+-- e.g., ADD JAR yourHiveUDF.jar;
+CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs';
+
+SELECT * FROM t;
+  +-----+
+  |value|
+  +-----+
+  | -1.0|
+  |  2.0|
+  | -3.0|
+  +-----+
+
+SELECT testUDF(value) FROM t;
+  +--------------+
+  |testUDF(value)|
+  +--------------+
+  |           1.0|
+  |           2.0|
+  |           3.0|
+  +--------------+
+{% endhighlight %}
+
+
+An example below uses [GenericUDTFExplode](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplode.java) derived from [GenericUDTF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java).
+
+{% highlight sql %}
+-- Register `GenericUDTFExplode` and use it in Spark SQL
+CREATE TEMPORARY FUNCTION hiveUDTF
+    AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode';
+
+SELECT * FROM t;
+  +------+
+  | value|
+  +------+
+  |[1, 2]|
+  |[3, 4]|
+  +------+
+
+SELECT hiveUDTF(value) FROM t;
+  +---+
+  |col|
+  +---+
+  |  1|
+  |  2|
+  |  3|
+  |  4|
+  +---+
+{% endhighlight %}
+
+Hive has two UDAF interfaces: [UDAF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java) and [GenericUDAFResolver](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver.java).
+An example below uses [GenericUDAFSum](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java) derived from `GenericUDAFResolver`.
+
+{% highlight sql %}
+-- Register `GenericUDAFSum` and use it in Spark SQL
+CREATE TEMPORARY FUNCTION hiveUDAF
+    AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum';
+
+SELECT * FROM t;
+  +---+-----+
+  |key|value|
+  +---+-----+
+  |  a|    1|
+  |  a|    2|
+  |  b|    3|
+  +---+-----+
+
+SELECT key, hiveUDAF(value) FROM t GROUP BY key;
+  +---+---------------+
+  |key|hiveUDAF(value)|
+  +---+---------------+
+  |  b|              3|
+  |  a|              3|
+  +---+---------------+
+{% endhighlight %}
\ No newline at end of file

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org