
srowen Thu, 09 Apr 2020 11:30:24 -0700

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 61f903f  [SPARK-31331][SQL][DOCS] Document Spark integration with Hive UDFs/UDAFs/UDTFs
61f903f is described below

commit 61f903fa7aa853e6462362aa8d335e3fc192d831
Author: Huaxin Gao <huax...@us.ibm.com>
AuthorDate: Thu Apr 9 13:28:01 2020 -0500

    [SPARK-31331][SQL][DOCS] Document Spark integration with Hive UDFs/UDAFs/UDTFs
    
    ### What changes were proposed in this pull request?
    Document Spark integration with Hive UDFs/UDAFs/UDTFs
    
    ### Why are the changes needed?
    To make SQL Reference complete
    
    ### Does this PR introduce any user-facing change?
    Yes
    <img width="1031" alt="Screen Shot 2020-04-02 at 2 22 42 PM" src="https://user-images.githubusercontent.com/13592258/78301971-cc7cf080-74ee-11ea-93c8-7d4c75213b47.png">
    
    ### How was this patch tested?
    Manually build and check
    
    Closes #28104 from huaxingao/hive-udfs.
    
    Lead-authored-by: Huaxin Gao <huax...@us.ibm.com>
    Co-authored-by: Takeshi Yamamuro <yamam...@apache.org>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 docs/sql-ref-functions-udf-hive.md | 88 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/docs/sql-ref-functions-udf-hive.md b/docs/sql-ref-functions-udf-hive.md
index 8698be7..a87266d 100644
--- a/docs/sql-ref-functions-udf-hive.md
+++ b/docs/sql-ref-functions-udf-hive.md
@@ -19,4 +19,90 @@ license: |
   limitations under the License.
 ---
 
-Integration with Hive UDFs/UDAFs/UDTFs
\ No newline at end of file
+### Description
+
+Spark SQL supports integration with Hive UDFs, UDAFs and UDTFs. Similar to Spark UDFs and UDAFs, Hive UDFs take a single row as input and generate a single row as output, while Hive UDAFs operate on multiple rows and return a single aggregated row as a result. Hive also supports UDTFs (User Defined Table-Generating Functions), which take one row as input and return multiple rows as output. To use Hive UDFs/UDAFs/UDTFs, the user should register them in Spark, and then use them in Spar [...]
+
+### Examples
+
+Hive has two UDF interfaces: [UDF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDF.java) and [GenericUDF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java).
+The example below uses [GenericUDFAbs](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAbs.java), which is derived from `GenericUDF`.
+
+{% highlight sql %}
+-- Register `GenericUDFAbs` and use it in Spark SQL.
+-- Note that if you use your own UDF implementation, you need to add the JAR containing it
+-- to the classpath first,
+-- e.g., ADD JAR yourHiveUDF.jar;
+CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs';
+
+SELECT * FROM t;
+  +-----+
+  |value|
+  +-----+
+  | -1.0|
+  |  2.0|
+  | -3.0|
+  +-----+
+
+SELECT testUDF(value) FROM t;
+  +--------------+
+  |testUDF(value)|
+  +--------------+
+  |           1.0|
+  |           2.0|
+  |           3.0|
+  +--------------+
+{% endhighlight %}
+
+
+The example below uses [GenericUDTFExplode](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplode.java), which is derived from [GenericUDTF](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.java).
+
+{% highlight sql %}
+-- Register `GenericUDTFExplode` and use it in Spark SQL
+CREATE TEMPORARY FUNCTION hiveUDTF
+    AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode';
+
+SELECT * FROM t;
+  +------+
+  | value|
+  +------+
+  |[1, 2]|
+  |[3, 4]|
+  +------+
+
+SELECT hiveUDTF(value) FROM t;
+  +---+
+  |col|
+  +---+
+  |  1|
+  |  2|
+  |  3|
+  |  4|
+  +---+
+{% endhighlight %}
+
+Hive has two UDAF interfaces: [UDAF](https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java) and [GenericUDAFResolver](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver.java).
+The example below uses [GenericUDAFSum](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java), which is derived from `GenericUDAFResolver`.
+
+{% highlight sql %}
+-- Register `GenericUDAFSum` and use it in Spark SQL
+CREATE TEMPORARY FUNCTION hiveUDAF
+    AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum';
+
+SELECT * FROM t;
+  +---+-----+
+  |key|value|
+  +---+-----+
+  |  a|    1|
+  |  a|    2|
+  |  b|    3|
+  +---+-----+
+
+SELECT key, hiveUDAF(value) FROM t GROUP BY key;
+  +---+---------------+
+  |key|hiveUDAF(value)|
+  +---+---------------+
+  |  b|              3|
+  |  a|              3|
+  +---+---------------+
+{% endhighlight %}
\ No newline at end of file
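
The new Description section distinguishes the three kinds of Hive functions by how many rows they consume and produce. The sketch below makes that concrete in plain Python. It is not Hive's or Spark's API. The helpers `apply_udf`, `apply_udtf` and `apply_udaf` are hypothetical names, and Python functions stand in for `GenericUDFAbs`, `GenericUDTFExplode` and `GenericUDAFSum`. The sketch models only the row cardinality of each kind, using the sample tables from the examples in this diff.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def apply_udf(rows: Iterable[T], fn: Callable[[T], U]) -> List[U]:
    """Hypothetical model of a UDF: each input row yields exactly one output row."""
    return [fn(row) for row in rows]


def apply_udtf(rows: Iterable[T], fn: Callable[[T], Iterable[U]]) -> List[U]:
    """Hypothetical model of a UDTF: each input row yields zero or more output rows."""
    out: List[U] = []
    for row in rows:
        out.extend(fn(row))  # all rows produced for one input are appended in order
    return out


def apply_udaf(rows: Iterable[Tuple[str, T]], fn: Callable[[List[T]], U]) -> Dict[str, U]:
    """Hypothetical model of a UDAF with GROUP BY: all rows sharing a key yield one row."""
    groups: Dict[str, List[T]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: fn(values) for key, values in groups.items()}


# Same inputs as the sample tables `t` in the three SQL examples.
print(apply_udf([-1.0, 2.0, -3.0], abs))                    # like testUDF(value)
print(apply_udtf([[1, 2], [3, 4]], lambda arr: arr))         # like hiveUDTF(value)
print(apply_udaf([("a", 1), ("a", 2), ("b", 3)], sum))       # like hiveUDAF(value) ... GROUP BY key
```

The dict returned by `apply_udaf` follows first-seen key order only because Python dicts preserve insertion order. As with the Spark output for `GROUP BY key`, no row order should be assumed.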

