xuefuz commented on a change in pull request #9445: [FLINK-13706][hive] add documentation of how to use Hive functions in…
URL: https://github.com/apache/flink/pull/9445#discussion_r314124473
 
 

 ##########
 File path: docs/dev/table/hive/hive_functions.md
 ##########
 @@ -0,0 +1,150 @@
+---
+title: "Hive Functions"
+nav-parent_id: hive_tableapi
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Hive User Defined Functions
+
+Users can use their existing Hive User Defined Functions in Flink.
+
+Supported UDF types include:
+
+- UDF
+- GenericUDF
+- GenericUDTF
+- UDAF
+- GenericUDAFResolver2
+
+During query planning and execution, Hive's UDF and GenericUDF are automatically translated into Flink's ScalarFunction,
+Hive's GenericUDTF is automatically translated into Flink's TableFunction,
+and Hive's UDAF and GenericUDAFResolver2 are translated into Flink's AggregateFunction.
+
+To use Hive User Defined Functions, users must set a HiveCatalog backed by a Hive Metastore that contains the functions as the current catalog, and must use the Blink planner.
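+
+Below is a minimal sketch of such a setup with the Table API. The catalog name, default database, Hive conf directory and Hive version are placeholders that should be adjusted to the actual environment:
+
+{% highlight java %}
+// Use the Blink planner in batch mode.
+EnvironmentSettings settings = EnvironmentSettings.newInstance()
+       .useBlinkPlanner()
+       .inBatchMode()
+       .build();
+TableEnvironment tableEnv = TableEnvironment.create(settings);
+
+// Register a HiveCatalog backed by the Hive Metastore that contains the functions,
+// and make it the current catalog. All names here are placeholders.
+HiveCatalog hiveCatalog = new HiveCatalog(
+       "myhive",          // catalog name
+       "default",         // default database
+       "/opt/hive-conf",  // directory containing hive-site.xml
+       "2.3.4");          // Hive version
+tableEnv.registerCatalog("myhive", hiveCatalog);
+tableEnv.useCatalog("myhive");
+{% endhighlight %}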
+
+## Using Hive User Defined Functions
+
+Assume the following Hive functions are registered in the Hive Metastore:
+
+
+{% highlight java %}
+/**
+ * Test simple udf. Registered under name 'myudf'
+ */
+public class TestHiveSimpleUDF extends UDF {
+
+       public IntWritable evaluate(IntWritable i) {
+               return new IntWritable(i.get());
+       }
+
+       public Text evaluate(Text text) {
+               return new Text(text.toString());
+       }
+}
+
+/**
+ * Test generic udf. Registered under name 'mygenericudf'
+ */
+public class TestHiveGenericUDF extends GenericUDF {
+
+       @Override
+       public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
+               checkArgument(arguments.length == 2);
+
+               checkArgument(arguments[1] instanceof ConstantObjectInspector);
+               Object constant = ((ConstantObjectInspector) arguments[1]).getWritableConstantValue();
+               checkArgument(constant instanceof IntWritable);
+               checkArgument(((IntWritable) constant).get() == 1);
+
+               if (arguments[0] instanceof IntObjectInspector ||
+                               arguments[0] instanceof StringObjectInspector) {
+                       return arguments[0];
+               } else {
+                       throw new RuntimeException("Not support argument: " + arguments[0]);
+               }
+       }
+
+       @Override
+       public Object evaluate(DeferredObject[] arguments) throws HiveException {
+               return arguments[0].get();
+       }
+
+       @Override
+       public String getDisplayString(String[] children) {
+               return "TestHiveGenericUDF";
+       }
+}
+
+/**
+ * Test split udtf. Registered under name 'mygenericudtf'
+ */
+public class TestHiveUDTF extends GenericUDTF {
+
+       @Override
+       public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
+               checkArgument(argOIs.length == 2);
+
+               // TEST for constant arguments
+               checkArgument(argOIs[1] instanceof ConstantObjectInspector);
+               Object constant = ((ConstantObjectInspector) argOIs[1]).getWritableConstantValue();
+               checkArgument(constant instanceof IntWritable);
+               checkArgument(((IntWritable) constant).get() == 1);
+
+               return ObjectInspectorFactory.getStandardStructObjectInspector(
+                       Collections.singletonList("col1"),
+                       Collections.singletonList(PrimitiveObjectInspectorFactory.javaStringObjectInspector));
+       }
+
+       @Override
+       public void process(Object[] args) throws HiveException {
+               String str = (String) args[0];
+               for (String s : str.split(",")) {
+                       forward(s);
+                       forward(s);
+               }
+       }
+
+       @Override
+       public void close() {
+       }
+}
+
+{% endhighlight %}
+
+Users can use them in SQL as:
+
+
+{% highlight bash %}
+
+Flink SQL> select mygenericudf(myudf(name), 1) as a, mygenericudf(myudf(age), 1) as b, s from mysourcetable, lateral table(mygenericudtf(name, 1)) as T(s);
+
+Flink SQL> insert into mysinktable select a, s, sum(b), myudaf(b) from (%s) group by a, s;
+
+{% endhighlight %}
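+
+The same queries can also be submitted programmatically. The following is only an illustrative sketch, assuming a TableEnvironment configured as described above and the same table and function names:
+
+{% highlight java %}
+// Run the first query through the TableEnvironment instead of the SQL CLI.
+Table result = tableEnv.sqlQuery(
+       "select mygenericudf(myudf(name), 1) as a, mygenericudf(myudf(age), 1) as b, s " +
+       "from mysourcetable, lateral table(mygenericudtf(name, 1)) as T(s)");
+{% endhighlight %}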
+
+
+### Limitations
+
+Hive built-in functions are currently not supported out of the box in Flink. To use Hive built-in functions, users must register them manually in the Hive Metastore first, as sketched in the example below.
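+
+For example, a Hive built-in function can be registered under a custom name by pointing at its implementing class from the Hive CLI or Beeline. The function name `mylower` and the class below are only an illustration:
+
+{% highlight sql %}
+-- Register Hive's built-in lower() under a custom name in the Hive Metastore,
+-- so that Flink can pick it up through the HiveCatalog.
+CREATE FUNCTION mylower AS 'org.apache.hadoop.hive.ql.udf.UDFLower';
+{% endhighlight %}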
+
+Support for Hive functions has so far only been tested for Flink batch jobs with the Blink planner.
+
+Hive functions currently cannot be used across catalogs in Flink.
 
 Review comment:
   Do we need to mention somewhere that to use those udfs, the relevant jars have to be in Flink's classpath?
