hequn8128 commented on a change in pull request #9886:
[FLINK-14027][python][doc] Add documentation for Python user-defined functions.
URL: https://github.com/apache/flink/pull/9886#discussion_r334395076
##########
File path: docs/dev/table/udfs.md
##########
@@ -112,50 +152,54 @@ public class HashCode extends ScalarFunction {
}
'''
+class HashCode(ScalarFunction):
+ def eval(self, s):
+ return hash(s) * factor
+
table_env = BatchTableEnvironment.create(env)
-# register the java function
+# register the Java function
table_env.register_java_function("hashCode", "my.java.function.HashCode")
+# register the Python function
+table_env.register_function("py_hash_code", udf(HashCode(),
DataTypes.STRING(), DataTypes.BIGINT()))
+
# use the function in Python Table API
-my_table.select("string, string.hashCode(), hashCode(string)")
+my_table.select("string, string.hashCode(), hashCode(string),
string.py_hash_code(), py_hash_code(string)")
# use the function in SQL API
-table_env.sql_query("SELECT string, hashCode(string) FROM MyTable")
+table_env.sql_query("SELECT string, hashCode(string), py_hash_code(string)
FROM MyTable")
{% endhighlight %}
-</div>
-</div>
-By default the result type of an evaluation method is determined by Flink's
type extraction facilities. This is sufficient for basic types or simple POJOs
but might be wrong for more complex, custom, or composite types. In these cases
`TypeInformation` of the result type can be manually defined by overriding
`ScalarFunction#getResultType()`.
+There are many ways to define a Python scalar function besides extending the
base class `ScalarFunction`. The following example shows the different ways to
define a Python scalar function which takes two columns of bigint as input
parameters and returns the sum of them as the result.
-The following example shows an advanced example which takes the internal
timestamp representation and also returns the internal timestamp representation
as a long value. By overriding `ScalarFunction#getResultType()` we define that
the returned long value should be interpreted as a `Types.TIMESTAMP` by the
code generation.
+{% highlight python %}
+# option 1: extending the base class `ScalarFunction`
Review comment:
Also add an example that shows how to register the function with env and how
to use it with select(). We can add one example at the bottom of this section,
i.e., after option 5, to avoid duplication. The doc looks like:
```
# option 1: extending the base class `ScalarFunction`
class Add(ScalarFunction):
def eval(self, i, j):
return i + j
add = udf(Add(), [DataTypes.BIGINT(), DataTypes.BIGINT()],
DataTypes.BIGINT())
# option 2: Python function
@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
result_type=DataTypes.BIGINT())
def add(i, j):
return i + j
# option 3: lambda function
add = udf(lambda i, j: i + j, [DataTypes.BIGINT(), DataTypes.BIGINT()],
DataTypes.BIGINT())
# option 4: callable function
class CallableAdd(object):
def __call__(self, i, j):
return i + j
add = udf(CallableAdd(), [DataTypes.BIGINT(), DataTypes.BIGINT()],
DataTypes.BIGINT())
# option 5: partial function
def partial_add(i, j, k):
return i + j + k
add = udf(functools.partial(partial_add, k=1), DataTypes.BIGINT(),
DataTypes.BIGINT())
# register the Python function
table_env.register_function("add", add)
# use the function in Python Table API
my_table.select("add(a, b)")
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services