This is an automated email from the ASF dual-hosted git repository.
dianfu pushed a commit to branch release-1.13
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/release-1.13 by this push:
new 8ffefda [FLINK-20086][python][docs] Add documentation about how to
override open() in UserDefinedFunction to load resources
8ffefda is described below
commit 8ffefdaed5f2ac22a9b720dc5d0293b7be88a0d4
Author: Yik San Chan <[email protected]>
AuthorDate: Wed Apr 28 19:20:21 2021 +0800
[FLINK-20086][python][docs] Add documentation about how to override open()
in UserDefinedFunction to load resources
This closes #15795.
---
.../docs/dev/python/table/udfs/python_udfs.md | 20 ++++++++++++++++++++
.../docs/dev/python/table/udfs/python_udfs.md | 20 ++++++++++++++++++++
2 files changed, 40 insertions(+)
diff --git a/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
b/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
index e9f9217..f743144 100644
--- a/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
+++ b/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
@@ -558,3 +558,23 @@ class
ListViewConcatTableAggregateFunction(TableAggregateFunction):
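If you run Python UDFs and Pandas UDFs in non-local mode and the Python UDFs are not defined in the
main Python file that contains the `main()` entry point, it is strongly recommended to specify the Python UDF definitions via the [`python-files`]({{< ref "docs/dev/python/python_config"
>}}#python-files) config option.
Otherwise, if you define Python UDFs in a file named `my_udf.py`, you may run into an error such as `ModuleNotFoundError: No
module named 'my_udf'`.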
+
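+## Loading resources in UDFs
+
+Sometimes you want to load a resource only once in a UDF and then reuse it for repeated computation. For example, you may want to first load a very large deep learning model in a UDF and then use that model to make predictions many times.
+
+What you need to do is override the `open` method of the `UserDefinedFunction` class.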
+
+```python
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```
diff --git a/docs/content/docs/dev/python/table/udfs/python_udfs.md
b/docs/content/docs/dev/python/table/udfs/python_udfs.md
index 95504d8..09ef12f 100644
--- a/docs/content/docs/dev/python/table/udfs/python_udfs.md
+++ b/docs/content/docs/dev/python/table/udfs/python_udfs.md
@@ -557,3 +557,23 @@ class
ListViewConcatTableAggregateFunction(TableAggregateFunction):
To run Python UDFs (as well as Pandas UDFs) in any non-local mode, it is
strongly recommended to bundle your Python UDF definitions using the config
option [`python-files`]({{< ref "docs/dev/python/python_config"
>}}#python-files), if your Python UDFs live outside of the file where the
`main()` function is defined.
Otherwise, you may run into `ModuleNotFoundError: No module named 'my_udf'` if
you define Python UDFs in a file called `my_udf.py`.
+
+## Loading resources in UDFs
+
+There are scenarios where you want to load some resources in a UDF once and then
run the computation (i.e., `eval`) over and over again without having to
reload the resources. For example, you may want to load a large deep learning
model only once, then run batch prediction against the model multiple times.
+
+To do this, override the `open` method of `UserDefinedFunction`, which is called for one-time setup before `eval` is invoked.
+
+```python
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```
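
The load-once pattern documented in this commit can be tried out without a Flink cluster. What follows is a minimal sketch in plain Python, not PyFlink: the `Predict` class here is a hypothetical stand-in that only mimics the `open`/`eval` lifecycle, the pickled "model" is a made-up dict of linear coefficients, and `run_demo` is an illustrative helper. It assumes the runtime calls `open` once before the first `eval`.

```python
import os
import pickle
import tempfile


class Predict:
    """Plain-Python stand-in for a ScalarFunction subclass (assumption, not PyFlink)."""

    def __init__(self, model_path):
        self.model_path = model_path
        self.model = None
        self.load_count = 0  # counts how often the resource was loaded

    def open(self, function_context=None):
        # One-time setup: deserialize the resource and keep it on the instance.
        # Inside a method body, `open(...)` still resolves to the builtin.
        with open(self.model_path, "rb") as f:
            self.model = pickle.load(f)
        self.load_count += 1

    def eval(self, x):
        # Per-call work: reuse the already-loaded resource.
        return self.model["slope"] * x + self.model["intercept"]


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        # Hypothetical "model": coefficients of a linear function.
        with open(path, "wb") as f:
            pickle.dump({"slope": 2.0, "intercept": 1.0}, f)

        fn = Predict(path)
        fn.open()  # assumed to be called once by the runtime
        results = [fn.eval(x) for x in (1.0, 2.0, 3.0)]
        return results, fn.load_count
```

Calling `run_demo()` evaluates three inputs while reading the pickle file a single time, which is the behaviour the `open` override is meant to provide. In an actual job, the class would extend `ScalarFunction` and be wrapped with `udf(...)` as in the diff above.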