This is an automated email from the ASF dual-hosted git repository.

dianfu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git


The following commit(s) were added to refs/heads/master by this push:
     new c593209  [FLINK-20086][python][docs] Add documentation about how to override open() in UserDefinedFunction to load resources
c593209 is described below

commit c593209b664af0d112b22c386266476f3a2ee750
Author: Yik San Chan <[email protected]>
AuthorDate: Wed Apr 28 19:20:21 2021 +0800

    [FLINK-20086][python][docs] Add documentation about how to override open() in UserDefinedFunction to load resources
    
    This closes #15795.
---
 .../docs/dev/python/table/udfs/python_udfs.md        | 20 ++++++++++++++++++++
 .../docs/dev/python/table/udfs/python_udfs.md        | 20 ++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md b/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
index e9f9217..f743144 100644
--- a/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
+++ b/docs/content.zh/docs/dev/python/table/udfs/python_udfs.md
@@ -558,3 +558,23 @@ class ListViewConcatTableAggregateFunction(TableAggregateFunction):
 
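 If you run Python UDFs and Pandas UDFs in a non-local mode, and the Python UDFs are not defined in the main Python file that contains the `main()` entry point, it is strongly recommended to specify the Python UDF definitions via the [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files) config option.
 Otherwise, if you define Python UDFs in a file named `my_udf.py`, you may run into an error such as `ModuleNotFoundError: No module named 'my_udf'`.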
+
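+## Loading resources in UDFs
+
+Sometimes you want to load a resource only once in a UDF and then reuse it for repeated computation. For example, you may want to first load a large deep learning model in a UDF and then use that model to make predictions many times.
+
+What you need to do is override the `open` method of the `UserDefinedFunction` class.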
+
+```python
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```
diff --git a/docs/content/docs/dev/python/table/udfs/python_udfs.md b/docs/content/docs/dev/python/table/udfs/python_udfs.md
index 95504d8..09ef12f 100644
--- a/docs/content/docs/dev/python/table/udfs/python_udfs.md
+++ b/docs/content/docs/dev/python/table/udfs/python_udfs.md
@@ -557,3 +557,23 @@ class ListViewConcatTableAggregateFunction(TableAggregateFunction):
 
 To run Python UDFs (as well as Pandas UDFs) in any non-local mode, it is strongly recommended to bundle your Python UDF definitions using the config option [`python-files`]({{< ref "docs/dev/python/python_config" >}}#python-files), if your Python UDFs live outside of the file where the `main()` function is defined.
 Otherwise, you may run into `ModuleNotFoundError: No module named 'my_udf'` if you define Python UDFs in a file called `my_udf.py`.
+
+## Loading resources in UDFs
+
+Sometimes you want to load resources in a UDF once, then run the computation (i.e., `eval`) over and over again without having to reload them. For example, you may want to load a large deep learning model only once, then run batch prediction against the model multiple times.
+
+Overriding the `open` method of `UserDefinedFunction` is exactly what you need.
+
+```python
+class Predict(ScalarFunction):
+    def open(self, function_context):
+        import pickle
+
+        with open("resources.zip/resources/model.pkl", "rb") as f:
+            self.model = pickle.load(f)
+
+    def eval(self, x):
+        return self.model.predict(x)
+
+predict = udf(Predict(), result_type=DataTypes.DOUBLE(), func_type="pandas")
+```
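
The snippet added by this commit omits its imports; it presumably relies on `from pyflink.table import DataTypes` and `from pyflink.table.udf import ScalarFunction, udf`. As a rough illustration of the load-once pattern it documents, the sketch below runs without PyFlink installed. `ScalarFunctionStandIn`, the dict-based toy model, and the temporary file path are hypothetical stand-ins that do not appear in the commit, and the explicit `predict.open(None)` call only imitates what the runtime is expected to do before the first `eval`.

```python
import os
import pickle
import tempfile


class ScalarFunctionStandIn:
    """Hypothetical stand-in for pyflink.table.udf.ScalarFunction (not the real class)."""

    def open(self, function_context):
        pass

    def eval(self, *args):
        raise NotImplementedError


class Predict(ScalarFunctionStandIn):
    def __init__(self, model_path):
        self.model_path = model_path
        self.load_count = 0  # counts how many times the resource is loaded

    def open(self, function_context):
        # Inside this method, open() still resolves to the builtin,
        # because the method itself is only reachable as self.open.
        with open(self.model_path, "rb") as f:
            self.model = pickle.load(f)
        self.load_count += 1

    def eval(self, x):
        # Reuse the model loaded in open(); nothing is re-read from disk here.
        return self.model["slope"] * x + self.model["intercept"]


with tempfile.TemporaryDirectory() as tmp:
    # Write a toy "model" so the example is self-contained.
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump({"slope": 2.0, "intercept": 1.0}, f)

    predict = Predict(path)
    predict.open(None)  # imitates the one-time setup call made before eval
    results = [predict.eval(x) for x in (0.0, 1.0, 2.0)]

print(results, predict.load_count)
```

Keeping the load in `open` rather than in `eval` means the cost of reading and unpickling the file is paid once per function instance instead of once per call.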
