This is an automated email from the ASF dual-hosted git repository.
dianfu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/master by this push:
new a62c5a35aa2 [FLINK-28944][python][docs] Update the Python execution mode documentation
a62c5a35aa2 is described below
commit a62c5a35aa29014780bb5598facd0fbdf2db4426
Author: huangxingbo <[email protected]>
AuthorDate: Fri Aug 12 17:06:01 2022 +0800
[FLINK-28944][python][docs] Update the Python execution mode documentation
This closes #20561.
---
.../docs/dev/python/python_execution_mode.md | 72 +++++++++++++++++++---
.../docs/dev/python/python_execution_mode.md | 72 +++++++++++++++++++---
2 files changed, 128 insertions(+), 16 deletions(-)
diff --git a/docs/content.zh/docs/dev/python/python_execution_mode.md b/docs/content.zh/docs/dev/python/python_execution_mode.md
index dae060ec65c..f26cbb3ad05 100644
--- a/docs/content.zh/docs/dev/python/python_execution_mode.md
+++ b/docs/content.zh/docs/dev/python/python_execution_mode.md
@@ -26,21 +26,22 @@ under the License.
The Python API supports different runtime execution modes from which you can choose depending on the
requirements of your use case and the characteristics of your job. The Python runtime execution mode
-defines how to execute your customized Python functions.
+defines how the Python user-defined functions will be executed.
Prior to release-1.15, there is the only execution mode called `PROCESS` execution mode. The `PROCESS`
mode means that the Python user-defined functions will be executed in separate Python processes.
In release-1.15, it has introduced a new execution mode called `THREAD` execution mode. The `THREAD`
-mode means that the Python user-defined functions will be executed in the same process as Java Operator,
-It should be noted that multiple Python user-defined functions running in the same JVM are still affected by GIL.
+mode means that the Python user-defined functions will be executed in JVM.
+
+**NOTE:** Multiple Python user-defined functions running in the same JVM are still affected by GIL.
## When can/should I use THREAD execution mode?
The purpose of the introduction of `THREAD` mode is to overcome the overhead of serialization/deserialization
-and network communication caused in `PROCESS` mode. So if performance is not your concern, or the computing
-logic of your customized Python functions is the performance bottleneck of the job, `PROCESS` mode will
-be the best choice as `PROCESS` mode provides the best isolation compared to `THREAD` mode.
+and network communication introduced of inter-process communication in the `PROCESS` mode.
+So if performance is not your concern, or the computing logic of your Python user-defined functions is the performance bottleneck of the job,
+`PROCESS` mode will be the best choice as `PROCESS` mode provides the best isolation compared to `THREAD` mode.
## Configuring Python execution mode
@@ -48,18 +49,73 @@ The execution mode can be configured via the `python.execution-mode` setting.
There are two possible values:
- `PROCESS`: The Python user-defined functions will be executed in separate Python process. (default)
- - `THREAD`: The Python user-defined functions will be executed in the same process as Java operator.
+ - `THREAD`: The Python user-defined functions will be executed in JVM.
-You could specify the Python execution mode using Python Table API as following:
+You could specify the execution mode in Python Table API or Python DataStream API jobs as following:
```python
+## Python Table API
# Specify `PROCESS` mode
table_env.get_config().set("python.execution-mode", "process")
# Specify `THREAD` mode
table_env.get_config().set("python.execution-mode", "thread")
+
+
+## Python DataStream API
+
+config = Configuration()
+
+# Specify `PROCESS` mode
+config.set_string("python.execution-mode", "process")
+
+# Specify `THREAD` mode
+config.set_string("python.execution-mode", "thread")
+
+# Create the corresponding StreamExecutionEnvironment
+env = StreamExecutionEnvironment.get_execution_environment(config)
```
+## Supported Cases
+
+### Python Table API
+
+The following table shows where the `THREAD` execution mode is supported in Python Table API.
+
+| UDFs | `PROCESS` | `THREAD`|
+|-----|-----------|---------|
+| Python UDF | Yes | Yes |
+| Python UDTF | Yes | Yes |
+| Python UDAF | Yes | No |
+| Pandas UDF & Pandas UDAF | Yes | No |
+
+### Python DataStream API
+
+The following Table shows the supported cases in Python DataStream API.
+
+| Operators | `PROCESS` | `THREAD` |
+|-----------|-----------|----------|
+| Map | Yes | Yes |
+| FlatMap | Yes | Yes |
+| Filter | Yes | Yes |
+| Reduce | Yes | Yes |
+| Union | Yes | Yes |
+| Connect | Yes | Yes |
+| CoMap | Yes | Yes |
+| CoFlatMap | Yes | Yes |
+| Process Function | Yes | Yes |
+| Window Apply | Yes | Yes |
+| Window Aggregate | Yes | Yes |
+| Window Reduce | Yes | Yes |
+| Window Process | Yes | Yes |
+| Side Output | Yes | Yes |
+| State | Yes | Yes |
+| Iterate | No | No |
+| Window CoGroup | No | No |
+| Window Join | No | No |
+| Interval Join | No | No |
+| Async I/O | No | No |
+
{{< hint info >}}
Currently, it still doesn't support to execute Python UDFs in `THREAD` execution mode in all places.
It will fall back to `PROCESS` execution mode in these cases. So it may happen that you configure a job
diff --git a/docs/content/docs/dev/python/python_execution_mode.md b/docs/content/docs/dev/python/python_execution_mode.md
index ef7afa9124e..a38b4cea07e 100644
--- a/docs/content/docs/dev/python/python_execution_mode.md
+++ b/docs/content/docs/dev/python/python_execution_mode.md
@@ -26,21 +26,22 @@ under the License.
The Python API supports different runtime execution modes from which you can choose depending on the
requirements of your use case and the characteristics of your job. The Python runtime execution mode
-defines how to execute your customized Python functions.
+defines how the Python user-defined functions will be executed.
Prior to release-1.15, there is the only execution mode called `PROCESS` execution mode. The `PROCESS`
mode means that the Python user-defined functions will be executed in separate Python processes.
In release-1.15, it has introduced a new execution mode called `THREAD` execution mode. The `THREAD`
-mode means that the Python user-defined functions will be executed in the same process as Java Operator,
-It should be noted that multiple Python user-defined functions running in the same JVM are still affected by GIL.
+mode means that the Python user-defined functions will be executed in JVM.
+
+**NOTE:** Multiple Python user-defined functions running in the same JVM are still affected by GIL.
## When can/should I use THREAD execution mode?
The purpose of the introduction of `THREAD` mode is to overcome the overhead of serialization/deserialization
-and network communication caused in `PROCESS` mode. So if performance is not your concern, or the computing
-logic of your customized Python functions is the performance bottleneck of the job, `PROCESS` mode will
-be the best choice as `PROCESS` mode provides the best isolation compared to `THREAD` mode.
+and network communication introduced of inter-process communication in the `PROCESS` mode.
+So if performance is not your concern, or the computing logic of your Python user-defined functions is the performance bottleneck of the job,
+`PROCESS` mode will be the best choice as `PROCESS` mode provides the best isolation compared to `THREAD` mode.
## Configuring Python execution mode
@@ -48,18 +49,73 @@ The execution mode can be configured via the `python.execution-mode` setting.
There are two possible values:
- `PROCESS`: The Python user-defined functions will be executed in separate Python process. (default)
- - `THREAD`: The Python user-defined functions will be executed in the same process as Java operator.
+ - `THREAD`: The Python user-defined functions will be executed in JVM.
-You could specify the Python execution mode using Python Table API as following:
+You could specify the execution mode in Python Table API or Python DataStream API jobs as following:
```python
+## Python Table API
# Specify `PROCESS` mode
table_env.get_config().set("python.execution-mode", "process")
# Specify `THREAD` mode
table_env.get_config().set("python.execution-mode", "thread")
+
+
+## Python DataStream API
+
+config = Configuration()
+
+# Specify `PROCESS` mode
+config.set_string("python.execution-mode", "process")
+
+# Specify `THREAD` mode
+config.set_string("python.execution-mode", "thread")
+
+# Create the corresponding StreamExecutionEnvironment
+env = StreamExecutionEnvironment.get_execution_environment(config)
```
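
Editor's sketch, not part of the commit: the snippet above passes the mode as a free-form string, so a typo would only surface at runtime. A small guard can validate the value before it reaches the configuration. The helper names `normalize_execution_mode` and `execution_mode_option` are hypothetical and are not PyFlink APIs; only the option key `python.execution-mode` and the lowercase values `process` and `thread` come from the documentation text.

```python
# Hypothetical helpers, not part of PyFlink: they validate a mode string
# before it is written to the `python.execution-mode` option.
VALID_EXECUTION_MODES = ("process", "thread")


def normalize_execution_mode(mode: str) -> str:
    """Return the lowercase mode name, or raise ValueError for unknown values."""
    normalized = mode.strip().lower()
    if normalized not in VALID_EXECUTION_MODES:
        raise ValueError(
            f"Unsupported python.execution-mode {mode!r}; "
            f"expected one of {VALID_EXECUTION_MODES}"
        )
    return normalized


def execution_mode_option(mode: str) -> tuple:
    """Build the (key, value) pair to hand to `set` or `set_string`."""
    return ("python.execution-mode", normalize_execution_mode(mode))


# Usage with the calls shown in the documentation (requires PyFlink):
#   key, value = execution_mode_option("THREAD")
#   table_env.get_config().set(key, value)   # Python Table API
#   config.set_string(key, value)            # Python DataStream API
```

The helpers always emit the lowercase spelling used in the documented snippet, so they make no assumption about how Flink itself treats other casings.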
+## Supported Cases
+
+### Python Table API
+
+The following table shows where the `THREAD` execution mode is supported in Python Table API.
+
+| UDFs | `PROCESS` | `THREAD`|
+|-----|-----------|---------|
+| Python UDF | Yes | Yes |
+| Python UDTF | Yes | Yes |
+| Python UDAF | Yes | No |
+| Pandas UDF & Pandas UDAF | Yes | No |
+
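
Editor's sketch, not part of the commit: the Table API matrix above can be read as a lookup table. Combined with the hint later on this page, which says unsupported cases fall back to `PROCESS`, it predicts the mode a function will actually run in. The dictionary mirrors the table rows; the function name `effective_mode` is hypothetical, and the real fallback decision is made inside Flink, not by code like this.

```python
# Support matrix copied from the Python Table API table in this document.
# True means the UDF kind can run in THREAD mode.
THREAD_SUPPORT = {
    "Python UDF": True,
    "Python UDTF": True,
    "Python UDAF": False,
    "Pandas UDF & Pandas UDAF": False,
}


def effective_mode(configured_mode: str, udf_kind: str) -> str:
    """Predict the mode a UDF runs in, assuming fallback to PROCESS.

    Raises KeyError for a UDF kind that is not listed in the table.
    """
    mode = configured_mode.lower()
    if mode == "thread" and not THREAD_SUPPORT[udf_kind]:
        # The documentation states that unsupported cases fall back to PROCESS.
        return "process"
    return mode


for kind in THREAD_SUPPORT:
    print(f"{kind}: {effective_mode('thread', kind)}")
```

A lookup like this is only a reading aid for the table; it has to be updated by hand whenever the support matrix changes.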
+### Python DataStream API
+
+The following table shows where the `PROCESS` execution mode and the `THREAD` execution mode are supported in Python DataStream API.
+
+| Operators | `PROCESS` | `THREAD` |
+|-----------|-----------|----------|
+| Map | Yes | Yes |
+| FlatMap | Yes | Yes |
+| Filter | Yes | Yes |
+| Reduce | Yes | Yes |
+| Union | Yes | Yes |
+| Connect | Yes | Yes |
+| CoMap | Yes | Yes |
+| CoFlatMap | Yes | Yes |
+| Process Function | Yes | Yes |
+| Window Apply | Yes | Yes |
+| Window Aggregate | Yes | Yes |
+| Window Reduce | Yes | Yes |
+| Window Process | Yes | Yes |
+| Side Output | Yes | Yes |
+| State | Yes | Yes |
+| Iterate | No | No |
+| Window CoGroup | No | No |
+| Window Join | No | No |
+| Interval Join | No | No |
+| Async I/O | No | No |
+
{{< hint info >}}
Currently, it still doesn't support to execute Python UDFs in `THREAD` execution mode in all places.
It will fall back to `PROCESS` execution mode in these cases. So it may happen that you configure a job