dianfu commented on a change in pull request #19150:
URL: https://github.com/apache/flink/pull/19150#discussion_r830718361
##########
File path: docs/content.zh/docs/dev/python/python_execution_mode.md
##########
@@ -31,61 +31,48 @@ defines how to execute your customized Python functions.
Prior to release-1.15, there is the only execution mode called `PROCESS`
execution mode. The `PROCESS`
mode means that the Python user-defined functions will be executed in separate
Python processes.
-In release-1.15, it has introduced another two execution modes called
`MULTI-THREAD` execution mode and
-`SUB-INTERPRETER` execution mode. The `MULTI-THREAD` mode means that the
Python user-defined functions
-will be executed in the same thread as Java Operator, but it will be affected
by GIL performance.
-The `SUB-INTERPRETER` mode means that the Python user-defined functions will
be executed in Python
-different sub-interpreters rather than different threads of one interpreter,
which can largely overcome
-the effects of the GIL, but some CPython extensions libraries doesn't support
it, such as numpy, tensorflow, etc.
+In release-1.15, it has introduced a new execution mode called `THREAD`
execution mode. The `THREAD`
+mode means that the Python user-defined functions will be executed in the same
thread as Java Operator,
+but it will be affected by GIL performance.
Review comment:
```suggestion
It should be noted that multiple Python user-defined functions running in
the same JVM are still affected by GIL.
```
##########
File path: docs/content.zh/docs/dev/python/python_execution_mode.md
##########
@@ -31,61 +31,48 @@ defines how to execute your customized Python functions.
Prior to release-1.15, there is the only execution mode called `PROCESS`
execution mode. The `PROCESS`
mode means that the Python user-defined functions will be executed in separate
Python processes.
-In release-1.15, it has introduced another two execution modes called
`MULTI-THREAD` execution mode and
-`SUB-INTERPRETER` execution mode. The `MULTI-THREAD` mode means that the
Python user-defined functions
-will be executed in the same thread as Java Operator, but it will be affected
by GIL performance.
-The `SUB-INTERPRETER` mode means that the Python user-defined functions will
be executed in Python
-different sub-interpreters rather than different threads of one interpreter,
which can largely overcome
-the effects of the GIL, but some CPython extensions libraries doesn't support
it, such as numpy, tensorflow, etc.
+In release-1.15, it has introduced a new execution mode called `THREAD`
execution mode. The `THREAD`
+mode means that the Python user-defined functions will be executed in the same
thread as Java Operator,
+but it will be affected by GIL performance.
-## When can/should I use MULTI-THREAD execution mode or SUB-INTERPRETER
execution mode?
+## When can/should I use THREAD execution mode?
-The purpose of the introduction of `MULTI-THREAD` mode and `SUB-INTERPRETER`
mode is to overcome the
-overhead of serialization/deserialization and network communication caused in
`PROCESS` mode.
-So if performance is not your concern, or the computing logic of your
customized Python functions is
-the performance bottleneck of the job, `PROCESS` mode will be the best choice
as `PROCESS` mode provides
-the best isolation compared to `MULTI-THREAD` mode and `SUB-INTERPRETER` mode.
-
-Compared to `MULTI-THREAD` execution mode, `SUB-INTERPRETER` execution mode
can largely overcome the
-effects of the GIL, so you can get better performance usually. However,
`SUB-INTERPRETER` may fail in some CPython
-extensions libraries, such as numpy, tensorflow. In this case, you should use
`PROCESS` mode or `MULTI-THREAD` mode.
+The purpose of the introduction of `THREAD` mode is to overcome the overhead
of serialization/deserialization
+and network communication caused in `PROCESS` mode. So if performance is not
your concern, or the computing
+logic of your customized Python functions is the performance bottleneck of the
job, `PROCESS` mode will
+be the best choice as `PROCESS` mode provides the best isolation compared to
`THREAD` mode.
## Configuring Python execution mode
The execution mode can be configured via the `python.execution-mode` setting.
-There are three possible values:
+There are two possible values:
- `PROCESS`: The Python user-defined functions will be executed in separate
Python process. (default)
- - `MULTI-THREAD`: The Python user-defined functions will be executed in the
same thread as Java Operator.
- - `SUB-INTERPRETER`: The Python user-defined functions will be executed in
Python different sub-interpreters.
+ - `THREAD`: The Python user-defined functions will be executed in the same
thread as Java Operator.
Review comment:
```suggestion
- `THREAD`: The Python user-defined functions will be executed in the same
process as the Java operator.
```
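To illustrate why "same process" is the more accurate wording than "same thread", here is a small plain-Python analogy. It does not use Flink and is not how PyFlink is implemented; the helper names `pid_in_thread` and `pid_in_child_process` are made up for this sketch. It only shows that code run in a thread shares the process ID of its host, while code run in a separate Python process does not.

```python
import os
import subprocess
import sys
import threading


def pid_in_thread():
    """Run os.getpid() in a new thread of the current process."""
    result = []
    worker = threading.Thread(target=lambda: result.append(os.getpid()))
    worker.start()
    worker.join()
    return result[0]


def pid_in_child_process():
    """Run os.getpid() in a separate Python process."""
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout.strip())


if __name__ == "__main__":
    # A thread lives inside the host process, so the IDs match.
    print(pid_in_thread() == os.getpid())
    # A child process has its own ID, which is the isolation `PROCESS` mode relies on.
    print(pid_in_child_process() == os.getpid())
```

The second case is also where the serialization and inter-process communication overhead mentioned in the docs comes from, since data has to cross a process boundary.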
##########
File path: flink-python/src/main/java/org/apache/flink/python/PythonOptions.java
##########
@@ -231,10 +231,8 @@
.stringType()
.defaultValue("process")
.withDescription(
- "Specify the python runtime execution mode. The
optional values are `process`, `multi-thread` and `sub-interpreter`. "
+ "Specify the python runtime execution mode. The
optional values are `process` and `thread`. "
+ "The `process` mode means that the
Python user-defined functions will be executed in separate Python process. "
- + "The `multi-thread` mode means that the
Python user-defined functions will be executed in the same thread as Java
Operator, but it will be affected by GIL performance. "
- + "The `sub-interpreter` mode means that
the Python user-defined functions will be executed in python different
sub-interpreters rather than different threads of one interpreter, "
- + "which can largely overcome the effects
of the GIL, but it maybe fail in some CPython extensions libraries, such as
numpy, tensorflow. "
- + "Note that if the python operator dose
not support `multi-thread` and `sub-interpreter` mode, we will still use
`process` mode.");
+ + "The `thread` mode means that the Python
user-defined functions will be executed in the same thread as Java Operator,
but it will be affected by GIL performance. "
Review comment:
```suggestion
+ "The `thread` mode means that the Python user-defined functions will be executed in the same process of the Java Operator. "
```
##########
File path: docs/content.zh/docs/dev/python/python_execution_mode.md
##########
@@ -31,61 +31,48 @@ defines how to execute your customized Python functions.
Prior to release-1.15, there is the only execution mode called `PROCESS`
execution mode. The `PROCESS`
mode means that the Python user-defined functions will be executed in separate
Python processes.
-In release-1.15, it has introduced another two execution modes called
`MULTI-THREAD` execution mode and
-`SUB-INTERPRETER` execution mode. The `MULTI-THREAD` mode means that the
Python user-defined functions
-will be executed in the same thread as Java Operator, but it will be affected
by GIL performance.
-The `SUB-INTERPRETER` mode means that the Python user-defined functions will
be executed in Python
-different sub-interpreters rather than different threads of one interpreter,
which can largely overcome
-the effects of the GIL, but some CPython extensions libraries doesn't support
it, such as numpy, tensorflow, etc.
+In release-1.15, it has introduced a new execution mode called `THREAD`
execution mode. The `THREAD`
+mode means that the Python user-defined functions will be executed in the same
thread as Java Operator,
+but it will be affected by GIL performance.
-## When can/should I use MULTI-THREAD execution mode or SUB-INTERPRETER
execution mode?
+## When can/should I use THREAD execution mode?
-The purpose of the introduction of `MULTI-THREAD` mode and `SUB-INTERPRETER`
mode is to overcome the
-overhead of serialization/deserialization and network communication caused in
`PROCESS` mode.
-So if performance is not your concern, or the computing logic of your
customized Python functions is
-the performance bottleneck of the job, `PROCESS` mode will be the best choice
as `PROCESS` mode provides
-the best isolation compared to `MULTI-THREAD` mode and `SUB-INTERPRETER` mode.
-
-Compared to `MULTI-THREAD` execution mode, `SUB-INTERPRETER` execution mode
can largely overcome the
-effects of the GIL, so you can get better performance usually. However,
`SUB-INTERPRETER` may fail in some CPython
-extensions libraries, such as numpy, tensorflow. In this case, you should use
`PROCESS` mode or `MULTI-THREAD` mode.
+The purpose of the introduction of `THREAD` mode is to overcome the overhead
of serialization/deserialization
+and network communication caused in `PROCESS` mode. So if performance is not
your concern, or the computing
+logic of your customized Python functions is the performance bottleneck of the
job, `PROCESS` mode will
+be the best choice as `PROCESS` mode provides the best isolation compared to
`THREAD` mode.
## Configuring Python execution mode
The execution mode can be configured via the `python.execution-mode` setting.
-There are three possible values:
+There are two possible values:
- `PROCESS`: The Python user-defined functions will be executed in separate
Python process. (default)
- - `MULTI-THREAD`: The Python user-defined functions will be executed in the
same thread as Java Operator.
- - `SUB-INTERPRETER`: The Python user-defined functions will be executed in
Python different sub-interpreters.
+ - `THREAD`: The Python user-defined functions will be executed in the same
thread as Java Operator.
You could specify the Python execution mode using Python Table API as
following:
```python
# Specify `PROCESS` mode
table_env.get_config().get_configuration().set_string("python.execution-mode",
"process")
-# Specify `MULTI-THREAD` mode
-table_env.get_config().get_configuration().set_string("python.execution-mode",
"multi-thread")
-
-# Specify `SUB-INTERPRETER` mode
-table_env.get_config().get_configuration().set_string("python.execution-mode",
"sub-interpreter")
+# Specify `THREAD` mode
+table_env.get_config().get_configuration().set_string("python.execution-mode",
"thread")
```
{{< hint info >}}
-Currently, it still doesn't support to execute Python UDFs in `MULTI-THREAD`
and `SUB-INTERPRETER` execution mode
-in all places. It will fall back to `PROCESS` execution mode in these cases.
So it may happen that you configure a job
-to execute in `MULTI-THREAD` or `SUB-INTERPRETER` execution modes, however,
it's actually executed in `PROCESS` execution mode.
+Currently, it still doesn't support to execute Python UDFs in `THREAD`
execution mode in all places.
+It will fall back to `PROCESS` execution mode in these cases. So it may happen
that you configure a job
+to execute in `THREAD` execution modes, however, it's actually executed in
`PROCESS` execution mode.
{{< /hint >}}
{{< hint info >}}
-`MULTI-THREAD` execution mode only supports Python 3.7+. `SUB-INTERPRETER`
execution mode only supports Python 3.8+.
+`THREAD` execution mode only supports Python 3.7+.
Review comment:
```suggestion
`THREAD` execution mode is only supported in Python 3.7+.
```
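A possible way for users to respect this constraint when setting the option, given as a minimal sketch only. `choose_execution_mode` and `MIN_THREAD_MODE_VERSION` are hypothetical names and not part of PyFlink; the only real API referenced is the `set_string` call already shown in the docs above, and it is left as a comment so the snippet runs without a Flink installation.

```python
import sys

# Assumption taken from the hint under review: THREAD mode needs Python 3.7+.
MIN_THREAD_MODE_VERSION = (3, 7)


def choose_execution_mode(version_info=None):
    """Return "thread" on Python 3.7+, otherwise the default "process".

    Hypothetical helper for illustration; not provided by PyFlink.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return "thread" if major_minor >= MIN_THREAD_MODE_VERSION else "process"


if __name__ == "__main__":
    mode = choose_execution_mode()
    print(mode)
    # With an existing TableEnvironment, the value would be applied as in the docs:
    # table_env.get_config().get_configuration().set_string("python.execution-mode", mode)
```

Even when `thread` is selected this way, the job may still run in `PROCESS` mode wherever `THREAD` mode is not supported, as the other hint on this page describes.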
##########
File path: flink-python/src/main/java/org/apache/flink/python/PythonOptions.java
##########
@@ -231,10 +231,8 @@
.stringType()
.defaultValue("process")
.withDescription(
- "Specify the python runtime execution mode. The
optional values are `process`, `multi-thread` and `sub-interpreter`. "
+ "Specify the python runtime execution mode. The
optional values are `process` and `thread`. "
+ "The `process` mode means that the
Python user-defined functions will be executed in separate Python process. "
- + "The `multi-thread` mode means that the
Python user-defined functions will be executed in the same thread as Java
Operator, but it will be affected by GIL performance. "
- + "The `sub-interpreter` mode means that
the Python user-defined functions will be executed in python different
sub-interpreters rather than different threads of one interpreter, "
- + "which can largely overcome the effects
of the GIL, but it maybe fail in some CPython extensions libraries, such as
numpy, tensorflow. "
- + "Note that if the python operator dose
not support `multi-thread` and `sub-interpreter` mode, we will still use
`process` mode.");
+ + "The `thread` mode means that the Python
user-defined functions will be executed in the same thread as Java Operator,
but it will be affected by GIL performance. "
+ + "Note that if the python operator dose
not support `thread` mode, we will still use `process` mode.");
Review comment:
```suggestion
+ "Note that currently it still doesn't support to execute Python user-defined functions in `thread` mode in all places. It will fall back to `process` mode in these cases. ");
```
##########
File path: docs/content.zh/docs/dev/python/python_execution_mode.md
##########
@@ -31,61 +31,48 @@ defines how to execute your customized Python functions.
Prior to release-1.15, there is the only execution mode called `PROCESS`
execution mode. The `PROCESS`
mode means that the Python user-defined functions will be executed in separate
Python processes.
-In release-1.15, it has introduced another two execution modes called
`MULTI-THREAD` execution mode and
-`SUB-INTERPRETER` execution mode. The `MULTI-THREAD` mode means that the
Python user-defined functions
-will be executed in the same thread as Java Operator, but it will be affected
by GIL performance.
-The `SUB-INTERPRETER` mode means that the Python user-defined functions will
be executed in Python
-different sub-interpreters rather than different threads of one interpreter,
which can largely overcome
-the effects of the GIL, but some CPython extensions libraries doesn't support
it, such as numpy, tensorflow, etc.
+In release-1.15, it has introduced a new execution mode called `THREAD`
execution mode. The `THREAD`
+mode means that the Python user-defined functions will be executed in the same
thread as Java Operator,
+but it will be affected by GIL performance.
-## When can/should I use MULTI-THREAD execution mode or SUB-INTERPRETER
execution mode?
+## When can/should I use THREAD execution mode?
-The purpose of the introduction of `MULTI-THREAD` mode and `SUB-INTERPRETER`
mode is to overcome the
-overhead of serialization/deserialization and network communication caused in
`PROCESS` mode.
-So if performance is not your concern, or the computing logic of your
customized Python functions is
-the performance bottleneck of the job, `PROCESS` mode will be the best choice
as `PROCESS` mode provides
-the best isolation compared to `MULTI-THREAD` mode and `SUB-INTERPRETER` mode.
-
-Compared to `MULTI-THREAD` execution mode, `SUB-INTERPRETER` execution mode
can largely overcome the
-effects of the GIL, so you can get better performance usually. However,
`SUB-INTERPRETER` may fail in some CPython
-extensions libraries, such as numpy, tensorflow. In this case, you should use
`PROCESS` mode or `MULTI-THREAD` mode.
+The purpose of the introduction of `THREAD` mode is to overcome the overhead
of serialization/deserialization
+and network communication caused in `PROCESS` mode. So if performance is not
your concern, or the computing
+logic of your customized Python functions is the performance bottleneck of the
job, `PROCESS` mode will
+be the best choice as `PROCESS` mode provides the best isolation compared to
`THREAD` mode.
## Configuring Python execution mode
The execution mode can be configured via the `python.execution-mode` setting.
-There are three possible values:
+There are two possible values:
- `PROCESS`: The Python user-defined functions will be executed in separate
Python process. (default)
- - `MULTI-THREAD`: The Python user-defined functions will be executed in the
same thread as Java Operator.
- - `SUB-INTERPRETER`: The Python user-defined functions will be executed in
Python different sub-interpreters.
+ - `THREAD`: The Python user-defined functions will be executed in the same
thread as Java Operator.
You could specify the Python execution mode using Python Table API as
following:
```python
# Specify `PROCESS` mode
table_env.get_config().get_configuration().set_string("python.execution-mode",
"process")
-# Specify `MULTI-THREAD` mode
-table_env.get_config().get_configuration().set_string("python.execution-mode",
"multi-thread")
-
-# Specify `SUB-INTERPRETER` mode
-table_env.get_config().get_configuration().set_string("python.execution-mode",
"sub-interpreter")
+# Specify `THREAD` mode
+table_env.get_config().get_configuration().set_string("python.execution-mode",
"thread")
```
{{< hint info >}}
-Currently, it still doesn't support to execute Python UDFs in `MULTI-THREAD`
and `SUB-INTERPRETER` execution mode
-in all places. It will fall back to `PROCESS` execution mode in these cases.
So it may happen that you configure a job
-to execute in `MULTI-THREAD` or `SUB-INTERPRETER` execution modes, however,
it's actually executed in `PROCESS` execution mode.
+Currently, it still doesn't support to execute Python UDFs in `THREAD`
execution mode in all places.
+It will fall back to `PROCESS` execution mode in these cases. So it may happen
that you configure a job
+to execute in `THREAD` execution modes, however, it's actually executed in
`PROCESS` execution mode.
Review comment:
```suggestion
to execute in `THREAD` execution mode, however, it's actually executed in
`PROCESS` execution mode.
```