This is an automated email from the ASF dual-hosted git repository.
dianfu pushed a commit to branch release-1.12
in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/release-1.12 by this push:
new 2c68c63 [FLINK-20601][docs][python] Rework PyFlink CLI documentation.
2c68c63 is described below
commit 2c68c63f97a58805cebdbeba5563d1c851a3bcb6
Author: acqua.csq <[email protected]>
AuthorDate: Tue Dec 15 15:15:58 2020 +0800
[FLINK-20601][docs][python] Rework PyFlink CLI documentation.
This closes #14367.
---
docs/deployment/cli.md | 72 +++++++++++++++++++++++++++++++
docs/deployment/cli.zh.md | 72 +++++++++++++++++++++++++++++++
docs/dev/python/datastream_tutorial.md | 2 +-
docs/dev/python/datastream_tutorial.zh.md | 2 +-
docs/dev/python/table_api_tutorial.md | 2 +-
docs/dev/python/table_api_tutorial.zh.md | 2 +-
6 files changed, 148 insertions(+), 4 deletions(-)
diff --git a/docs/deployment/cli.md b/docs/deployment/cli.md
index adcbdd1..6984f12 100644
--- a/docs/deployment/cli.md
+++ b/docs/deployment/cli.md
@@ -338,4 +338,76 @@ specified in the `config/flink-config.yaml`.
For more details on the commands and the available options, please refer to the Resource Provider-specific
pages of the documentation.
+### Submitting PyFlink Jobs
+
+Currently, users are able to submit a PyFlink job via the CLI. Unlike Java job submission, it does not
+require specifying the JAR file path or the entry main class.
+
+<span class="label label-info">Note</span> When submitting a Python job via `flink run`, Flink will run the command "python". Please run the following command to confirm that the `python` executable in the current environment points to a supported Python version of 3.5+.
+{% highlight bash %}
+$ python --version
+# the version printed here must be 3.5+
+{% endhighlight %}
+
+The following commands show different PyFlink job submission use-cases:
+
+- Run a PyFlink job:
+{% highlight bash %}
+$ ./bin/flink run --python examples/python/table/batch/word_count.py
+{% endhighlight %}
+
+- Run a PyFlink job with additional source and resource files. Files specified in `--pyFiles` will be
+added to the `PYTHONPATH` and, therefore, available in the Python code.
+{% highlight bash %}
+$ ./bin/flink run \
+ --python examples/python/table/batch/word_count.py \
+ --pyFiles file:///user.txt,hdfs:///$namenode_address/username.txt
+{% endhighlight %}
+
+- Run a PyFlink job that references Java UDFs or external connectors. The JAR file specified in `--jarfile` will be uploaded
+to the cluster.
+{% highlight bash %}
+$ ./bin/flink run \
+ --python examples/python/table/batch/word_count.py \
+ --jarfile <jarFile>
+{% endhighlight %}
+
+- Run a PyFlink job with pyFiles and the main entry module specified in `--pyModule`:
+{% highlight bash %}
+$ ./bin/flink run \
+ --pyModule batch.word_count \
+ --pyFiles examples/python/table/batch
+{% endhighlight %}
+
+- Submit a PyFlink job on a specific JobManager running on host `<jobmanagerHost>` (adapt the command accordingly):
+{% highlight bash %}
+$ ./bin/flink run \
+ --jobmanager <jobmanagerHost>:8081 \
+ --python examples/python/table/batch/word_count.py
+{% endhighlight %}
+
+- Run a PyFlink job using a [YARN cluster in Per-Job Mode]({% link deployment/resource-providers/yarn.md %}#per-job-cluster-mode):
+{% highlight bash %}
+$ ./bin/flink run \
+    --target yarn-per-job \
+ --python examples/python/table/batch/word_count.py
+{% endhighlight %}
+
+- Run a PyFlink application on a native Kubernetes cluster with the cluster ID `<ClusterId>`. This requires a Docker image with PyFlink installed; please refer to [Enabling PyFlink in docker]({% link deployment/resource-providers/standalone/docker.md %}#enabling-python):
+{% highlight bash %}
+$ ./bin/flink run-application \
+ --target kubernetes-application \
+ --parallelism 8 \
+ -Dkubernetes.cluster-id=<ClusterId> \
+ -Dtaskmanager.memory.process.size=4096m \
+ -Dkubernetes.taskmanager.cpu=2 \
+ -Dtaskmanager.numberOfTaskSlots=4 \
+ -Dkubernetes.container.image=<PyFlinkImageName> \
+ --pyModule word_count \
+ --pyFiles /opt/flink/examples/python/table/batch/word_count.py
+{% endhighlight %}
+
+To learn more about the available options, please refer to [Kubernetes]({% link deployment/resource-providers/native_kubernetes.md %})
+or [YARN]({% link deployment/resource-providers/yarn.md %}), which are described in more detail in the
+Resource Provider section.
{% top %}
diff --git a/docs/deployment/cli.zh.md b/docs/deployment/cli.zh.md
index 45dd16d..a49e2eb 100644
--- a/docs/deployment/cli.zh.md
+++ b/docs/deployment/cli.zh.md
@@ -337,4 +337,76 @@ specified in the `config/flink-config.yaml`.
For more details on the commands and the available options, please refer to the Resource Provider-specific
pages of the documentation.
+### Submitting PyFlink Jobs
+
+Currently, users are able to submit a PyFlink job via the CLI. Unlike Java job submission, it does not
+require specifying the JAR file path or the entry main class.
+
+<span class="label label-info">Note</span> When submitting a Python job via `flink run`, Flink will run the command "python". Please run the following command to confirm that the `python` executable in the current environment points to a supported Python version of 3.5+.
+{% highlight bash %}
+$ python --version
+# the version printed here must be 3.5+
+{% endhighlight %}
+
+The following commands show different PyFlink job submission use-cases:
+
+- Run a PyFlink job:
+{% highlight bash %}
+$ ./bin/flink run --python examples/python/table/batch/word_count.py
+{% endhighlight %}
+
+- Run a PyFlink job with additional source and resource files. Files specified in `--pyFiles` will be
+added to the `PYTHONPATH` and, therefore, available in the Python code.
+{% highlight bash %}
+$ ./bin/flink run \
+ --python examples/python/table/batch/word_count.py \
+ --pyFiles file:///user.txt,hdfs:///$namenode_address/username.txt
+{% endhighlight %}
+
+- Run a PyFlink job that references Java UDFs or external connectors. The JAR file specified in `--jarfile` will be uploaded
+to the cluster.
+{% highlight bash %}
+$ ./bin/flink run \
+ --python examples/python/table/batch/word_count.py \
+ --jarfile <jarFile>
+{% endhighlight %}
+
+- Run a PyFlink job with pyFiles and the main entry module specified in `--pyModule`:
+{% highlight bash %}
+$ ./bin/flink run \
+ --pyModule batch.word_count \
+ --pyFiles examples/python/table/batch
+{% endhighlight %}
+
+- Submit a PyFlink job on a specific JobManager running on host `<jobmanagerHost>` (adapt the command accordingly):
+{% highlight bash %}
+$ ./bin/flink run \
+ --jobmanager <jobmanagerHost>:8081 \
+ --python examples/python/table/batch/word_count.py
+{% endhighlight %}
+
+- Run a PyFlink job using a [YARN cluster in Per-Job Mode]({% link deployment/resource-providers/yarn.zh.md %}#per-job-cluster-mode):
+{% highlight bash %}
+$ ./bin/flink run \
+    --target yarn-per-job \
+ --python examples/python/table/batch/word_count.py
+{% endhighlight %}
+
+- Run a PyFlink application on a native Kubernetes cluster with the cluster ID `<ClusterId>`. This requires a Docker image with PyFlink installed; please refer to [Enabling PyFlink in docker]({% link deployment/resource-providers/standalone/docker.zh.md %}#enabling-python):
+{% highlight bash %}
+$ ./bin/flink run-application \
+ --target kubernetes-application \
+ --parallelism 8 \
+ -Dkubernetes.cluster-id=<ClusterId> \
+ -Dtaskmanager.memory.process.size=4096m \
+ -Dkubernetes.taskmanager.cpu=2 \
+ -Dtaskmanager.numberOfTaskSlots=4 \
+ -Dkubernetes.container.image=<PyFlinkImageName> \
+ --pyModule word_count \
+ --pyFiles /opt/flink/examples/python/table/batch/word_count.py
+{% endhighlight %}
+
+To learn more about the available options, please refer to [Kubernetes]({% link deployment/resource-providers/native_kubernetes.zh.md %})
+or [YARN]({% link deployment/resource-providers/yarn.zh.md %}), which are described in more detail in the
+Resource Provider section.
{% top %}
diff --git a/docs/dev/python/datastream_tutorial.md b/docs/dev/python/datastream_tutorial.md
index 944af44..a2b86e41 100644
--- a/docs/dev/python/datastream_tutorial.md
+++ b/docs/dev/python/datastream_tutorial.md
@@ -130,7 +130,7 @@ Next, you can run the example you just created on the command line:
$ python datastream_tutorial.py
{% endhighlight %}
-The command builds and runs your PyFlink program in a local mini cluster. You can alternatively submit it to a remote cluster using the instructions detailed in [Job Submission Examples]({% link deployment/cli.md %}#job-submission-examples).
+The command builds and runs your PyFlink program in a local mini cluster. You can alternatively submit it to a remote cluster using the instructions detailed in [Job Submission Examples]({% link deployment/cli.md %}#submitting-pyflink-jobs).
Finally, you can see the execution result on the command line:
diff --git a/docs/dev/python/datastream_tutorial.zh.md b/docs/dev/python/datastream_tutorial.zh.md
index e121ba0..e491ef6 100644
--- a/docs/dev/python/datastream_tutorial.zh.md
+++ b/docs/dev/python/datastream_tutorial.zh.md
@@ -131,7 +131,7 @@ rm -rf /tmp/output
$ python datastream_tutorial.py
{% endhighlight %}
-这个命令会在本地集群中构建并运行 PyFlink 程序。你也可以使用 [Job Submission Examples]({% link deployment/cli.zh.md %}#job-submission-examples) 中描述的命令将其提交到远程集群。
+这个命令会在本地集群中构建并运行 PyFlink 程序。你也可以使用 [Job Submission Examples]({% link deployment/cli.zh.md %}#submitting-pyflink-jobs) 中描述的命令将其提交到远程集群。
最后,你可以在命令行上看到执行结果:
diff --git a/docs/dev/python/table_api_tutorial.md b/docs/dev/python/table_api_tutorial.md
index 9a7ab5e..e17fc4e 100644
--- a/docs/dev/python/table_api_tutorial.md
+++ b/docs/dev/python/table_api_tutorial.md
@@ -187,7 +187,7 @@ $ python WordCount.py
The command builds and runs the Python Table API program in a local mini cluster.
You can also submit the Python Table API program to a remote cluster, you can refer
-[Job Submission Examples]({% link deployment/cli.md %}#job-submission-examples)
+[Job Submission Examples]({% link deployment/cli.md %}#submitting-pyflink-jobs)
for more details.
Finally, you can see the execution result on the command line:
diff --git a/docs/dev/python/table_api_tutorial.zh.md b/docs/dev/python/table_api_tutorial.zh.md
index f210e08..8af0eae 100644
--- a/docs/dev/python/table_api_tutorial.zh.md
+++ b/docs/dev/python/table_api_tutorial.zh.md
@@ -191,7 +191,7 @@ $ python WordCount.py
{% endhighlight %}
上述命令会构建Python Table API程序,并在本地mini cluster中运行。如果想将作业提交到远端集群执行,
-可以参考[作业提交示例]({% link deployment/cli.zh.md %}#job-submission-examples)。
+可以参考[作业提交示例]({% link deployment/cli.zh.md %}#submitting-pyflink-jobs)。
最后,你可以通过如下命令查看你的运行结果: