morsapaes commented on a change in pull request #14041:
URL: https://github.com/apache/flink/pull/14041#discussion_r521871549
##########
File path: docs/dev/python/table-api-users-guide/conversion_of_pandas.md
##########
@@ -60,11 +61,11 @@ table = t_env.from_pandas(pdf,
## Convert PyFlink Table to Pandas DataFrame
-It also supports converting a PyFlink Table to a Pandas DataFrame. Internally, it will materialize the results of the
-table and serialize them into multiple Arrow batches of Arrow columnar format at client side. The maximum Arrow batch size
-is determined by the config option [python.fn-execution.arrow.batch.size]({% link dev/python/table-api-users-guide/python_config.md %}#python-fn-execution-arrow-batch-size).
-The serialized data will then be converted to Pandas DataFrame. It will collect the content of the table to
-the client side and so please make sure that the content of the table could fit in memory before calling this method.
+PyFlink Tables can additionally be converted into a Pandas DataFrame.
+The resulting rows will materialized into multiple Arrow batches of Arrow columnar format on the client.
+The maximum Arrow batch size is configured via the option [python.fn-execution.arrow.batch.size]({% link dev/python/table-api-users-guide/python_config.md %}#python-fn-execution-arrow-batch-size).
+The serialized data will then be converted to a Pandas DataFrame.
+Because the contents of the table will be collected on the client, please ensure that the results of the table can fit in memory before calling this method.
Review comment:
```suggestion
PyFlink Tables can additionally be converted into a Pandas DataFrame.
The resulting rows will be serialized as multiple Arrow batches of Arrow columnar format on the client.
The maximum Arrow batch size is configured via the option [python.fn-execution.arrow.batch.size]({% link dev/python/table-api-users-guide/python_config.md %}#python-fn-execution-arrow-batch-size).
The serialized data will then be converted to a Pandas DataFrame.
Because the contents of the table will be collected on the client, please ensure that the results can fit in memory before calling this method.
```
##########
File path: docs/dev/python/table-api-users-guide/conversion_of_pandas.md
##########
@@ -22,17 +22,18 @@ specific language governing permissions and limitations
under the License.
-->
-It supports to convert between PyFlink Table and Pandas DataFrame.
+PyFlink Table API supports conversion to and from Pandas DataFrame.
* This will be replaced by the TOC
{:toc}
## Convert Pandas DataFrame to PyFlink Table
-It supports creating a PyFlink Table from a Pandas DataFrame. Internally, it will serialize the Pandas DataFrame
-using Arrow columnar format at client side and the serialized data will be processed and deserialized in Arrow source
-during execution. The Arrow source could also be used in streaming jobs and it will properly handle the checkpoint
-and provides the exactly once guarantees.
+Pandas DataFrames can be converted into a PyFlink TAble.
+Internally, PyFlink will serialize the Pandas DataFrame using Arrow columnar format on the client.
+The serialized data will be processed and deserialized in Arrow source during execution.
+The Arrow source can also be used in streaming jobs, and is integrated with checkpointing to
+and provide the exactly once guarantees.
Review comment:
```suggestion
provide the exactly once guarantees.
```
##########
File path: docs/dev/python/table-api-users-guide/conversion_of_pandas.md
##########
@@ -22,17 +22,18 @@ specific language governing permissions and limitations
under the License.
-->
-It supports to convert between PyFlink Table and Pandas DataFrame.
+PyFlink Table API supports conversion to and from Pandas DataFrame.
* This will be replaced by the TOC
{:toc}
## Convert Pandas DataFrame to PyFlink Table
-It supports creating a PyFlink Table from a Pandas DataFrame. Internally, it will serialize the Pandas DataFrame
-using Arrow columnar format at client side and the serialized data will be processed and deserialized in Arrow source
-during execution. The Arrow source could also be used in streaming jobs and it will properly handle the checkpoint
-and provides the exactly once guarantees.
+Pandas DataFrames can be converted into a PyFlink TAble.
Review comment:
```suggestion
Pandas DataFrames can be converted into a PyFlink Table.
```
##########
File path: docs/dev/python/table-api-users-guide/index.md
##########
@@ -24,7 +24,6 @@ under the License.
-->
Python Table API allows users to develop [Table API]({% link dev/table/tableApi.md %}) programs using the Python language.
Review comment:
```suggestion
The Python Table API allows users to develop [Table API]({% link dev/table/tableApi.md %}) programs using the Python language.
```
##########
File path: docs/dev/python/index.md
##########
@@ -22,3 +23,43 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+<img src="{% link /fig/pyflink.svg %}" alt="PyFlink" class="offset" width="50%" />
+
+PyFlink is a language for building unified batch and streaming workloads.
+This means real-time streaming pipelines, performing exploratory data
+analysis at scale, building machine learning pipelines, and creating ETLs for a data platform.
+If you're already familiar with Python and libraries such as Pandas, then PyFlink makes it simple
+to leverage the full capabilities of the Apache Flink ecosystem.
+
+The PyFlink Table API makes it simple to write powerful relational queries for building reports and
+ETL pipelines.
+At the same time, the PyFlink DataStream API gives developers access to low-level control over
+state and time, unlocking the full power of stream processing.
Review comment:
```suggestion
* The **PyFlink Table API** allows you to write powerful relational queries in a way that is similar to using SQL or working with tabular data in Python.
* At the same time, the **PyFlink DataStream API** gives you lower-level control over the core building blocks of Flink ([state](https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/stateful-stream-processing.html#what-is-state) and [time](https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/timely-stream-processing.html#introduction)) to build more complex stream processing use cases.
```
##########
File path: docs/dev/python/table-api-users-guide/index.zh.md
##########
@@ -25,8 +25,6 @@ under the License.
Python Table API allows users to develop [Table API]({% link dev/table/tableApi.zh.md %}) programs using the Python language.
Review comment:
```suggestion
The Python Table API allows users to develop [Table API]({% link dev/table/tableApi.zh.md %}) programs using the Python language.
```
##########
File path: docs/dev/python/index.md
##########
@@ -22,3 +23,43 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+<img src="{% link /fig/pyflink.svg %}" alt="PyFlink" class="offset" width="50%" />
+
+PyFlink is a language for building unified batch and streaming workloads.
+This means real-time streaming pipelines, performing exploratory data
+analysis at scale, building machine learning pipelines, and creating ETLs for a data platform.
+If you're already familiar with Python and libraries such as Pandas, then PyFlink makes it simple
+to leverage the full capabilities of the Apache Flink ecosystem.
Review comment:
```suggestion
If you're already familiar with Python and libraries such as Pandas, then PyFlink makes it simpler to leverage the full capabilities of the Flink ecosystem. Depending on the level of abstraction you need, there are two different APIs that can be used in PyFlink:
```
##########
File path: docs/dev/python/index.md
##########
@@ -22,3 +23,43 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+<img src="{% link /fig/pyflink.svg %}" alt="PyFlink" class="offset" width="50%" />
+
+PyFlink is a language for building unified batch and streaming workloads.
+This means real-time streaming pipelines, performing exploratory data
+analysis at scale, building machine learning pipelines, and creating ETLs for a data platform.
+If you're already familiar with Python and libraries such as Pandas, then PyFlink makes it simple
+to leverage the full capabilities of the Apache Flink ecosystem.
+
+The PyFlink Table API makes it simple to write powerful relational queries for building reports and
+ETL pipelines.
+At the same time, the PyFlink DataStream API gives developers access to low-level control over
+state and time, unlocking the full power of stream processing.
Review comment:
Think it's important here to give users something to identify with, like
SQL and Pandas. Also, I don't think state and time in the way that we think
about them are straightforward for Python users that don't know Flink.
##########
File path: docs/dev/python/index.md
##########
@@ -22,3 +23,43 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+<img src="{% link /fig/pyflink.svg %}" alt="PyFlink" class="offset" width="50%" />
+
+PyFlink is a language for building unified batch and streaming workloads.
+This means real-time streaming pipelines, performing exploratory data
+analysis at scale, building machine learning pipelines, and creating ETLs for a data platform.
Review comment:
```suggestion
PyFlink is a Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines and ETL processes.
```
##########
File path: docs/dev/python/index.md
##########
@@ -22,3 +23,43 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+<img src="{% link /fig/pyflink.svg %}" alt="PyFlink" class="offset" width="50%" />
+
+PyFlink is a language for building unified batch and streaming workloads.
+This means real-time streaming pipelines, performing exploratory data
+analysis at scale, building machine learning pipelines, and creating ETLs for a data platform.
Review comment:
The way it's phrased makes it sound a bit strict on the use cases. Since
this is in the Flink official docs, I'd also prefer not to call it a
"language", but rather a Python API for Flink.
Will comment with a suggestion.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]