[GitHub] [flink-playgrounds] alpinegizmo commented on a change in pull request #16: [FLINK-19145][walkthroughs] Add PyFlink-walkthrough to Flink playground.

GitBox Tue, 15 Sep 2020 09:07:25 -0700


alpinegizmo commented on a change in pull request #16:
URL: https://github.com/apache/flink-playgrounds/pull/16#discussion_r488772985




##########
File path: pyflink-walkthrough/README.md
##########
@@ -0,0 +1,107 @@
+# pyflink-walkthrough
+
+## Background
+
+In this playground, you will learn how to build and run an end-to-end PyFlink 
pipeline for data analytics, covering the following steps:
+
+* Reading data from a Kafka source;
+* Creating data using a 
[UDF](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/udfs/python_udfs.html);
+* Performing a simple aggregation over the source data;
+* Writing the results to Elasticsearch and visualizing them in Kibana.
+
+### Kafka
+You will be using Kafka to store sample input data about payment transactions. 
A simple data generator 
[generate_source_data.py](generator/generate_source_data.py) is provided to
+continuously write new records to the `payment_msg` Kafka topic. Each record 
is structured as follows:
+ 
+`{"createTime": "2020-08-12 06:29:02", "orderId": 1597213797, "payAmount": 
28306.44976403719, "payPlatform": 0, "provinceId": 4}`
+
+* `createTime`: The creation time of the transaction. 
+* `orderId`: The id of the current transaction.
+* `payAmount`: The pay amount of the current transaction.

Review comment:
       ```suggestion
   * `payAmount`: The amount being paid with this transaction.
   ```

##########
File path: README.md
##########
@@ -15,6 +15,8 @@ Flink job. The playground is presented in detail in
 
 * The **Table Walkthrough** (in the `table-walkthrough` folder) shows to use 
the Table API to build an analytics pipeline that reads streaming data from 
Kafka and writes results to MySQL, along with a real-time dashboard in Grafana. 
The walkthrough is presented in detail in ["Real Time Reporting with the Table 
API"](https://ci.apache.org/projects/flink/flink-docs-release-1.11/try-flink/table_api.html),
 which is part of the _Try Flink_ section of the Flink documentation.
 
+* The **PyFlink Walkthrough** (int the `pyflink-walkthrough` folder) guides 
you to learn about how to manage and run PyFlink Jobs. The pipeline of this 
walkthrough reads data from Kafka, performs aggregations and writes results to 
Elasticsearch visualized via Kibana.

Review comment:
       ```suggestion
   * The **PyFlink Walkthrough** (in the `pyflink-walkthrough` folder) provides 
a complete example that uses the Python API, and guides you through the steps 
needed to run and manage PyFlink jobs. The pipeline used in this walkthrough 
reads data from Kafka, performs aggregations, and writes results to 
Elasticsearch that are visualized with Kibana.
   ```

##########
File path: pyflink-walkthrough/README.md
##########
@@ -0,0 +1,107 @@
+# pyflink-walkthrough
+
+## Background
+
+In this playground, you will learn how to build and run an end-to-end PyFlink 
pipeline for data analytics, covering the following steps:
+
+* Reading data from a Kafka source;
+* Creating data using a 
[UDF](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/udfs/python_udfs.html);
+* Performing a simple aggregation over the source data;
+* Writing the results to Elasticsearch and visualizing them in Kibana.
+
+### Kafka
+You will be using Kafka to store sample input data about payment transactions. 
A simple data generator 
[generate_source_data.py](generator/generate_source_data.py) is provided to
+continuously write new records to the `payment_msg` Kafka topic. Each record 
is structured as follows:
+ 
+`{"createTime": "2020-08-12 06:29:02", "orderId": 1597213797, "payAmount": 
28306.44976403719, "payPlatform": 0, "provinceId": 4}`
+
+* `createTime`: The creation time of the transaction. 
+* `orderId`: The id of the current transaction.
+* `payAmount`: The pay amount of the current transaction.
+* `payPlatform`: The platform used to pay the order, pc or mobile.
+* `provinceId`: The id of the province for the user. 
+
+You can use the following command to read data from the Kafka topic and check 
whether it's generated correctly:
+```shell script
+$ docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server 
kafka:9092 --topic payment_msg
+{"createTime":"2020-07-27 
09:25:32.77","orderId":1595841867217,"payAmount":7732.44,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.231","orderId":1595841867218,"payAmount":75774.05,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.72","orderId":1595841867219,"payAmount":65908.55,"payPlatform":0,"provinceId":0}
+{"createTime":"2020-07-27 
09:25:34.216","orderId":1595841867220,"payAmount":15341.11,"payPlatform":0,"provinceId":1}
+{"createTime":"2020-07-27 
09:25:34.698","orderId":1595841867221,"payAmount":37504.42,"payPlatform":0,"provinceId":0}
+```
+
+### PyFlink
+
+The transaction data will be processed with PyFlink using the Python script 
[payment_msg_processing.py](payment_msg_proccessing.py).
+This script will first map the `provinceId` in the input records to its 
corresponding province name
+using a Python UDF, and them sum the transaction amount for each province 
using a group aggregate. 

Review comment:
       ```suggestion
   using a Python UDF, and then compute the sum of the transaction amounts for 
each province. 
   ```

##########
File path: pyflink-walkthrough/README.md
##########
@@ -0,0 +1,107 @@
+# pyflink-walkthrough
+
+## Background
+
+In this playground, you will learn how to build and run an end-to-end PyFlink 
pipeline for data analytics, covering the following steps:
+
+* Reading data from a Kafka source;
+* Creating data using a 
[UDF](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/udfs/python_udfs.html);
+* Performing a simple aggregation over the source data;
+* Writing the results to Elasticsearch and visualizing them in Kibana.
+
+### Kafka
+You will be using Kafka to store sample input data about payment transactions. 
A simple data generator 
[generate_source_data.py](generator/generate_source_data.py) is provided to
+continuously write new records to the `payment_msg` Kafka topic. Each record 
is structured as follows:
+ 
+`{"createTime": "2020-08-12 06:29:02", "orderId": 1597213797, "payAmount": 
28306.44976403719, "payPlatform": 0, "provinceId": 4}`
+
+* `createTime`: The creation time of the transaction. 
+* `orderId`: The id of the current transaction.
+* `payAmount`: The pay amount of the current transaction.
+* `payPlatform`: The platform used to pay the order, pc or mobile.
+* `provinceId`: The id of the province for the user. 
+
+You can use the following command to read data from the Kafka topic and check 
whether it's generated correctly:
+```shell script
+$ docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server 
kafka:9092 --topic payment_msg
+{"createTime":"2020-07-27 
09:25:32.77","orderId":1595841867217,"payAmount":7732.44,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.231","orderId":1595841867218,"payAmount":75774.05,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.72","orderId":1595841867219,"payAmount":65908.55,"payPlatform":0,"provinceId":0}
+{"createTime":"2020-07-27 
09:25:34.216","orderId":1595841867220,"payAmount":15341.11,"payPlatform":0,"provinceId":1}
+{"createTime":"2020-07-27 
09:25:34.698","orderId":1595841867221,"payAmount":37504.42,"payPlatform":0,"provinceId":0}
+```
+
+### PyFlink
+
+The transaction data will be processed with PyFlink using the Python script 
[payment_msg_processing.py](payment_msg_proccessing.py).
+This script will first map the `provinceId` in the input records to its 
corresponding province name
+using a Python UDF, and them sum the transaction amount for each province 
using a group aggregate. 
+
+### ElasticSearch
+
+ElasticSearch is used to store upstream processing results and provide 
efficient query service.
+
+### Kibana
+
+Kibana is an open source data visualization dashboard for ElasticSearch. You 
will use it to visualize 
+the results of your PyFlink pipeline.
+
+## Setup
+
+As mentioned, the environment for this walkthrough is based on Docker Compose; 
and uses a custom image
+to spin up Flink (JobManager+TaskManager), Kafka+Zookeeper, the data generator 
and Elasticsearch+Kibana
+containers.
+
+Your can find the [docker-compose.yaml](docker-compose.yml) file of the 
pyflink-walkthrough is located in the `pyflink-walkthrough` root directory.
+
+### Building the Docker image
+
+First, build the Docker image by running:
+
+```bash
+docker-compose build
+```
+
+### Starting the Playground
+
+Once the Docker image build is complete, run the following command to start 
the playground:
+
+```bash
+docker-compose up -d
+```
+
+One way of checking if the playground was successfully started is accessing 
some of the services exposed:

Review comment:
       ```suggestion
   One way of checking if the playground was successfully started is to access 
some of the services that are exposed:
   ```

##########
File path: pyflink-walkthrough/README.md
##########
@@ -0,0 +1,107 @@
+# pyflink-walkthrough
+
+## Background
+
+In this playground, you will learn how to build and run an end-to-end PyFlink 
pipeline for data analytics, covering the following steps:
+
+* Reading data from a Kafka source;
+* Creating data using a 
[UDF](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/udfs/python_udfs.html);
+* Performing a simple aggregation over the source data;
+* Writing the results to Elasticsearch and visualizing them in Kibana.
+
+### Kafka
+You will be using Kafka to store sample input data about payment transactions. 
A simple data generator 
[generate_source_data.py](generator/generate_source_data.py) is provided to
+continuously write new records to the `payment_msg` Kafka topic. Each record 
is structured as follows:
+ 
+`{"createTime": "2020-08-12 06:29:02", "orderId": 1597213797, "payAmount": 
28306.44976403719, "payPlatform": 0, "provinceId": 4}`
+
+* `createTime`: The creation time of the transaction. 
+* `orderId`: The id of the current transaction.
+* `payAmount`: The pay amount of the current transaction.
+* `payPlatform`: The platform used to pay the order, pc or mobile.
+* `provinceId`: The id of the province for the user. 
+
+You can use the following command to read data from the Kafka topic and check 
whether it's generated correctly:
+```shell script
+$ docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server 
kafka:9092 --topic payment_msg
+{"createTime":"2020-07-27 
09:25:32.77","orderId":1595841867217,"payAmount":7732.44,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.231","orderId":1595841867218,"payAmount":75774.05,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.72","orderId":1595841867219,"payAmount":65908.55,"payPlatform":0,"provinceId":0}
+{"createTime":"2020-07-27 
09:25:34.216","orderId":1595841867220,"payAmount":15341.11,"payPlatform":0,"provinceId":1}
+{"createTime":"2020-07-27 
09:25:34.698","orderId":1595841867221,"payAmount":37504.42,"payPlatform":0,"provinceId":0}
+```
+
+### PyFlink
+
+The transaction data will be processed with PyFlink using the Python script 
[payment_msg_processing.py](payment_msg_proccessing.py).
+This script will first map the `provinceId` in the input records to its 
corresponding province name
+using a Python UDF, and them sum the transaction amount for each province 
using a group aggregate. 
+
+### ElasticSearch
+
+ElasticSearch is used to store upstream processing results and provide 
efficient query service.

Review comment:
       ```suggestion
   ElasticSearch is used to store the results and to provide an efficient query 
service.
   ```

##########
File path: pyflink-walkthrough/README.md
##########
@@ -0,0 +1,107 @@
+# pyflink-walkthrough
+
+## Background
+
+In this playground, you will learn how to build and run an end-to-end PyFlink 
pipeline for data analytics, covering the following steps:
+
+* Reading data from a Kafka source;
+* Creating data using a 
[UDF](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/udfs/python_udfs.html);
+* Performing a simple aggregation over the source data;
+* Writing the results to Elasticsearch and visualizing them in Kibana.
+
+### Kafka
+You will be using Kafka to store sample input data about payment transactions. 
A simple data generator 
[generate_source_data.py](generator/generate_source_data.py) is provided to
+continuously write new records to the `payment_msg` Kafka topic. Each record 
is structured as follows:
+ 
+`{"createTime": "2020-08-12 06:29:02", "orderId": 1597213797, "payAmount": 
28306.44976403719, "payPlatform": 0, "provinceId": 4}`
+
+* `createTime`: The creation time of the transaction. 
+* `orderId`: The id of the current transaction.
+* `payAmount`: The pay amount of the current transaction.
+* `payPlatform`: The platform used to pay the order, pc or mobile.
+* `provinceId`: The id of the province for the user. 
+
+You can use the following command to read data from the Kafka topic and check 
whether it's generated correctly:
+```shell script
+$ docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server 
kafka:9092 --topic payment_msg
+{"createTime":"2020-07-27 
09:25:32.77","orderId":1595841867217,"payAmount":7732.44,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.231","orderId":1595841867218,"payAmount":75774.05,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.72","orderId":1595841867219,"payAmount":65908.55,"payPlatform":0,"provinceId":0}
+{"createTime":"2020-07-27 
09:25:34.216","orderId":1595841867220,"payAmount":15341.11,"payPlatform":0,"provinceId":1}
+{"createTime":"2020-07-27 
09:25:34.698","orderId":1595841867221,"payAmount":37504.42,"payPlatform":0,"provinceId":0}
+```
+
+### PyFlink
+
+The transaction data will be processed with PyFlink using the Python script 
[payment_msg_processing.py](payment_msg_proccessing.py).
+This script will first map the `provinceId` in the input records to its 
corresponding province name
+using a Python UDF, and them sum the transaction amount for each province 
using a group aggregate. 
+
+### ElasticSearch
+
+ElasticSearch is used to store upstream processing results and provide 
efficient query service.
+
+### Kibana
+
+Kibana is an open source data visualization dashboard for ElasticSearch. You 
will use it to visualize 
+the results of your PyFlink pipeline.
+
+## Setup
+
+As mentioned, the environment for this walkthrough is based on Docker Compose; 
and uses a custom image
+to spin up Flink (JobManager+TaskManager), Kafka+Zookeeper, the data generator 
and Elasticsearch+Kibana
+containers.
+
+Your can find the [docker-compose.yaml](docker-compose.yml) file of the 
pyflink-walkthrough is located in the `pyflink-walkthrough` root directory.
+
+### Building the Docker image
+
+First, build the Docker image by running:
+
+```bash
+docker-compose build
+```
+
+### Starting the Playground
+
+Once the Docker image build is complete, run the following command to start 
the playground:
+
+```bash
+docker-compose up -d
+```
+
+One way of checking if the playground was successfully started is accessing 
some of the services exposed:
+
+1. visiting Flink Web UI [http://localhost:8081](http://localhost:8081).
+2. visiting Elasticsearch [http://localhost:9200](http://localhost:9200).
+3. visiting Kibana [http://localhost:5601](http://localhost:5601).
+
+**Note:** you may need to wait around 1 minute before all the services come up.
+
+### Stopping the Playground
+
+To stop the playground, run the following command:
+
+```bash
+docker-compose down
+```
+
+
+## Runing the PyFlink job
+
+1. Submit the PyFlink job.
+```shell script
+$ docker-compose exec jobmanager ./bin/flink run -py 
/opt/pyflink-walkthrough/payment_msg_proccessing.py -d
+```
+
+2. Navigate to the [Kibana UI](http://localhost:5601) and choose the 
pre-created dashboard `payment_dashboard`.
+
+![image](pic/dash_board.png)
+
+![image](pic/final.png)
+
+3. Stop the PyFlink job:
+
+Visit the Flink Web UI at 
[http://localhost:8081/#/overview](http://localhost:8081/#/overview) , select 
the job and click `Cancle` on the upper right side.

Review comment:
       ```suggestion
   Visit the Flink Web UI at 
[http://localhost:8081/#/overview](http://localhost:8081/#/overview) , select 
the job and click `Cancel` on the upper right side.
   ```

##########
File path: pyflink-walkthrough/README.md
##########
@@ -0,0 +1,107 @@
+# pyflink-walkthrough
+
+## Background
+
+In this playground, you will learn how to build and run an end-to-end PyFlink 
pipeline for data analytics, covering the following steps:
+
+* Reading data from a Kafka source;
+* Creating data using a 
[UDF](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/udfs/python_udfs.html);
+* Performing a simple aggregation over the source data;
+* Writing the results to Elasticsearch and visualizing them in Kibana.
+
+### Kafka
+You will be using Kafka to store sample input data about payment transactions. 
A simple data generator 
[generate_source_data.py](generator/generate_source_data.py) is provided to
+continuously write new records to the `payment_msg` Kafka topic. Each record 
is structured as follows:
+ 
+`{"createTime": "2020-08-12 06:29:02", "orderId": 1597213797, "payAmount": 
28306.44976403719, "payPlatform": 0, "provinceId": 4}`
+
+* `createTime`: The creation time of the transaction. 
+* `orderId`: The id of the current transaction.
+* `payAmount`: The pay amount of the current transaction.
+* `payPlatform`: The platform used to pay the order, pc or mobile.

Review comment:
       ```suggestion
   * `payPlatform`: The platform used to create this payment: pc or mobile.
   ```

##########
File path: pyflink-walkthrough/README.md
##########
@@ -0,0 +1,107 @@
+# pyflink-walkthrough
+
+## Background
+
+In this playground, you will learn how to build and run an end-to-end PyFlink 
pipeline for data analytics, covering the following steps:
+
+* Reading data from a Kafka source;
+* Creating data using a 
[UDF](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/udfs/python_udfs.html);
+* Performing a simple aggregation over the source data;
+* Writing the results to Elasticsearch and visualizing them in Kibana.
+
+### Kafka
+You will be using Kafka to store sample input data about payment transactions. 
A simple data generator 
[generate_source_data.py](generator/generate_source_data.py) is provided to
+continuously write new records to the `payment_msg` Kafka topic. Each record 
is structured as follows:
+ 
+`{"createTime": "2020-08-12 06:29:02", "orderId": 1597213797, "payAmount": 
28306.44976403719, "payPlatform": 0, "provinceId": 4}`
+
+* `createTime`: The creation time of the transaction. 
+* `orderId`: The id of the current transaction.
+* `payAmount`: The pay amount of the current transaction.
+* `payPlatform`: The platform used to pay the order, pc or mobile.
+* `provinceId`: The id of the province for the user. 
+
+You can use the following command to read data from the Kafka topic and check 
whether it's generated correctly:
+```shell script
+$ docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server 
kafka:9092 --topic payment_msg
+{"createTime":"2020-07-27 
09:25:32.77","orderId":1595841867217,"payAmount":7732.44,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.231","orderId":1595841867218,"payAmount":75774.05,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.72","orderId":1595841867219,"payAmount":65908.55,"payPlatform":0,"provinceId":0}
+{"createTime":"2020-07-27 
09:25:34.216","orderId":1595841867220,"payAmount":15341.11,"payPlatform":0,"provinceId":1}
+{"createTime":"2020-07-27 
09:25:34.698","orderId":1595841867221,"payAmount":37504.42,"payPlatform":0,"provinceId":0}
+```
+
+### PyFlink
+
+The transaction data will be processed with PyFlink using the Python script 
[payment_msg_processing.py](payment_msg_proccessing.py).
+This script will first map the `provinceId` in the input records to its 
corresponding province name
+using a Python UDF, and them sum the transaction amount for each province 
using a group aggregate. 
+
+### ElasticSearch
+
+ElasticSearch is used to store upstream processing results and provide 
efficient query service.
+
+### Kibana
+
+Kibana is an open source data visualization dashboard for ElasticSearch. You 
will use it to visualize 
+the results of your PyFlink pipeline.
+
+## Setup
+
+As mentioned, the environment for this walkthrough is based on Docker Compose; 
and uses a custom image
+to spin up Flink (JobManager+TaskManager), Kafka+Zookeeper, the data generator 
and Elasticsearch+Kibana
+containers.
+
+Your can find the [docker-compose.yaml](docker-compose.yml) file of the 
pyflink-walkthrough is located in the `pyflink-walkthrough` root directory.
+
+### Building the Docker image
+
+First, build the Docker image by running:
+
+```bash
+docker-compose build
+```
+
+### Starting the Playground
+
+Once the Docker image build is complete, run the following command to start 
the playground:
+
+```bash
+docker-compose up -d
+```
+
+One way of checking if the playground was successfully started is accessing 
some of the services exposed:
+
+1. visiting Flink Web UI [http://localhost:8081](http://localhost:8081).
+2. visiting Elasticsearch [http://localhost:9200](http://localhost:9200).
+3. visiting Kibana [http://localhost:5601](http://localhost:5601).
+
+**Note:** you may need to wait around 1 minute before all the services come up.
+
+### Stopping the Playground
+
+To stop the playground, run the following command:
+
+```bash
+docker-compose down
+```
+
+
+## Runing the PyFlink job

Review comment:
       ```suggestion
   ## Running the PyFlink job
   ```

##########
File path: pyflink-walkthrough/README.md
##########
@@ -0,0 +1,107 @@
+# pyflink-walkthrough
+
+## Background
+
+In this playground, you will learn how to build and run an end-to-end PyFlink 
pipeline for data analytics, covering the following steps:
+
+* Reading data from a Kafka source;
+* Creating data using a 
[UDF](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/udfs/python_udfs.html);
+* Performing a simple aggregation over the source data;
+* Writing the results to Elasticsearch and visualizing them in Kibana.
+
+### Kafka
+You will be using Kafka to store sample input data about payment transactions. 
A simple data generator 
[generate_source_data.py](generator/generate_source_data.py) is provided to
+continuously write new records to the `payment_msg` Kafka topic. Each record 
is structured as follows:
+ 
+`{"createTime": "2020-08-12 06:29:02", "orderId": 1597213797, "payAmount": 
28306.44976403719, "payPlatform": 0, "provinceId": 4}`
+
+* `createTime`: The creation time of the transaction. 
+* `orderId`: The id of the current transaction.
+* `payAmount`: The pay amount of the current transaction.
+* `payPlatform`: The platform used to pay the order, pc or mobile.
+* `provinceId`: The id of the province for the user. 
+
+You can use the following command to read data from the Kafka topic and check 
whether it's generated correctly:
+```shell script
+$ docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server 
kafka:9092 --topic payment_msg
+{"createTime":"2020-07-27 
09:25:32.77","orderId":1595841867217,"payAmount":7732.44,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.231","orderId":1595841867218,"payAmount":75774.05,"payPlatform":0,"provinceId":3}
+{"createTime":"2020-07-27 
09:25:33.72","orderId":1595841867219,"payAmount":65908.55,"payPlatform":0,"provinceId":0}
+{"createTime":"2020-07-27 
09:25:34.216","orderId":1595841867220,"payAmount":15341.11,"payPlatform":0,"provinceId":1}
+{"createTime":"2020-07-27 
09:25:34.698","orderId":1595841867221,"payAmount":37504.42,"payPlatform":0,"provinceId":0}
+```
+
+### PyFlink
+
+The transaction data will be processed with PyFlink using the Python script 
[payment_msg_processing.py](payment_msg_proccessing.py).
+This script will first map the `provinceId` in the input records to its 
corresponding province name
+using a Python UDF, and them sum the transaction amount for each province 
using a group aggregate. 
+
+### ElasticSearch
+
+ElasticSearch is used to store upstream processing results and provide 
efficient query service.
+
+### Kibana
+
+Kibana is an open source data visualization dashboard for ElasticSearch. You 
will use it to visualize 
+the results of your PyFlink pipeline.
+
+## Setup
+
+As mentioned, the environment for this walkthrough is based on Docker Compose; 
and uses a custom image
+to spin up Flink (JobManager+TaskManager), Kafka+Zookeeper, the data generator 
and Elasticsearch+Kibana
+containers.

Review comment:
       ```suggestion
   As mentioned, the environment for this walkthrough is based on Docker 
Compose. It uses a custom image
   to spin up Flink (JobManager+TaskManager), Kafka+Zookeeper, the data 
generator, and Elasticsearch+Kibana
   containers.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink-playgrounds] alpinegizmo commented on a change in pull request #16: [FLINK-19145][walkthroughs] Add PyFlink-walkthrough to Flink playground.

Reply via email to