This is an automated email from the ASF dual-hosted git repository.
cancai pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-streampark-website.git
The following commit(s) were added to refs/heads/dev by this push:
new 337ac345 [Improve]Translate the Chinese in the picture into English(chinaunion) (#328)
337ac345 is described below
commit 337ac3453b7b9650d7ca6b129c6af31556dcbf26
Author: Kick156 <[email protected]>
AuthorDate: Sun Jan 28 22:49:10 2024 +0800
[Improve]Translate the Chinese in the picture into English(chinaunion) (#328)
* [Improve]Translate the Chinese in the picture into English(chinaunion)
---
blog/2-streampark-usercase-chinaunion.md | 44 ++++++++++-----------
.../chinaunion/contribution_and_enhancement_en.png | Bin 0 -> 860484 bytes
.../chinaunion/data_processing_processes_en.png | Bin 0 -> 933928 bytes
.../blog/chinaunion/development_efficiency_en.png | Bin 0 -> 1069644 bytes
static/blog/chinaunion/devops_platform_en.png | Bin 0 -> 870991 bytes
static/blog/chinaunion/difficulties_en.png | Bin 0 -> 1047547 bytes
static/blog/chinaunion/job_management_en.png | Bin 0 -> 845008 bytes
static/blog/chinaunion/multi_team_support_en.png | Bin 0 -> 933424 bytes
.../multiple_environments_and_components_en.png | Bin 0 -> 1018222 bytes
.../chinaunion/multiple_execution_modes_en.png | Bin 0 -> 930085 bytes
.../blog/chinaunion/operational_background_en.png | Bin 0 -> 908120 bytes
static/blog/chinaunion/overall_architecture_en.png | Bin 0 -> 885988 bytes
static/blog/chinaunion/platform_background_en.png | Bin 0 -> 847610 bytes
static/blog/chinaunion/platform_evolution_en.png | Bin 0 -> 897677 bytes
.../blog/chinaunion/platformized_management_en.png | Bin 0 -> 949047 bytes
static/blog/chinaunion/road_map_en.png | Bin 0 -> 874992 bytes
static/blog/chinaunion/state_optimization_en.png | Bin 0 -> 792706 bytes
static/blog/chinaunion/state_recovery_en.png | Bin 0 -> 875351 bytes
.../status_acquisition_bottleneck_en.png | Bin 0 -> 812005 bytes
static/blog/chinaunion/submission_process_en.png | Bin 0 -> 941018 bytes
static/blog/chinaunion/versioning_en.png | Bin 0 -> 753078 bytes
21 files changed, 22 insertions(+), 22 deletions(-)
diff --git a/blog/2-streampark-usercase-chinaunion.md b/blog/2-streampark-usercase-chinaunion.md
index 2472eb47..ed22645c 100644
--- a/blog/2-streampark-usercase-chinaunion.md
+++ b/blog/2-streampark-usercase-chinaunion.md
@@ -4,7 +4,7 @@ title: China Union's Flink Real-Time Computing Platform Ops Practice
tags: [StreamPark, Production Practice, FlinkSQL]
---
-
+
**Abstract:** This article is compiled from the sharing of Mu Chunjin, the
head of China Union Data Science's real-time computing team and Apache
StreamPark Committer, at the Flink Forward Asia 2022 platform construction
session. The content of this article is mainly divided into four parts:
@@ -22,7 +22,7 @@ The image above depicts the overall architecture of the real-time computing plat
-
+
The image above provides a detailed workflow of data processing.
@@ -30,11 +30,11 @@ The first part is collection and parsing. Our data sources come from business da
The second part is real-time computing. This stage deals with a massive amount
of data, in the trillions, supporting over 10,000 real-time data subscriptions.
There are more than 200 Flink tasks. We encapsulate a certain type of business
into a scenario, and a single Flink job can support multiple subscriptions in
the same scenario. Currently, the number of Flink jobs is continuously
increasing, and in the future, it might increase to over 500. One of the major
challenges faced here is t [...]
-
+
In 2018, we adopted a third-party black-box computing engine, which did not
support flexible customization of personalized functions, and depended heavily
on external systems, resulting in high loads on these external systems and
complex operations and maintenance. In 2019, we utilized Spark Streaming's
micro-batch processing. From 2020, we began to use Flink for stream computing.
Starting from 2021, almost all Spark Streaming micro-batch processing tasks
have been replaced by Flink. At [...]
-
+
To summarize the platform background, it mainly includes the following parts:
@@ -43,7 +43,7 @@ To summarize the platform background, it mainly includes the following parts:
- Numerous subscriptions: supported more than 10,000 data service
subscriptions.
- Numerous users: supported the usage of more than 30 internal and external
users.
-
+
The operational maintenance background can also be divided into the following
parts:
@@ -54,9 +54,9 @@ The operational maintenance background can also be divided into the following pa
- Numerous users: Over 30 internal and external organizations' users are
utilizing the platform.
- Low monitoring latency: Once an issue is identified, we must address it
immediately to avoid user complaints.
-## **Flink 实时作业运维挑战**
+## **Flink Real-Time Job Operation and Maintenance Challenges**
-
+
Given the platform and operational maintenance background, particularly with
the increasing number of Flink jobs, we have encountered significant challenges
in two main areas: job operation and maintenance dilemmas, and business support
difficulties.
@@ -68,9 +68,9 @@ In terms of job operation and maintenance dilemmas, firstly, the job deployment
Due to various factors in the job operation and maintenance difficulties,
business support challenges arise, such as a high rate of failures during
launch, impact on data quality, lengthy launch times, high data latency, and
issues with missed alarm handling, leading to complaints. In addition, the
impact on our business is unclear, and once a problem arises, addressing the
issue becomes the top priority.
-## **基于 Apache StreamPark™ 一体化管理**
+## **Integrated Management based on Apache StreamPark™**
-
+
In response to the two dilemmas mentioned above, we have resolved many issues
through StreamPark's integrated management. First, let's take a look at the
dual evolution of StreamPark, which includes Flink Job Management and Flink Job
DevOps Platform. In terms of job management, StreamPark supports deploying
Flink real-time jobs to different clusters, such as Flink's native Standalone
mode, and the Session, Application, and PerJob modes of Flink on Yarn. In the
latest version, it will sup [...]
@@ -86,7 +86,7 @@ In response to the two dilemmas mentioned above, we have resolved many issues th
StreamPark supports the submission of Flink SQL and Flink Jar, allows for
resource configuration, and supports state tracking, indicating whether the
state is running, failed, etc. Additionally, it provides a metrics dashboard
and supports the viewing of various logs.
-
+
The Flink Job DevOps platform primarily consists of the following parts:
- Teams: StreamPark supports multiple teams, each with its team administrator
who has all permissions. There are also team developers who only have a limited
set of permissions.
@@ -95,11 +95,11 @@ The Flink Job DevOps platform primarily consists of the following parts:
- State Monitoring: After the Flink job is started, real-time tracking of its
state begins, including Flink's running status, runtime duration, Checkpoint
information, etc. There is also support for one-click redirection to Flink's
Web UI.
- Logs and Alerts: This includes logs from the build and start-up processes
and supports alerting methods such as DingTalk, WeChat, email, and SMS.
-
+
Companies generally have multiple teams working on real-time jobs
simultaneously. In our company, this includes a real-time data collection team,
a data processing team, and a real-time marketing team. StreamPark supports
resource isolation for multiple teams.
-
+
Management of the Flink job platform faces the following challenges:
- Numerous scripts: There are several hundred scripts on the platform,
scattered across multiple servers.
@@ -116,15 +116,15 @@ Based on the challenges mentioned above, StreamPark has addressed the issues of
Referring to the image above, you can see at the bottom of the diagram that
packaging is conducted through project management, configuration is done via
job management, and then it is released. This process allows for one-click
start and stop operations, and jobs can be submitted through the API.
-
+
In the early stages, we needed to go through seven steps for deployment,
including connecting to a VPN, logging in through 4A, executing compile
scripts, executing start scripts, opening Yarn, searching for the job name, and
entering the Flink UI. StreamPark supports one-click deployment for four of
these steps, including one-click packaging, one-click release, one-click start,
and one-click access to the Flink UI.
-
+
The image above illustrates the job submission process of our StreamPark
platform. Firstly, StreamPark proceeds to release the job, during which some
resources are uploaded. Following that, the job is submitted, accompanied by
various configured parameters, and it is published to the cluster using the
Flink Submit method via an API call. At this point, there are multiple Flink
Submit instances corresponding to different execution modes, such as Yarn
Session, Yarn Application, Kubernetes [...]
-
+
As mentioned above, in the case of Flink on Yarn jobs, two IDs are acquired
upon job submission: the Application ID and the Job ID. These IDs are used to
retrieve the job status. However, when there is a large number of Flink jobs,
certain issues may arise. StreamPark utilizes a status retriever that
periodically sends requests to the ResourceManager every five seconds, using
the Application ID or Job ID stored in our database. If there are a
considerable number of jobs, during each poll [...]
@@ -132,29 +132,29 @@ As mentioned above, in the case of Flink on Yarn jobs, two IDs are acquired upon
In the diagram mentioned earlier, the connection count to the ResourceManager
shows periodic and sustained increases, indicating that the ResourceManager is
in a relatively critical state. This is evidenced by monitoring data from the
server, which indeed shows a higher number of connections to the
ResourceManager.
-
+
To address the issues mentioned above, we have made some optimizations in
StreamPark. Firstly, after submitting a job, StreamPark saves the Application
ID or Job ID, and it also retrieves and stores the direct access address of the
Job Manager in the database. Therefore, instead of polling the ResourceManager
for job status, it can directly call the addresses of individual Job Managers
to obtain the real-time status. This significantly reduces the number of
connections to the ResourceMan [...]
-
+
Another issue that StreamPark resolves is safeguarding Flink's state recovery.
In the past, when we used scripts for operations and maintenance, especially
during business upgrades, it was necessary to recover from the latest
checkpoint when starting Flink. However, developers often forgot to recover
from the previous checkpoint, leading to significant data quality issues and
complaints. StreamPark's process is designed to mitigate this issue. Upon the
initial start of a Flink job, it po [...]
-
+
StreamPark also addresses the challenges associated with referencing multiple
components across various environments. In a corporate setting, there are
typically multiple environments, such as development, testing, and production.
Each environment generally includes multiple components, such as Kafka, HBase,
Redis, etc. Additionally, within a single environment, there may be multiple
instances of the same component. For example, in a real-time computing platform
at China Union, when cons [...]
-
+
StreamPark supports multiple execution modes for Flink, including three
deployment modes based on Yarn: Application, Perjob, and Session. Additionally,
it supports two deployment modes for Kubernetes: Application and Session, as
well as some Remote modes.
-
+
StreamPark also supports multiple versions of Flink. For example, while our
current version is 1.14.x, we would like to experiment with the new 1.16.x
release. However, it’s not feasible to upgrade all existing jobs to 1.16.x.
Instead, we can opt to upgrade only the new jobs to 1.16.x, allowing us to
leverage the benefits of the new version while maintaining compatibility with
the older version.
## **Future Planning and Evolution**
-
+
In the future, we will increase our involvement in the development of
StreamPark, and we have planned the following directions for enhancement:
- High Availability: StreamPark currently does not support high availability,
and this aspect needs further strengthening.
@@ -162,7 +162,7 @@ In the future, we will increase our involvement in the development of StreamPark
- More Detailed Monitoring: Currently, StreamPark supports sending alerts when
a Flink job fails. We hope to also send alerts when a Task fails, and need to
know the reason for the failure. In addition, enhancements are needed in job
backpressure monitoring alerts, Checkpoint timeout alerts, failure alerts, and
performance metric collection.
- Stream-Batch Integration: Explore a platform that integrates both streaming
and batch processing, combining the Flink stream-batch unified engine with data
lake storage that supports stream-batch unification.
-
+
The above diagram represents the Roadmap for StreamPark.
- Data Source: StreamPark will support rapid integration with more data
sources, achieving one-click data onboarding.
diff --git a/static/blog/chinaunion/contribution_and_enhancement_en.png b/static/blog/chinaunion/contribution_and_enhancement_en.png
new file mode 100644
index 00000000..7b27927f
Binary files /dev/null and b/static/blog/chinaunion/contribution_and_enhancement_en.png differ
diff --git a/static/blog/chinaunion/data_processing_processes_en.png b/static/blog/chinaunion/data_processing_processes_en.png
new file mode 100644
index 00000000..442e9dc8
Binary files /dev/null and b/static/blog/chinaunion/data_processing_processes_en.png differ
diff --git a/static/blog/chinaunion/development_efficiency_en.png b/static/blog/chinaunion/development_efficiency_en.png
new file mode 100644
index 00000000..df702d49
Binary files /dev/null and b/static/blog/chinaunion/development_efficiency_en.png differ
diff --git a/static/blog/chinaunion/devops_platform_en.png b/static/blog/chinaunion/devops_platform_en.png
new file mode 100644
index 00000000..a97efb26
Binary files /dev/null and b/static/blog/chinaunion/devops_platform_en.png differ
diff --git a/static/blog/chinaunion/difficulties_en.png b/static/blog/chinaunion/difficulties_en.png
new file mode 100644
index 00000000..026d610d
Binary files /dev/null and b/static/blog/chinaunion/difficulties_en.png differ
diff --git a/static/blog/chinaunion/job_management_en.png b/static/blog/chinaunion/job_management_en.png
new file mode 100644
index 00000000..4b47f340
Binary files /dev/null and b/static/blog/chinaunion/job_management_en.png differ
diff --git a/static/blog/chinaunion/multi_team_support_en.png b/static/blog/chinaunion/multi_team_support_en.png
new file mode 100644
index 00000000..6c0cbad3
Binary files /dev/null and b/static/blog/chinaunion/multi_team_support_en.png differ
diff --git a/static/blog/chinaunion/multiple_environments_and_components_en.png b/static/blog/chinaunion/multiple_environments_and_components_en.png
new file mode 100644
index 00000000..384882d0
Binary files /dev/null and b/static/blog/chinaunion/multiple_environments_and_components_en.png differ
diff --git a/static/blog/chinaunion/multiple_execution_modes_en.png b/static/blog/chinaunion/multiple_execution_modes_en.png
new file mode 100644
index 00000000..0a51bdde
Binary files /dev/null and b/static/blog/chinaunion/multiple_execution_modes_en.png differ
diff --git a/static/blog/chinaunion/operational_background_en.png b/static/blog/chinaunion/operational_background_en.png
new file mode 100644
index 00000000..f211718d
Binary files /dev/null and b/static/blog/chinaunion/operational_background_en.png differ
diff --git a/static/blog/chinaunion/overall_architecture_en.png b/static/blog/chinaunion/overall_architecture_en.png
new file mode 100644
index 00000000..bf2444a1
Binary files /dev/null and b/static/blog/chinaunion/overall_architecture_en.png differ
diff --git a/static/blog/chinaunion/platform_background_en.png b/static/blog/chinaunion/platform_background_en.png
new file mode 100644
index 00000000..b16c6617
Binary files /dev/null and b/static/blog/chinaunion/platform_background_en.png differ
diff --git a/static/blog/chinaunion/platform_evolution_en.png b/static/blog/chinaunion/platform_evolution_en.png
new file mode 100644
index 00000000..4c965091
Binary files /dev/null and b/static/blog/chinaunion/platform_evolution_en.png differ
diff --git a/static/blog/chinaunion/platformized_management_en.png b/static/blog/chinaunion/platformized_management_en.png
new file mode 100644
index 00000000..8bfae84c
Binary files /dev/null and b/static/blog/chinaunion/platformized_management_en.png differ
diff --git a/static/blog/chinaunion/road_map_en.png b/static/blog/chinaunion/road_map_en.png
new file mode 100644
index 00000000..b3e9648f
Binary files /dev/null and b/static/blog/chinaunion/road_map_en.png differ
diff --git a/static/blog/chinaunion/state_optimization_en.png b/static/blog/chinaunion/state_optimization_en.png
new file mode 100644
index 00000000..9dbe8674
Binary files /dev/null and b/static/blog/chinaunion/state_optimization_en.png differ
diff --git a/static/blog/chinaunion/state_recovery_en.png b/static/blog/chinaunion/state_recovery_en.png
new file mode 100644
index 00000000..e6d82822
Binary files /dev/null and b/static/blog/chinaunion/state_recovery_en.png differ
diff --git a/static/blog/chinaunion/status_acquisition_bottleneck_en.png b/static/blog/chinaunion/status_acquisition_bottleneck_en.png
new file mode 100644
index 00000000..271bf6c2
Binary files /dev/null and b/static/blog/chinaunion/status_acquisition_bottleneck_en.png differ
diff --git a/static/blog/chinaunion/submission_process_en.png b/static/blog/chinaunion/submission_process_en.png
new file mode 100644
index 00000000..fe0f6b31
Binary files /dev/null and b/static/blog/chinaunion/submission_process_en.png differ
diff --git a/static/blog/chinaunion/versioning_en.png b/static/blog/chinaunion/versioning_en.png
new file mode 100644
index 00000000..1ec3d6e2
Binary files /dev/null and b/static/blog/chinaunion/versioning_en.png differ