This is an automated email from the ASF dual-hosted git repository.
benjobs pushed a commit to branch dev
in repository
https://gitbox.apache.org/repos/asf/incubator-streampark-website.git
The following commit(s) were added to refs/heads/dev by this push:
new 550a5bc0 Review and Improve the translation for blog articles (#295)
550a5bc0 is described below
commit 550a5bc0a106e5b4f4ae7f518f308926c5a984e8
Author: Leomax_Sun <[email protected]>
AuthorDate: Fri Nov 24 16:20:29 2023 +0800
Review and Improve the translation for blog articles (#295)
---
blog/0-streampark-flink-on-k8s.md | 30 ++++++++++++----------
blog/1-flink-framework-streampark.md | 10 ++++----
.../0-streampark-flink-on-k8s.md | 10 +++++---
3 files changed, 27 insertions(+), 23 deletions(-)
diff --git a/blog/0-streampark-flink-on-k8s.md
b/blog/0-streampark-flink-on-k8s.md
index c074be05..4474edba 100644
--- a/blog/0-streampark-flink-on-k8s.md
+++ b/blog/0-streampark-flink-on-k8s.md
@@ -38,7 +38,7 @@ RUN mkdir -p $FLINK_HOME/usrlib
COPY my-flink-job.jar $FLINK_HOME/usrlib/my-flink-job.jar
```
-4. Use Flink client script to start Flink tasks
+3. Use Flink client script to start Flink tasks
```shell
@@ -52,13 +52,13 @@ COPY my-flink-job.jar $FLINK_HOME/usrlib/my-flink-job.jar
local:///opt/flink/usrlib/my-flink-job.jar
```
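For context, a minimal sketch of the application-mode launch command this step refers to, following the standard Flink native-Kubernetes form (the namespace, cluster-id, and image name are illustrative):

```shell
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.namespace=flink-cluster \
    -Dkubernetes.cluster-id=my-first-application-cluster \
    -Dkubernetes.container.image=my-custom-image \
    local:///opt/flink/usrlib/my-flink-job.jar
```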
-5. Use the Kubectl command to obtain the WebUI access address and JobId of the
Flink job.
+4. Use the Kubectl command to obtain the WebUI access address and JobId of the
Flink job.
```shell
kubectl -n flink-cluster get svc
```
-6. Stop the job using Flink command
+5. Stop the job using Flink command
```shell
./bin/flink cancel
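# A sketch of the full cancel invocation in the Flink native-Kubernetes form
# (the cluster-id is illustrative; <jobId> comes from the previous step):
#   ./bin/flink cancel --target kubernetes-application \
#     -Dkubernetes.cluster-id=my-first-application-cluster <jobId>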
@@ -73,13 +73,13 @@ kubectl -n flink-cluster get svc
There will be higher requirements for using Flink on Kubernetes in
enterprise-level production environments. Generally, you will choose either to
build your own platform or to purchase a related commercial product. Whichever
solution you choose, the expected product capabilities are the same:
large-scale task development and deployment, status tracking, operation and
maintenance monitoring, failure alarms, unified task management, high
availability, and so on are common demands.
- In response to the above issues, we investigated open source projects in the
open source field that support the development and deployment of Flink on
Kubernetes tasks. During the investigation, we also encountered other excellent
open source projects. After comprehensively comparing multiple open source
projects, we came to the conclusion: ** Whether StreamPark is completed The
overall performance such as speed, user experience, and stability are all very
good, so we finally chose Str [...]
+ In response to the above issues, we investigated open source projects that
support the development and deployment of Flink on Kubernetes tasks. During the
investigation we also encountered other excellent open source projects. After
comprehensively comparing them, we came to the conclusion: **StreamPark
performs very well in completeness, user experience, and stability, so we
finally chose StreamPark as our one-stop real-time c [...]
Let’s take a look at how StreamPark supports Flink on Kubernetes:
### **Basic environment configuration**
- Basic environment configuration includes Kubernetes and Docker warehouse
information as well as Flink client information configuration. The simplest way
for the Kubernetes basic environment is to directly copy the .kube/config of
the Kubernetes node to the StreamPark node user directory, and then use the
kubectl command to create a Flink-specific Kubernetes Namespace and perform
RBAC configuration.
+ Basic environment configuration includes Kubernetes and Docker repository
information as well as Flink client information configuration. The simplest way
to set up the Kubernetes basic environment is to directly copy the .kube/config
of the Kubernetes node to the user directory on the StreamPark node, and then
use the kubectl command to create a Flink-specific Kubernetes Namespace and
perform RBAC configuration.
```shell
# Create k8s namespace used by Flink jobs
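# A minimal sketch following the Flink RBAC documentation; the namespace and
# service account names are illustrative:
kubectl create namespace flink-cluster
kubectl -n flink-cluster create serviceaccount flink
kubectl create clusterrolebinding flink-role-binding-flink \
  --clusterrole=edit --serviceaccount=flink-cluster:flink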
@@ -125,13 +125,13 @@ After the job development is completed, the job comes
online. In this step, Stre
- Dependency download in job
- Build job (JAR package)
- Build image
-- Push the image to the remote warehouse
+- Push the image to the remote repository
**For users: Just click the cloud-shaped online button in the task list**

-We can see a series of work done by StreamPark when building and pushing the
image.: **Read the configuration, build the image, and push the image to the
remote warehouse...** I want to give StreamPark a big thumbs up!
+We can see a series of work done by StreamPark when building and pushing the
image: **Read the configuration, build the image, and push the image to the
remote repository...** I want to give StreamPark a big thumbs up!

@@ -181,13 +181,13 @@ Next, let’s take a look at how StreamPark supports this
capability:
## Problems encountered
- Any new technology has a process of exploration and pitfalls. The experience
of failure is precious. Here are some pitfalls and experiences that StreamPark
has stepped into during the implementation of fog core technology. **The
content of this section is not only about StreamPark. I believe it will bring
some reference to all friends who use Flink on Kubernetes.
+ Any new technology involves a process of exploration and falling into
pitfalls. The experience of failure is precious. Here are some pitfalls and
lessons that StreamPark encountered during its implementation at RELX
Technology (雾芯科技). **The content of this section is not only about
StreamPark. I believe it will bring some reference to all friends who use Flink
on Kubernetes**.
### **FAQs are summarized below**
- **Kubernetes pod failed to pull the image**
-The main problem is that Kubernetes pod-template lacks docker’s
imagePullSecrets
+ The main problem is that Kubernetes pod-template lacks docker’s
imagePullSecrets
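A minimal sketch of the fix, assuming a registry secret (here named regsecret,
an illustrative name) created beforehand with kubectl create secret
docker-registry, and referenced from the pod-template:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-template
spec:
  imagePullSecrets:
    - name: regsecret
```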
- **Scala version inconsistent**
@@ -215,9 +215,11 @@ The main problem is that Kubernetes pod-template lacks
docker’s imagePullSecre
- **The changed code did not take effect after it was republished**
-This issue is related to the Kubernetes pod image pull policy. It is
recommended to set the Pod image pull policy to Always:
+ This issue is related to the Kubernetes pod image pull policy. It is
recommended to set the Pod image pull policy to Always:
+```shell
-Dkubernetes.container.image.pull-policy=Always
+```
- **Each restart of the task will result in one more Job instance**
@@ -225,7 +227,7 @@ This issue is related to the Kubernetes pod image pull
policy. It is recommended
- **How to implement kubernetes pod domain name access**
-Domain name configuration only needs to be configured in pod-template
according to Kubernetes resources. I can share with you a pod-template.yaml
template that I summarized based on the above issues:
+ Domain name configuration only needs to be configured in pod-template
according to Kubernetes resources. I can share with you a pod-template.yaml
template that I summarized based on the above issues:
```yaml
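# A sketch of one common approach: hostAliases entries give pods static
# domain-name resolution (the IP and hostnames below are illustrative).
apiVersion: v1
kind: Pod
metadata:
  name: pod-template
spec:
  hostAliases:
    - ip: "192.168.0.1"
      hostnames:
        - "node1.example.com"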
@@ -281,7 +283,7 @@ Create a Dockerfile file and place the Dockerfile file in
the same folder as the
FROM flink:1.13.6-scala_2.11
COPY lib $FLINK_HOME/lib/
```
-**3. Create a basic image and push it to a private warehouse**
+**3. Create a basic image and push it to a private repository**
```shell
docker login --username=xxx
docker \
@@ -295,7 +297,7 @@ push
k8s-harbor.xxx.com/streamx/udf_flink_1.13.6-scala_2.11:latest
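A sketch of the full login, build, and push sequence implied here, using the
registry path shown above (the credentials are illustrative):

```shell
docker login --username=xxx k8s-harbor.xxx.com
docker build -t k8s-harbor.xxx.com/streamx/udf_flink_1.13.6-scala_2.11:latest .
docker push k8s-harbor.xxx.com/streamx/udf_flink_1.13.6-scala_2.11:latest
```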
- **StreamPark supports Flink job metric monitoring**
-It would be great if StreamPark could connect to Flink Metric data and display
Flink’s real-time consumption data at every moment on the StreamPark platform.
+ It would be great if StreamPark could connect to Flink Metric data and
display Flink’s real-time consumption data at every moment on the StreamPark
platform.
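Until such integration lands, one common stopgap is Flink's bundled Prometheus
reporter; a minimal sketch of the flink-conf.yaml entries (the reporter name
and port range are illustrative):

```yaml
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249-9259
```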
- **StreamPark supports Flink job log persistence**
@@ -303,4 +305,4 @@ It would be great if StreamPark could connect to Flink
Metric data and display F
- **Improvement of the problem of too large image**
-StreamPark's current image support for Flink on Kubernetes jobs is to combine
the basic image and user code into a Fat image and push it to the Docker
warehouse. The problem with this method is that it takes a long time when the
image is too large. It is hoped that the basic image can be restored in the
future. There is no need to hit the business code together every time, which
can greatly improve development efficiency and save costs.
+ StreamPark's current image support for Flink on Kubernetes jobs is to
combine the basic image and user code into a Fat image and push it to the
Docker repository. The problem with this approach is that it takes a long time
when the image is too large. We hope that in the future the basic image can be
reused, so that the business code does not need to be packaged into it every
time; this would greatly improve development efficiency and save costs.
diff --git a/blog/1-flink-framework-streampark.md
b/blog/1-flink-framework-streampark.md
index b02b9e48..25070df4 100644
--- a/blog/1-flink-framework-streampark.md
+++ b/blog/1-flink-framework-streampark.md
@@ -4,7 +4,7 @@ title: StreamPark - Powerful Flink Development Framework
tags: [StreamPark, DataStream, FlinkSQL]
---
-Although the Hadoop system is widely used today, its architecture is
complicated, it has a high maintenance complexity, version upgrades are
challenging, and due to departmental reasons, data center scheduling is
prolonged. We urgently need to explore agile data platform models. With the
current prevalence of cloud-native architecture and the backdrop of lake and
warehouse integration, we have decided to use Doris as an offline data
warehouse and TiDB (which is already in production) as [...]
+Although the Hadoop system is widely used today, its architecture is
complicated, it has high maintenance complexity, version upgrades are
challenging, and, due to departmental reasons, data center scheduling is
prolonged. We urgently need to explore agile data platform models. With the
current popularization of cloud-native architecture and the integration of
lake and warehouse, we have decided to use Doris as an offline data warehouse
and TiDB (which is already in production) as a [...]

@@ -12,7 +12,7 @@ Although the Hadoop system is widely used today, its
architecture is complicated
# 1. Background
-Although the Hadoop system is widely used today, its architecture is
complicated, it has a high maintenance complexity, version upgrades are
challenging, and due to departmental reasons, data center scheduling is
prolonged. We urgently need to explore agile data platform models. With the
current prevalence of cloud-native architecture and the backdrop of lake and
warehouse integration, we have decided to use Doris as an offline data
warehouse and TiDB (which is already in production) as [...]
+Although the Hadoop system is widely used today, its architecture is
complicated, it has high maintenance complexity, version upgrades are
challenging, and, due to departmental reasons, data center scheduling is
prolonged. We urgently need to explore agile data platform models. With the
current popularization of cloud-native architecture and the integration of
lake and warehouse, we have decided to use Doris as an offline data warehouse
and TiDB (which is already in production) as a [...]

@@ -217,7 +217,7 @@ It becomes evident that StreamPark essentially uploads the
jar package to the Fl
### Custom Code Mode
-To our delight, StreamPark also provides support for coding
DataStream/FlinkSQL tasks. For special requirements, we can author our
implementations in Java/Scala. You can compose tasks following the scaffold
method recommended by StreamPark or write a standard Flink task. By adopting
this approach, we can delegate code management to git, utilizing the platform
for automated compilation, packaging, and deployment. Naturally, if
functionality can be achieved via SQL, we would prefer not to [...]
+To our delight, StreamPark also provides support for coding
DataStream/FlinkSQL tasks. For special requirements, we can write our own
implementations in Java/Scala. You can compose tasks following the scaffold
method recommended by StreamPark or write a standard Flink task. By adopting
this approach, we can delegate code management to git, utilizing the platform
for automated compilation, packaging, and deployment. Naturally, if
functionality can be achieved via SQL, we would prefer not to [...]
<br/><br/>
@@ -225,11 +225,11 @@ To our delight, StreamPark also provides support for
coding DataStream/FlinkSQL
## Suggestions for Improvement
-StreamPark, as with any new tool, does have areas ripe for enhancement based
on our current evaluations:
+StreamPark, like any other new tool, does have areas for further enhancement
based on our current evaluations:
* **Strengthening Resource Management**: Features like multi-file system jar
resources and robust task versioning are still awaiting additions.
* **Enriching Frontend Features**: For instance, once a task is added,
functionalities like copying could be integrated.
-* **Visualization of Task Submission Logs**: The process of task submission
involves loading class files, jar packaging, building and submitting images,
and more. A failure at any of these stages could halt the task. Yet, error logs
often lack clarity, or due to some anomaly, the exceptions aren't thrown as
expected, leaving users puzzled about rectifications.
+* **Visualization of Task Submission Logs**: The process of task submission
involves loading class files, jar packaging, building and submitting images,
and more. A failure at any of these stages could halt the task. However, error
logs are not always clear, or due to some anomaly, the exceptions aren't thrown
as expected, leaving users puzzled about rectifications.
It's a universal truth that innovations aren't perfect from the outset.
Although minor issues exist and there are areas for improvement with
StreamPark, its merits outweigh its limitations. As a result, we've chosen
StreamPark as our Flink DevOps platform. We're also committed to collaborating
with its main developers to refine StreamPark further. We wholeheartedly invite
others to use it and contribute towards its advancement.
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-blog/0-streampark-flink-on-k8s.md
b/i18n/zh-CN/docusaurus-plugin-content-blog/0-streampark-flink-on-k8s.md
index 4e9096c8..39e96328 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-blog/0-streampark-flink-on-k8s.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/0-streampark-flink-on-k8s.md
@@ -38,7 +38,7 @@ RUN mkdir -p $FLINK_HOME/usrlib
COPY my-flink-job.jar $FLINK_HOME/usrlib/my-flink-job.jar
```
-4. Use the Flink client script to start the Flink task
+3. Use the Flink client script to start the Flink task
```shell
@@ -52,13 +52,13 @@ COPY my-flink-job.jar $FLINK_HOME/usrlib/my-flink-job.jar
local:///opt/flink/usrlib/my-flink-job.jar
```
-5. Use the Kubectl command to obtain the WebUI access address and JobId of the Flink job
+4. Use the Kubectl command to obtain the WebUI access address and JobId of the Flink job
```shell
kubectl -n flink-cluster get svc
```
-6. Stop the job using the Flink command
+5. Stop the job using the Flink command
```shell
./bin/flink cancel
@@ -186,7 +186,7 @@ StreamPark was adopted relatively late at RELX Technology and is currently mainly used for real-time data integration jobs
## Problems encountered
-Any new technology involves a process of exploration and stepping into pitfalls. The experience of failure is precious. Here we introduce some of the pitfalls and lessons StreamPark encountered during its implementation at RELX Technology. **This part is not only about StreamPark; I believe it will bring some reference to all friends who use Flink on Kubernetes.
+Any new technology involves a process of exploration and stepping into pitfalls. The experience of failure is precious. Here we introduce some of the pitfalls and lessons StreamPark encountered during its implementation at RELX Technology. **This part is not only about StreamPark; I believe it will bring some reference to all friends who use Flink on Kubernetes**.
### **FAQs are summarized below**
@@ -222,7 +222,9 @@ HDFS, Alibaba Cloud OSS, and AWS S3 can all store checkpoints and savepoints; Flin
This issue is related to the Kubernetes pod image pull policy. It is recommended to set the Pod image pull policy to Always:
+```shell
-Dkubernetes.container.image.pull-policy=Always
+```
- **Each restart of the task results in one more Job instance**