This is an automated email from the ASF dual-hosted git repository.
kirs pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git
The following commit(s) were added to refs/heads/master by this push:
new 2a9e51c updata link (#645)
2a9e51c is described below
commit 2a9e51c9246808ee51d5e4abb446b75c4b906819
Author: lifeng <[email protected]>
AuthorDate: Wed Jan 19 15:26:00 2022 +0800
updata link (#645)
* updata link
updata link
* Update home.jsx
---
blog/en-us/Apache-DolphinScheduler-2.0.1.md | 20 +++++-----
blog/en-us/Awarded_most_popular_project_in_2021.md | 2 +-
blog/en-us/DS-2.0-alpha-release.md | 8 ++--
blog/en-us/Eavy_Info.md | 14 +++----
.../Introducing-Apache-DolphinScheduler-1.3.9.md | 6 +--
blog/en-us/Lizhi-case-study.md | 12 +++---
blog/en-us/Twos.md | 4 +-
blog/en-us/YouZan-case-study.md | 46 +++++++++++-----------
blog/zh-cn/Apache-DolphinScheduler-2.0.1.md | 24 +++++------
blog/zh-cn/Awarded_most_popular_project_in_2021.md | 8 ++--
blog/zh-cn/DS-2.0-alpha-release.md | 14 +++----
blog/zh-cn/Eavy_Info.md | 22 +++++------
blog/zh-cn/Lizhi-case-study.md | 12 +++---
blog/zh-cn/Twos.md | 10 ++---
blog/zh-cn/YouZan-case-study.md | 42 ++++++++++----------
site_config/home.jsx | 4 +-
16 files changed, 124 insertions(+), 124 deletions(-)
diff --git a/blog/en-us/Apache-DolphinScheduler-2.0.1.md b/blog/en-us/Apache-DolphinScheduler-2.0.1.md
index 5356cb0..3cabb61 100644
--- a/blog/en-us/Apache-DolphinScheduler-2.0.1.md
+++ b/blog/en-us/Apache-DolphinScheduler-2.0.1.md
@@ -17,7 +17,7 @@ Download Apache DolphinScheduler 2.0.1:https://dolphinscheduler.apache.org/zh-
The workflow execution process activities of Apache DolphinScheduler 2.0.1 are
shown in the following figure:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/20/master-process-2.0-en.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/20/master-process-2.0-en.png"/>
</div>
Start process activity diagram
@@ -96,7 +96,7 @@ Improves the efficiency of workflow operation.
The operation flow chart of the workflow and tasks under the API module are
shown as below:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/20/3.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/20/3.png"/>
</div>
## 03 Automatic Version Upgrade Function
@@ -143,12 +143,12 @@ Currently, the transfer between shell tasks and sql tasks is supported. Passing
Set an out variable "trans" in the previous "create_parameter" task:
echo'${setValue(trans=hello trans)}'
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/323f6a18d8a1d2f2d8fdcb5687c264b5.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/323f6a18d8a1d2f2d8fdcb5687c264b5.png"/>
</div>
Once Keyword: "${setValue(key=value)}" is detected in the task log of the
current task, the system will automatically parse the variable transfer value,
in the post-task, you can directly use the "trans" variable:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/8be29339b73b594dc05a6b832d9330ec.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/8be29339b73b594dc05a6b832d9330ec.png"/>
</div>
The parameter passing of the SQL task:
@@ -157,13 +157,13 @@ select the value corresponding to the column with the same variable name in the
output of user number:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/85bc5216c01ca958cdf11d4bd555c8a6.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/85bc5216c01ca958cdf11d4bd555c8a6.png"/>
</div>
Use the variable "cnt" in downstream tasks:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/4278d0b7f833b64f24fc3d6122287454.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/4278d0b7f833b64f24fc3d6122287454.png"/>
</div>
2.0.1 adds switch task and pigeon task components:
@@ -186,7 +186,7 @@ branch circulation. For other tasks, select D in the branch circulation.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/636f53ddc809f028ffdfc18fd08b5828.md.jpg"/>
+<img src="https://s1.imgpp.com/2021/12/17/636f53ddc809f028ffdfc18fd08b5828.md.jpg"/>
</div>
@@ -207,7 +207,7 @@ Configure the worker running environment online. A worker can specify multiple e
equivalent to the dolphinscheduler_env.sh file.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/ef8b444c6dbebe397daaaa3bbadf743f.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/ef8b444c6dbebe397daaaa3bbadf743f.png"/>
</div>
@@ -267,8 +267,8 @@ Thanks to the 289 community contributors who participated in the optimization an
particular order)!
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/2020b4f57e33734414a11149704ded92.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/2020b4f57e33734414a11149704ded92.png"/>
</div>
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/1825b6945d5845233b7389479ba6c074.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/1825b6945d5845233b7389479ba6c074.png"/>
</div>
diff --git a/blog/en-us/Awarded_most_popular_project_in_2021.md b/blog/en-us/Awarded_most_popular_project_in_2021.md
index d80446c..033fae7 100644
--- a/blog/en-us/Awarded_most_popular_project_in_2021.md
+++ b/blog/en-us/Awarded_most_popular_project_in_2021.md
@@ -7,7 +7,7 @@ description: Recently, the "2021 OSC Best China Open Source Projects Poll」init
# Apache DolphinScheduler Won the「2021 OSC Most Popular Projects」award, and
Whaleops Open Source Technology Received the honor of「Outstanding Chinese Open
Source Original Startups」!
<div align=center>
-<img src="https://imgpp.com/images/2022/01/07/_1ca0eca926145ffc5f05f15b6b612a2b_64635.md.jpg"/>
+<img src="https://s1.imgpp.com/2022/01/07/_1ca0eca926145ffc5f05f15b6b612a2b_64635.jpg"/>
</div>
Recently, the "2021 OSC Best China Open Source Projects Poll」initiated by
OSCHINA announced the selection results.
diff --git a/blog/en-us/DS-2.0-alpha-release.md b/blog/en-us/DS-2.0-alpha-release.md
index e8e0a8a..4f34d61 100644
--- a/blog/en-us/DS-2.0-alpha-release.md
+++ b/blog/en-us/DS-2.0-alpha-release.md
@@ -1,6 +1,6 @@
# Refactoring, Plug-in, Performance Improves By 20 times, Apache
DolphinScheduler 2.0 alpha Release Highlights Check!
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/a920be6733a3d99af38d1cdebfcbb3ff.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/a920be6733a3d99af38d1cdebfcbb3ff.md.png"></div>
Hello community, good news! After nearly 10 months of joint efforts by more
than 100 community contributors, we are happy to announce the release of Apache
DolphinScheduler 2.0 alpha. This is the first major version of DolphinScheduler
since it entered Apache. It has undergone a number of key updates and
optimizations, which means a milestone in the development of DolphinScheduler.
DolphinScheduler 2.0 alpha mainly refactors the implementation of Master,
greatly optimizes the metadata structure and processing flow, adds SPI plug-in
capabilities, and improves performance by 20 times. At the same time, the new
version has designed a brand new UI interface to bring a better user
experience. In addition, 2.0 alpha has newly added and optimized some features
that are eagerly demanded in the community, such as parameter transfer, version
control, import and export functions.
@@ -24,9 +24,9 @@ Increase the caching mechanism to greatly reduce the number of database operatio
## UI Components Optimization Brings Brand New UI Interface
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/4e4024cbddbe3113f730c5e67f083c4f.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/4e4024cbddbe3113f730c5e67f083c4f.md.png"></div>
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/75e002b21d827aee9aeaa3922c20c13f.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/75e002b21d827aee9aeaa3922c20c13f.md.png"></div>
UI interface comparison: 1.3.9 (top) VS. 2.0 alpha (bottom)
@@ -75,6 +75,6 @@ In addition to changes in performance and UI, DolphinScheduler has also undergon
The release of DolphinScheduler 2.0 alpha embodies the wisdom and strength of
the community contributors. Their active participation and great enthusiasm
open the DolphinScheduler 2.0 era!
Thanks so much for the participation of 100+ contributors (GitHub ID), and we
are looking forward to more and more open sourcing enthusiasts joining the
DolphinScheduler community co-construction, to contribute yourself to building
a more usable big data workflow scheduling platform!
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/8926d45ead1f735e8cfca0e8142b315f.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/8926d45ead1f735e8cfca0e8142b315f.md.png"></div>
2.0 List of alpha contributors
diff --git a/blog/en-us/Eavy_Info.md b/blog/en-us/Eavy_Info.md
index 819fb81..5825887 100644
--- a/blog/en-us/Eavy_Info.md
+++ b/blog/en-us/Eavy_Info.md
@@ -6,7 +6,7 @@ description: Based on the Apache DolphinScheduler, the cloud computing and big d
# Eavy Info Builds Data Asset Management Platform Services Based on Apache
DolphinScheduler to Construct Government Information Ecology | Use Case
<div align=center>
-<img src="https://imgpp.com/images/2021/12/29/1640759432737.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/29/1640759432737.png"/>
</div>
Based on the Apache DolphinScheduler, the cloud computing and big data
provider Eavy Info has been serving the business operations in the company for
more than a year.
@@ -22,7 +22,7 @@ Out of this consideration, we have developed a Data Asset Management Platform, o
Apache DolphinScheduler is a distributed, decentralized, easy-to-expand visual
DAG scheduling system that supports multiple types of tasks including Shell,
Python, Spark, Flink, etc., and has good scalability. Its overall structure is
shown in the figure below:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/28/1.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/28/1.png"/>
</div>
This is a typical master-slave architecture with strong horizontal
scalability. The scheduling engine Quartz is a Java open source project of
Spring Boot, it is easier to integrate and use for those familiar with Spring
Boot development.
@@ -50,13 +50,13 @@ Based on Apache DolphinScheduler, we carry out the following practices.
In our business scenario, there are many types of business needs for data
synchronization, but the amount of data is not particularly large and is
real-time-undemanding. So at the beginning of the architecture selection, we
chose the combination of Datax+Apache DolphinScheduler and implemented the
transformation of the corresponding business. Now it is integrated into various
projects as a service product to provide offline synchronization services.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/29/1-1.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/29/1-1.png"/>
</div>
Synchronization tasks are divided into periodic tasks and one-time tasks.
After the configuration tasks of the input and output sources, the corn
expression needs to be configured for periodic tasks, and then the save
interface is called to send the synchronization tasks to the DS scheduling
platform.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/29/2-1.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/29/2-1.png"/>
</div>
Synchronization tasks are divided into **periodic tasks** and **one-time
tasks**. After the configuration tasks of the input and output sources are
configured, the corn expression needs to be configured for periodic tasks, and
then the **save interface** is called to send the synchronization tasks to the
DS scheduling platform.
@@ -68,7 +68,7 @@ The design of the entire synchronization module is aimed to reuse the diversity
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/ffd0c839647bcce4c208ee0cf5b7622b.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/ffd0c839647bcce4c208ee0cf5b7622b.png"/>
</div>
## Self Development Practices Based on DS
@@ -76,7 +76,7 @@ The design of the entire synchronization module is aimed to reuse the diversity
Anyone familiar with Datax knows that it is essentially an ETL tool, which
provides a transformer module that supports Groovy syntax, and at the same time
further enrich the tool classes used in the transformer in the Datax source
code, such as replacing, regular matching, screening, desensitization,
statistics, and other functions. That shows its property of Transform. Since
the tasks are implemented with DAG diagrams in Apache DolphinScheduler, we
wonder that is it possible to abstract [...]
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/ffd0c839647bcce4c208ee0cf5b7622b.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/ffd0c839647bcce4c208ee0cf5b7622b.png"/>
</div>
Each component is regarded as a module, and the dependency between the
functions of each module is dealt with the dependency of DS. The corresponding
component and the component transfer data are stored at the front-end, which
means the front-end performs the transfer and logical judgments between most of
the components after introducing input (input component) , since each component
can be seen as an output/output of Datax. Once all parameters are set, the
final output is determined. Th [...]
@@ -88,7 +88,7 @@ PS: Because our business scenarios may involve cross-database queries (MySQL com
People dabble in the governance process know that a simple governance process
can lead to a quality report. We write part of the government records into ES,
and then use the aggregation capabilities of ES to obtain a quality report.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/29/4da40632c21dbea51d2951d98ee18f1b.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/29/4da40632c21dbea51d2951d98ee18f1b.png"/>
</div>
The above are some practices that we have made based on DS and middlewares
like Datax, combining with businesses to meet our own needs.
diff --git a/blog/en-us/Introducing-Apache-DolphinScheduler-1.3.9.md b/blog/en-us/Introducing-Apache-DolphinScheduler-1.3.9.md
index 5e0142d..f6912db 100644
--- a/blog/en-us/Introducing-Apache-DolphinScheduler-1.3.9.md
+++ b/blog/en-us/Introducing-Apache-DolphinScheduler-1.3.9.md
@@ -1,6 +1,6 @@
# Introducing Apache DolphinScheduler 1.3.9, StandaloneServer is Available!
-[](https://imgpp.com/image/OQFd4)
+[](https://imgpp.com/image/OQFd4)
On October 22, 2021, we are excited to announce the release of Apache
DolphinScheduler 1.3.9. After a month and a half,Apache DolphinScheduler 1.3.9
brings StandaloneServer to users with the joint efforts of the community.
StandaloneServer is a major update of this version, which means a huge leap in
ease of use, and the details will be introduced below. In addition, this
upgrade also fixes two critical bugs in 1.3.8.
@@ -50,14 +50,14 @@ Apache DolphinScheduler is a distributed and extensible workflow scheduler platf
DolphinScheduler assembles Tasks in the form of DAG (Directed Acyclic Graph),
which can monitor the running status of tasks in real time. At the same time,
it supports operations such as retry, recovery from designated nodes, suspend
and Kill tasks, and focuses on the following 6 capabilities :
-<img src="https://imgpp.com/images/2021/10/25/WechatIMG89.md.jpg" width="60%" />
+<img src="https://s1.imgpp.com/2021/10/25/WechatIMG89.md.jpg" width="60%" />
## 2 Partial User Cases
According to incomplete statistics, as of October 2020, 600+ companies and
institutions have adopted DolphinScheduler in production environments. Partial
cases are shown as below (in no particular order).
-[](https://imgpp.com/image/OQylI)
+[](https://imgpp.com/image/OQylI)
## 3 Participate in Contribution
diff --git a/blog/en-us/Lizhi-case-study.md b/blog/en-us/Lizhi-case-study.md
index fe43a5d..0ede05b 100644
--- a/blog/en-us/Lizhi-case-study.md
+++ b/blog/en-us/Lizhi-case-study.md
@@ -7,7 +7,7 @@
## Background
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/radio-g360707f44_1920.md.jpg"/>
+<img src="https://s1.imgpp.com/2021/11/23/radio-g360707f44_1920.md.jpg"/>
</div>
Lizhi is a fast-growing UGC audio community company that attaches great
importance to AI and data analysis technology development. AI can find the
right voice for each user among the massively fragmented audios, and build it
into a sustainable ecological closed loop. And data analysis can guide the
company’s fast-growing business. Both of the two fields need to process massive
amounts of data and require a big data scheduling system.
@@ -54,7 +54,7 @@ At the technical level of the platform, Lizhi optimizes the extended modules for
A simple xgboost case:
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/1.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/1.png"/>
</div>
### 1. Obtaining training samples
@@ -62,7 +62,7 @@ A simple xgboost case:
At present, Lizhi does not directly select data from Hive, and joins the
union, splitting the sample afterward, but directly processes the sample by
shell nodes.
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/2.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/2.png"/>
</div>
### 2. Data preprocessing
@@ -70,7 +70,7 @@ At present, Lizhi does not directly select data from Hive, and joins the union,
Transformer& custom preprocessing configuration file, use the same
configuration for online training, and feature preprocessing is performed after
the feature is obtained. It contains the itemType and its feature set to be
predicted, the user’s userType and its feature set, as well as the associated
and crossed itemType and its feature set. Define the transformer function for
each feature preprocessing, supports custom transformer and hot update,
xgboost, and tf model feature preprocessi [...]
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/2.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/2.png"/>
</div>
@@ -80,7 +80,7 @@ It supports w2v, xgboost, tf model training modules. The training modules are fi
For example, in the xgboost training process, use Python to package the
xgboost training script into the xgboost training node of DolphinScheduler, and
show the parameters required for training on the interface. The file exported
by “training set data preprocessing” is input to the training node through HDFS.
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/3.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/3.png"/>
</div>
### 4. Model release
@@ -89,7 +89,7 @@ For example, in the xgboost training process, use Python to package the xgboost
The release model will send the model and preprocessing configuration files to
HDFS and insert records into the model release table. The model service will
automatically identify the new model, update the model, and provide online
prediction services to the external.
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/4.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/4.png"/>
</div>
diff --git a/blog/en-us/Twos.md b/blog/en-us/Twos.md
index c6663ed..ce03129 100644
--- a/blog/en-us/Twos.md
+++ b/blog/en-us/Twos.md
@@ -6,7 +6,7 @@ description: Recently, TWOS officially announced the approval of 6 full members
## Congratulations! Apache DolphinScheduler Has Been Approved As A TWOS
Candidate Member
<div align=center>
-<img src="https://imgpp.com/images/2022/01/10/1641804549068.md.png"/>
+<img src="https://s1.imgpp.com/2022/01/10/1641804549068.png"/>
</div>
Recently, TWOS officially announced the approval of 6 full members and 3
candidate members, Apache DolphinScheduler, a cloud-native distributed big data
scheduler, was listed by TWOS.
@@ -26,7 +26,7 @@ After being screened by TWOS evaluation criteria, Apache DolphinScheduler was ap
On September 17, 2021, the first batch of members joined TWOS, including 25
full members such as openEuler, openGauss, MindSpore, openLookeng, etc., and 27
candidate members like Apache RocketMQ, Dcloud, Fluid, FastReID, etc., with a
total of 52 members:
<div align=center>
-<img src="https://imgpp.com/images/2022/01/10/1.md.png"/>
+<img src="https://s1.imgpp.com/2022/01/10/1.png"/>
</div>
Only two communities were selected for the second batch of candidate
members—Apache DolphinScheduler and PolarDB, an open-source cloud-native
ecological distributed database contributed by Alibaba Cloud.
diff --git a/blog/en-us/YouZan-case-study.md b/blog/en-us/YouZan-case-study.md
index 76dad89..dcc118f 100644
--- a/blog/en-us/YouZan-case-study.md
+++ b/blog/en-us/YouZan-case-study.md
@@ -1,7 +1,7 @@
# From Airflow to Apache DolphinScheduler, the Roadmap of Scheduling System On
Youzan Big Data Development Platform
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1639383815755.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1639383815755.png"/>
</div>
At the recent Apache DolphinScheduler Meetup 2021, Zheqi Song, the Director of
Youzan Big Data Development Platform
@@ -25,7 +25,7 @@ capabilities for driving merchants' digital growth.
At present, Youzan has established a relatively complete digital product
matrix with the support of the data center:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_Jjgx5qQfjo559_oaJP-DAQ.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_Jjgx5qQfjo559_oaJP-DAQ.png"/>
</div>
Youzan has established a big data development platform (hereinafter referred
to as DP platform) to support the
@@ -33,7 +33,7 @@ increasing demand for data processing services. This is a big data offline devel
the environment, tools, and data needed for the big data tasks development.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_G9znZGQ1XBhJva0tjWa6Bg.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_G9znZGQ1XBhJva0tjWa6Bg.png"/>
</div>
Youzan Big Data Development Platform Architecture
@@ -49,7 +49,7 @@ scheduling cluster.
### 1 Scheduling layer architecture design
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_UDNCmMrZtcswj62aqNXA1g.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_UDNCmMrZtcswj62aqNXA1g.png"/>
</div>
Youzan Big Data Development Platform Scheduling Layer Architecture Design
@@ -85,7 +85,7 @@ scheduling system also faces many challenges and problems.
3. Performance issues:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_U33OWzzfw2Dqn3ryCNbSvw.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_U33OWzzfw2Dqn3ryCNbSvw.png"/>
</div>
Airflow's schedule loop, as shown in the figure above, is essentially the
loading and analysis of DAG and generates DAG
@@ -112,11 +112,11 @@ community ecology.
This is the comparative analysis result below:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_Rbr05klPmQIc7WPFNeEH-w.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_Rbr05klPmQIc7WPFNeEH-w.png"/>
</div>
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_Ity1QoRL_Yu5aDVClY9AgA.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_Ity1QoRL_Yu5aDVClY9AgA.png"/>
</div>
Airflow VS DolphinScheduler
@@ -124,7 +124,7 @@ Airflow VS DolphinScheduler
### 1 DolphinScheduler valuation
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_o8c1Y1TFAOis3KozzJnvfA.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_o8c1Y1TFAOis3KozzJnvfA.png"/>
</div>
As shown in the figure above, after evaluating, we found that the throughput
performance of DolphinScheduler is twice
@@ -183,7 +183,7 @@ In response to the above three points, we have redesigned the architecture.
release.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_eusVhW4QAJ2uO-J96bqiFg.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_eusVhW4QAJ2uO-J96bqiFg.png"/>
</div>
Refactoring Design
@@ -197,7 +197,7 @@ complement it.
- Workflow definition status combing
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/-1.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/-1.png"/>
</div>
We first combed the definition status of the DolphinScheduler workflow. The
definition and timing management of
@@ -213,10 +213,10 @@ generated on the DolphinScheduler. After going online, the task will be run and
to view the results and obtain log running information in real-time.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/-1.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/-1.png"/>
</div>
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/-3.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/-3.png"/>
</div>
- Workflow release process transformation
@@ -224,10 +224,10 @@ Secondly, for the workflow online process, after switching to DolphinScheduler,
workflow definition configuration and timing configuration, as well as the
online status.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_4-ikFp_jJ44-YWJcGNioOg.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_4-ikFp_jJ44-YWJcGNioOg.png"/>
</div>
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/-5.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/-5.png"/>
</div>
The original data maintenance and configuration synchronization of the
workflow is managed based on the DP master, and
only when the task is online and running will it interact with the scheduling
system. Based on these two core changes,
@@ -249,7 +249,7 @@ the DP master, map the task information maintained by the DP to the task on DP,
DolphinScheduler to transfer task configuration information.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_A76iOa5LKyPiu-NoopmYrA.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_A76iOa5LKyPiu-NoopmYrA.png"/>
</div>
Because some of the task types are already supported by DolphinScheduler, it
is only necessary to customize the
@@ -264,7 +264,7 @@ transformation focuses on these task types. At present, the adaptation and trans
tasks, and script tasks adaptation have been completed.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_y7HUfYyLs9NxnTzENKGSCA.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_y7HUfYyLs9NxnTzENKGSCA.png"/>
</div>
### 4 Function complement
@@ -283,7 +283,7 @@ In Figure 1, the workflow is called up on time at 6 o'clock and tuned up once an
called up on time at 6 o'clock and the task execution is completed. The
current state is also normal.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_MvQGZ-FKKLMvKrlWihXHgg.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_MvQGZ-FKKLMvKrlWihXHgg.png"/>
</div>
figure 1
@@ -292,7 +292,7 @@ Figure 2 shows that the scheduling system was abnormal at 8 o'clock, causing the
o'clock and 8 o'clock.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_1WxLOtd1Oh2YERmtGcRb0Q.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_1WxLOtd1Oh2YERmtGcRb0Q.png"/>
</div>
figure 2
@@ -300,7 +300,7 @@ Figure 3 shows that when the scheduling is resumed at 9 o'clock, thanks to the C
can automatically replenish the previously lost execution plan to realize the
automatic replenishment of the scheduling.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/126ec1039f7aa614c.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/126ec1039f7aa614c.png"/>
</div>
Figure 3
@@ -314,7 +314,7 @@ At the same time, this mechanism is also applied to DP's global complement.
- Global Complement across Dags
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_eVyyABTQCLeSGzbbuizfDA.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_eVyyABTQCLeSGzbbuizfDA.png"/>
</div>
DP platform cross-Dag global complement process
@@ -347,7 +347,7 @@ environment and the formal environment.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_bXwtKI2HJzQuHCMW5y3hgg.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_bXwtKI2HJzQuHCMW5y3hgg.png"/>
</div>
DolphinScheduler 2.0 workflow task node display
@@ -364,7 +364,7 @@ stress will be carried out in the test environment. If no problems occur, we wil
production environment in January 2022, and plan to complete the full
migration in March.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_jv3ScivmLop7GYjKIECaiw.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_jv3ScivmLop7GYjKIECaiw.png"/>
</div>
### 3 Expectations for DolphinScheduler
@@ -374,7 +374,7 @@ plug-in alarm components based on DolphinScheduler 2.0, by which the Form inform
displayed adaptively on the frontend.
<div align=center>
-<img src="https://imgpp.com/images/2021/12/16/1_3jP2KQDtFy71ciDoUyW3eg.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/16/1_3jP2KQDtFy71ciDoUyW3eg.png"/>
</div>
"
diff --git a/blog/zh-cn/Apache-DolphinScheduler-2.0.1.md b/blog/zh-cn/Apache-DolphinScheduler-2.0.1.md
index d437cd1..d56dad8 100644
--- a/blog/zh-cn/Apache-DolphinScheduler-2.0.1.md
+++ b/blog/zh-cn/Apache-DolphinScheduler-2.0.1.md
@@ -1,7 +1,7 @@
# Apache DolphinScheduler 2.0.1 来了,备受期待的一键升级、插件化终于实现!
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/1639647220322.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/1639647220322.png"/>
</div>
> 编者按:好消息!Apache DolphinScheduler 2.0.1 版本今日正式发布!
@@ -14,7 +14,7 @@
Apache DolphinScheduler 2.0.1 的工作流执行流程活动如下图所示:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/82a493951882982a22823a08ab8718e7.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/82a493951882982a22823a08ab8718e7.png"/>
</div>
启动流程活动图
@@ -74,7 +74,7 @@ org.apache.dolphinscheduler.spi.params 里对插件的参数做了封装,它
下图为 API 模块下工作流和任务的操作流程图:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/27405914b6eced124394f2079676633c.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/27405914b6eced124394f2079676633c.png"/>
</div>
@@ -114,14 +114,14 @@ StandAloneServer 是为了让用户快速体验产品而创建的服务,其中
在前一个"create_parameter"任务中设置一个out的变量”trans“: echo '${setValue(trans=hello
trans)}'
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/323f6a18d8a1d2f2d8fdcb5687c264b5.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/323f6a18d8a1d2f2d8fdcb5687c264b5.png"/>
</div>
当前置任务中的任务日志中检测到关键字:”${setValue(key=value)}“,
系统会自动解析变量传递值,在后置任务中,可以直接使用”trans“变量:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/8be29339b73b594dc05a6b832d9330ec.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/8be29339b73b594dc05a6b832d9330ec.png"/>
</div>
- SQL 任务的参数传递:
@@ -129,13 +129,13 @@ StandAloneServer 是为了让用户快速体验产品而创建的服务,其中
SQL 任务的自定义变量 prop 的名字需要和字段名称一致,变量会选择 SQL 查询结果中的列名中与该变量名称相同的列对应的值。输出用户数量:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/85bc5216c01ca958cdf11d4bd555c8a6.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/85bc5216c01ca958cdf11d4bd555c8a6.png"/>
</div>
在下游任务中使用变量”cnt“:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/4278d0b7f833b64f24fc3d6122287454.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/4278d0b7f833b64f24fc3d6122287454.png"/>
</div>
新增 switch 任务和 pigeon 任务组件:
@@ -151,7 +151,7 @@ SQL 任务的自定义变量 prop 的名字需要和字段名称一致,变量
配置当全局变量 id=1 时,运行任务 C。则在任务 B 的条件中编辑 ${id} == 1,分支流转选择 C。对于其他任务,在分支流转中选择 D。
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/636f53ddc809f028ffdfc18fd08b5828.md.jpg"/>
+<img src="https://s1.imgpp.com/2021/12/17/636f53ddc809f028ffdfc18fd08b5828.md.jpg"/>
</div>
@@ -168,7 +168,7 @@ pigeon 任务,是一个可以和第三方系统对接的一种任务组件,
在线配置 Worker 运行环境,一个 Worker 可以指定多个环境,每个环境等价于 dolphinscheduler_env.sh 文件。
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/ef8b444c6dbebe397daaaa3bbadf743f.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/ef8b444c6dbebe397daaaa3bbadf743f.png"/>
</div>
在创建任务的时候,选择 worker 分组和对应的环境变量,任务在执行时,worker 会在对应的执行环境中执行任务。
@@ -234,12 +234,12 @@ pigeon 任务,是一个可以和第三方系统对接的一种任务组件,
感谢 289 位参与 2.0.1 版本优化和改进的社区贡献者(排名不分先后)!
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/2020b4f57e33734414a11149704ded92.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/2020b4f57e33734414a11149704ded92.png"/>
</div>
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/1825b6945d5845233b7389479ba6c074.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/1825b6945d5845233b7389479ba6c074.png"/>
</div>
## 8 加入社区
@@ -249,7 +249,7 @@ pigeon 任务,是一个可以和第三方系统对接的一种任务组件,
参与 DolphinScheduler 社区有非常多的参与贡献的方式,包括:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/17/bca55edc877ed6136703a6251e3a19f9.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/17/bca55edc877ed6136703a6251e3a19f9.png"/>
</div>
贡献第一个PR(文档、代码) 我们也希望是简单的,第一个PR用于熟悉提交的流程和社区协作以及感受社区的友好度。
diff --git a/blog/zh-cn/Awarded_most_popular_project_in_2021.md b/blog/zh-cn/Awarded_most_popular_project_in_2021.md
index cb09c1d..399768a 100644
--- a/blog/zh-cn/Awarded_most_popular_project_in_2021.md
+++ b/blog/zh-cn/Awarded_most_popular_project_in_2021.md
@@ -1,6 +1,6 @@
# Apache DolphinScheduler 获评 2021 年度「最受欢迎项目」!
<div align=center>
-<img src="https://imgpp.com/images/2022/01/07/_c449bb07189725ea562d5ba404504b8f_96119.md.jpg"/>
+<img src="https://s1.imgpp.com/2022/01/07/_c449bb07189725ea562d5ba404504b8f_96119.md.jpg"/>
</div>
> 近日,由 OSCHINA 举办的「2021 OSC 中国开源项目」评选活动公布了评选结果。
@@ -9,11 +9,11 @@
## 获评「最受欢迎项目」
<div align=center>
-<img src="https://imgpp.com/images/2022/01/07/1.md.png"/>
+<img src="https://s1.imgpp.com/2022/01/07/1.png"/>
</div>
<div align=center>
-<img src="https://imgpp.com/images/2022/01/07/2.md.png"/>
+<img src="https://s1.imgpp.com/2022/01/07/2.png"/>
</div>
@@ -23,7 +23,7 @@
之后,经过第二轮投票的激烈角逐,Apache DolphinScheduler 再次胜出,获得「最受欢迎项目」奖项。
<div align=center>
-<img src="https://imgpp.com/images/2022/01/07/3-1.md.png"/>
+<img src="https://s1.imgpp.com/2022/01/07/3-1.png"/>
</div>
中国开源软件生态蓬勃发展,近年来涌现出了一大批优秀的开源软件创企,他们不忘初心,深耕开源,回馈社区,为中国开源软件事业添砖加瓦,成为全球开源软件生态中不可忽视的重要力量。「OSC 中国开源项目评选」是开源中国(OSCHINA,OSC 开源社区)举办的国内最权威、最盛大的开源项目评选活动,旨在更好地展示国内开源现状,探讨国内开源趋势,激励国内开源人才,促进国内开源生态完善。
diff --git a/blog/zh-cn/DS-2.0-alpha-release.md b/blog/zh-cn/DS-2.0-alpha-release.md
index 31ae1d2..281a633 100644
--- a/blog/zh-cn/DS-2.0-alpha-release.md
+++ b/blog/zh-cn/DS-2.0-alpha-release.md
@@ -1,6 +1,6 @@
# 重构、插件化、性能提升 20 倍,Apache DolphinScheduler 2.0 alpha 发布亮点太多!
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/a920be6733a3d99af38d1cdebfcbb3ff.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/a920be6733a3d99af38d1cdebfcbb3ff.md.png"></div>
社区的小伙伴们,好消息!经过 100 多位社区贡献者近 10 个月的共同努力,我们很高兴地宣布 Apache DolphinScheduler 2.0 alpha 发布。这是 DolphinScheduler 自进入 Apache 以来的首个大版本,进行了多项关键更新和优化,是 DolphinScheduler 发展中的里程碑。
@@ -28,9 +28,9 @@ DolphinScheduler 2.0 alpha 主要重构了 Master 的实现,大幅优化了元
## 优化 UI 组件,全新的 UI 界面
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/4e4024cbddbe3113f730c5e67f083c4f.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/4e4024cbddbe3113f730c5e67f083c4f.md.png"></div>
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/75e002b21d827aee9aeaa3922c20c13f.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/75e002b21d827aee9aeaa3922c20c13f.md.png"></div>
<center>
@@ -53,17 +53,17 @@ DolphinScheduler 2.0 alpha 主要重构了 Master 的实现,大幅优化了元
## 新功能列表
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/WX20211116-164031.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/WX20211116-164031.md.png"></div>
## 优化项
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/WX20211116-164042.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/WX20211116-164042.md.png"></div>
## Bug 修复
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/WX20211116-164059.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/WX20211116-164059.md.png"></div>
@@ -75,7 +75,7 @@ DolphinScheduler 2.0 alpha 的发布凝聚了众多社区贡献者的智慧和
非常感谢 100+ 位(GitHub ID)社区小伙伴的贡献,期待更多人能够加入 DolphinScheduler 社
区共建,为打造一个更好用的大数据工作流调度平台贡献自己的力量!
-<div align='center'><img src="https://imgpp.com/images/2021/11/16/8926d45ead1f735e8cfca0e8142b315f.md.png"></div>
+<div align='center'><img src="https://s1.imgpp.com/2021/11/16/8926d45ead1f735e8cfca0e8142b315f.md.png"></div>
<center>2.0 alpha 贡献者名单</center>
diff --git a/blog/zh-cn/Eavy_Info.md b/blog/zh-cn/Eavy_Info.md
index a0ac54c..fdd645b 100644
--- a/blog/zh-cn/Eavy_Info.md
+++ b/blog/zh-cn/Eavy_Info.md
@@ -1,6 +1,6 @@
# 亿云基于 DolphinScheduler 构建资产数据管理平台服务,助力政务信息化生态建设 | 最佳实践
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/1639640547411.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/1639640547411.png"/>
</div>
作者| 孙浩
@@ -16,7 +16,7 @@
DolphinScheduler 是一个分布式去中心化,易扩展的可视化 DAG 调度系统,支持包括 Shell、Python、Spark、Flink 等多种类型的 Task 任务,并具有很好的扩展性。其整体架构如下图所示:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/28/1.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/28/1.png"/>
</div>
典型的 master-slave 架构,横向扩展能力强,调度引擎是 Quartz,本身作为 Spring Boot 的 java 开源项目,对于熟悉 Spring Boot 开发的人,集成使用更加的简单上手。
@@ -46,30 +46,30 @@ DS 作为调度系统支持以下功能:
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/1.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/1.png"/>
</div>
同步任务分为了周期任务和一次性任务,在配置完成输入输出源的配置任务之后,周期任务的话,需要配置 corn 表达式,然后调用保存接口,将同步任务发送给DS 的调度平台。
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/2.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/2.png"/>
</div>
我们这里综合考虑放弃了之前 DS 的 UI 前端(第二部分在自助开发模块会给大家解释),复用 DS 后端的上线、启停、删除、日志查看等接口。
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/4.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/4.png"/>
</div>
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/5.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/5.png"/>
</div>
整个同步模块的设计思路,就是重复利用 datax 组件的输入输出 plugin 多样性,配合 DS 的优化,来实现一个离线的同步任务,这个是当前我们的同步的一个组件图,实时同步这块不再赘述。
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/9.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/9.png"/>
</div>
## 03 基于DS的自助开发实践
@@ -77,7 +77,7 @@ DS 作为调度系统支持以下功能:
熟悉 datax的人都知道它本质上是一个 ETL 工具,而其 Transform 的属性体现在,它提供了一个支持 grovy 语法的 transformer 模块,同时可以在 datax 源码中进一步丰富 transformer 中用到工具类,例如替换、正则匹配、筛选、脱敏、统计等功能。而 Dolphinscheduler 的任务,是可以用 DAG 图来实现,那么我们想到,是否存在一种可能,针对一张表或者几张表,把每个 datax 或者 SQL 抽象成一个数据治理的小模块,每个模块按照 DAG 图去设计,并且在上下游之间可以实现数据的传递,最好还是和 DS 一样的可以拖拽式的实现。于是,我们基于前期对 datax 与 ds 的使用,实现了一个自助开发的模块。
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/6.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/6.png"/>
</div>
每个组件可能是一个模块,每个模块功能之间的依赖关系,我们利用 ds 的depend 来处理,而对应组件与组件传递数据,我们利用前端去存储,也就是我们在引入 input(输入组件)之后,让前端来进行大部分组件间的传递和逻辑判断,因为每个组件都可以看作一个 datax 的(输出/输出),所有参数在输入时,最终输出的全集基本就确定了,这也是我们放弃 DS 的 UI 前端的原因。之后,我们将这个 DAG 图组装成 DS 的定义的类型,同样交付给 ds 任务中心。
@@ -88,15 +88,15 @@ PS:因为我们的业务场景可能存在跨数据库查询的情况(不同
熟悉治理流程的人都知道,如果能够做到简单的治理流程化,那么必然可以产出一份质量报告。我们在自助开发的基础上进行优化,将部分治理的记录写入 ES 中,再利用 ES 的聚合能力来实现了一个质量报告。
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/7.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/7.png"/>
</div>
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/8.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/8.png"/>
</div>
<div align=center>
-<img src="https://imgpp.com/images/2021/12/30/10.md.png"/>
+<img src="https://s1.imgpp.com/2021/12/30/10.png"/>
</div>
以上便是我们使用 DS 结合 datax 等中间件,并结合业务背景所做的一些符合自身需求的实践。
diff --git a/blog/zh-cn/Lizhi-case-study.md b/blog/zh-cn/Lizhi-case-study.md
index 6fc7e44..ea716e2 100644
--- a/blog/zh-cn/Lizhi-case-study.md
+++ b/blog/zh-cn/Lizhi-case-study.md
@@ -3,7 +3,7 @@
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/1637566412753.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/1637566412753.png"/>
</div>
@@ -15,7 +15,7 @@
## 01 背景
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/radio-g360707f44_1920.md.jpg"/>
+<img src="https://s1.imgpp.com/2021/11/23/radio-g360707f44_1920.md.jpg"/>
</div>
@@ -70,7 +70,7 @@
一个简单的xgboost案例:
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/1.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/1.png"/>
</div>
@@ -83,7 +83,7 @@
Transformer&自定义预处理配置文件,训练和线上采用同一份配置,获取特征后进行特征预处理。里面包含了要预测的 itemType 及其特征集,用户 userType 及其特征集,关联和交叉的 itemType 及其特征集。定义每个特征预处理的 transformer 函数,支持自定义transformer 和热更新,xgboost 和 tf 模型的特征预处理。经过此节点后,才是模型训练真正要的数据格式。这个配置文件在模型发布时也会带上,以便保持训练和线上预测是一致的。这个文件维护在 DolphinScheduler 的资源中心。
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/2.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/2.png"/>
</div>
### 3. Xgboost 训练
@@ -93,7 +93,7 @@ Transformer&自定义预处理配置文件,训练和线上采用同一份配
例如,xgboost 训练过程中,使用 Python封装好 xgboost训练脚本,包装成 DolphinScheduler 的 xgboost 训练节点,在界面上暴露训练所需参数。经过“训练集数据预处理”输出的文件,经过 hdfs 输入到训练节点。
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/3.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/3.png"/>
</div>
@@ -102,7 +102,7 @@ Transformer&自定义预处理配置文件,训练和线上采用同一份配
发布模型会把模型和预处理配置文件发到 HDFS,同时向模型发布表插入记录,模型服务会自动识别新模型,进而更新模型,对外提供在线预测服务。
<div align=center>
-<img src="https://imgpp.com/images/2021/11/23/4.md.png"/>
+<img src="https://s1.imgpp.com/2021/11/23/4.png"/>
</div>
diff --git a/blog/zh-cn/Twos.md b/blog/zh-cn/Twos.md
index efc21da..f91a9dd 100644
--- a/blog/zh-cn/Twos.md
+++ b/blog/zh-cn/Twos.md
@@ -1,13 +1,13 @@
## 恭喜 Apache DolphinScheduler 入选可信开源社区共同体(TWOS)预备成员!
<div align=center>
-<img src="https://imgpp.com/images/2022/01/10/164179986575876800e959cabce5b.md.png"/>
+<img src="https://s1.imgpp.com/2022/01/10/164179986575876800e959cabce5b.png"/>
</div>
近日,可信开源社区共同体正式宣布批准 6 位正式成员和 3 位预备成员加入。其中,云原生分布式大数据调度系统 Apache DolphinScheduler 入选,成为可信开源社区共同体预备成员。
<div align=center>
-<img src="https://imgpp.com/images/2022/01/10/559c8a77c69a15c7e91423b7306b2e37.md.png"/>
+<img src="https://s1.imgpp.com/2022/01/10/559c8a77c69a15c7e91423b7306b2e37.png"/>
</div>
Apache DolphinScheduler 是一个分布式易扩展的新一代工作流调度平台,致力于“解决大数据任务之间错综复杂的依赖关系,使整个数据处理过程直观可见”,其强大的可视化 DAG 界面极大地提升了用户体验,配置工作流程无需复杂代码。
@@ -19,7 +19,7 @@ Apache DolphinScheduler 是一个分布式易扩展的新一代工作流调度
在“2021OSCAR 开源产业大会”上,中国信通院正式成立可信开源社区共同体(TWOS)。可信开源社区共同体由众多开源项目和开源社区组成,目的是引导建立健康可信且可持续发展的开源社区,旨在搭建交流平台,提供全套的开源风险监测与生态监测服务。
<div align=center>
-<img src="https://imgpp.com/images/2022/01/10/fae6b468615d4845f2d09d99061e0b6b.md.png"/>
+<img src="https://s1.imgpp.com/2022/01/10/fae6b468615d4845f2d09d99061e0b6b.png"/>
</div>
为帮助企业降低开源软件的使用风险,推动建立可信开源生态,中国信通院建立了可信开源标准体系,对企业开源治理能力、开源项目合规性、开源社区成熟度、开源工具检测能力、商业产品开源风险管理能力开展测评。其中,对于开源社区的评估,是开源社区=人+项目+基础设施平台,一个好的开源社区有助于开源项目营造良好的开源生态并扩大影响力。可信开源社区评估从基础设施、社区治理、社区运营与社区开发等角度,梳理开源社区应关注的内容及指标,聚焦于如何构建活跃的开发者生态与可信的开源社区。
@@ -27,7 +27,7 @@ Apache DolphinScheduler 是一个分布式易扩展的新一代工作流调度
经过可信开源社区共同体(TWOS)的重重评估标准的筛选,批准 Apache DolphinScheduler 入选预备成员,证明了其对 Apache DolphinScheduler 的开源运营方式、成熟度及贡献的认可,激励社区提升活跃度。
<div align=center>
-<img src="https://imgpp.com/images/2022/01/10/86ad880d0f0b069ae3ee5bd3f7265e4a.md.jpg"/>
+<img src="https://s1.imgpp.com/2022/01/10/86ad880d0f0b069ae3ee5bd3f7265e4a.md.jpg"/>
</div>
@@ -36,7 +36,7 @@ Apache DolphinScheduler 是一个分布式易扩展的新一代工作流调度
2021 年 9 月 17 日,可信开源社区共同体(TWOS)第一批成员加入。目前,可信开源社区共同体包括 openEuler、openGauss、MindSpore、openLookeng 等在内的 25 名正式成员,以及 Apache RocketMQ、Dcloud、Fluid、FastReID 等在内的 27 名预备成员,总数为 52 名:
<div align=center>
-<img src="https://imgpp.com/images/2022/01/10/9559811cd259449f03e32565a920c0cf.md.png"/>
+<img src="https://s1.imgpp.com/2022/01/10/9559811cd259449f03e32565a920c0cf.png"/>
</div>
第二批预备成员仅有两个项目——Apache DolphinScheduler 与阿里云开源云原生态分布式数据库 PolarDB 入选。
diff --git a/blog/zh-cn/YouZan-case-study.md b/blog/zh-cn/YouZan-case-study.md
index fbdf6ea..16546c2 100644
--- a/blog/zh-cn/YouZan-case-study.md
+++ b/blog/zh-cn/YouZan-case-study.md
@@ -13,11 +13,11 @@
目前,有赞在数据中台的支撑下已经建立了比较完整的数字产品矩阵:
-[](https://imgpp.com/image/ZbGED)
+[](https://imgpp.com/image/ZbGED)
为了支持日益增长的数据处理业务需求,有赞建立了大数据开发平台(以下简称 DP 平台)。这是一个大数据离线开发平台,提供用户大数据任务开发所需的环境、工具和数据。
-[](https://imgpp.com/image/ZbZbN)
+[](https://imgpp.com/image/ZbZbN)
有赞大数据开发平台架构
@@ -26,7 +26,7 @@
### 1 调度层架构设计
-[](https://imgpp.com/image/ZbiQL)
+[](https://imgpp.com/image/ZbiQL)
有赞大数据开发平台调度层架构设计
@@ -50,7 +50,7 @@
2. Python 技术栈,维护迭代成本高;
3. 性能问题:
-[](https://imgpp.com/image/Zb8yu)
+[](https://imgpp.com/image/Zb8yu)
Airflow 的 schedule loop 如上图所示,本质上是对 DAG 的加载解析,将其生成 DAG round 实例执行任务调度。Airflow 2.0 之前的版本是单点 DAG 扫描解析到数据库,这就导致业务增长 Dag 数量较多时,scheduler loop 扫一次 Dag folder 会存在较大延迟(超过扫描频率),甚至扫描时间需要 60-70 秒,严重影响调度性能。
@@ -64,14 +64,14 @@ Airflow Scheduler Failover Controller 本质还是一个主从模式,standby
以下为对比分析结果:
-[](https://imgpp.com/image/ZbaBs)
+[](https://imgpp.com/image/ZbaBs)
Airflow VS DolphinScheduler
### 1 DolphinScheduler 价值评估
-[](https://imgpp.com/image/ZbVrt)
+[](https://imgpp.com/image/ZbVrt)
如上图所示,经过对 DolphinScheduler 价值评估,我们发现其在相同的条件下,吞吐性能是原来的调度系统的 2 倍,而 2.0 版本后 DolphinScheduler 的性能还会有更大幅度的提升,这一点让我们非常兴奋。
@@ -106,7 +106,7 @@ Airflow VS DolphinScheduler
3. 任务生命周期管理/调度管理等操作通过 DolphinScheduler API 交互;
利用 Project 机制冗余工作流配置,实现测试、发布的配置隔离。
-[](https://imgpp.com/image/Zbsda)
+[](https://imgpp.com/image/Zbsda)
改造方案设计
@@ -115,7 +115,7 @@ Airflow VS DolphinScheduler
- 工作流定义状态梳理
-[](https://imgpp.com/image/Zbvzd)
+[](https://imgpp.com/image/Zbvzd)
我们首先梳理了 DolphinScheduler 工作流的定义状态。因为 DolphinScheduler 工作的定义和定时管理会区分为上下线状态, 但 DP平台上两者的状态是统一的,因此在任务测试和工作流发布流程中,需要对 DP到DolphinScheduler 的流程串联做相应的改造。
@@ -123,13 +123,13 @@ Airflow VS DolphinScheduler
首先是任务测试流程改造。在切换到 DolphinScheduler 之后,所有的交互都是基于DolphinScheduler API 来进行的,当在 DP 启动任务测试时,会在 DolphinScheduler 侧生成对应的工作流定义配置,上线之后运行任务,同时调用 DolphinScheduler 的日志查看结果,实时获取日志运行信息。
-[](https://imgpp.com/image/Zb6Q0)
+[](https://imgpp.com/image/Zb6Q0)
- 工作流发布流程改造
其次,针对工作流上线流程,切换到 DolphinScheduler 之后,主要是对工作流定义配置和定时配置,以及上线状态进行了同步。
-[](https://imgpp.com/image/Zbx2b)
+[](https://imgpp.com/image/Zbx2b)
通过这两个核心流程的改造。工作流的原数据维护和配置同步其实都是基于 DP master来管理,只有在上线和任务运行时才会到调度系统进行交互,基于这点,DP 平台实现了工作流维度下的系统动态切换,以便于后续的线上灰度测试。
@@ -141,7 +141,7 @@ Airflow VS DolphinScheduler
目前,DolphinScheduler 平台已支持的任务类型主要包含数据同步类和数据计算类任务,如Hive SQL 任务、DataX 任务、Spark 任务等。因为任务的原数据信息是在 DP 侧维护的,因此 DP 平台的对接方案是在 DP 的 master 构建任务配置映射模块,将 DP 维护的 task 信息映射到 DP 侧的 task,然后通过 DolphinScheduler 的 API 调用来实现任务配置信息传递。
-[](https://imgpp.com/image/Z8fNH)
+[](https://imgpp.com/image/Z8fNH)
因为 DolphinScheduler 已经支持部分任务类型 ,所以只需要基于 DP 平台目前的实际使用场景对 DolphinScheduler 相应任务模块进行定制化改造。而对于 DolphinScheduler 未支持的任务类型,如Kylin任务、算法训练任务、DataY任务等,DP 平台也计划后续通过 DolphinScheduler 2.0 的插件化能力来补齐。
@@ -149,7 +149,7 @@ Airflow VS DolphinScheduler
因为 DP 平台上 SQL 任务和同步任务占据了任务总量的 80% 左右,因此改造重点都集中在这几个任务类型上,目前已基本完成 Hive SQL 任务、DataX 任务以及脚本任务的适配改造以及迁移工作。
-[](https://imgpp.com/image/Z8dgm)
+[](https://imgpp.com/image/Z8dgm)
### 4 功能补齐
@@ -164,17 +164,17 @@ Catchup 机制在 DP 的使用场景,是在调度系统异常或资源不足
在图 1 中,工作流在 6 点准时调起,每小时调一次,可以看到在 6 点任务准时调起并完成任务执行,当前状态也是正常调度状态。
-[](https://imgpp.com/image/Z8IkI)
+[](https://imgpp.com/image/Z8IkI)
图1
图 2 显示在 6 点完成调度后,一直到 8 点期间,调度系统出现异常,导致 7 点和 8点该工作流未被调起。
-[](https://imgpp.com/image/Z8X64)
+[](https://imgpp.com/image/Z8X64)
图2
图 3 表示当 9 点恢复调度之后,因为 具有 Catchup 机制,调度系统会自动回补之前丢失的执行计划,实现调度的自动回补。
-[](https://imgpp.com/image/Z8tq8)
+[](https://imgpp.com/image/Z8tq8)
图3
此机制在任务量较大时作用尤为显著,当 Schedule 节点异常或核心任务堆积导致工作流错过调度出发时间时,因为系统本身的容错机制可以支持自动回补调度任务,所以无需人工手动补数重跑。
@@ -183,7 +183,7 @@ Catchup 机制在 DP 的使用场景,是在调度系统异常或资源不足
- 跨 Dag 全局补数
-[](https://imgpp.com/image/Z8BtU)
+[](https://imgpp.com/image/Z8BtU)
DP 平台跨 Dag 全局补数流程
全局补数在有赞的主要使用场景,是用在核心上游表产出中出现异常,导致下游商家展示数据异常时。这种情况下,一般都需要系统能够快速重跑整个数据链路下的所有任务实例。
@@ -203,12 +203,12 @@ DP 平台目前已经在测试环境中部署了部分 DolphinScheduler 服务
对接到 DolphinScheduler API 系统后,DP 平台在用户层面统一使用 admin 用户,因为其用户体系是直接在 DP master 上进行维护,所有的工作流信息会区分测试环境和正式环境。
-[](https://imgpp.com/image/Zb7rA)
+[](https://imgpp.com/image/Zb7rA)
DolphinScheduler 工作流定义列表
-[](https://imgpp.com/image/ZbQlk)
+[](https://imgpp.com/image/ZbQlk)
-[](https://imgpp.com/image/ZbodC)
+[](https://imgpp.com/image/ZbodC)
DolphinScheduler 2.0工作流任务节点展示
DolphinScheduler 2.0 整体的 UI 交互看起来更加简洁,可视化程度更高,我们计划直接升级至 2.0 版本。
@@ -217,7 +217,7 @@ DolphinScheduler 2.0 整体的 UI 交互看起来更加简洁,可视化程度
目前 ,DP 平台还处于接入 DolphinScheduler 的灰度测试阶段,计划于今年 12 月进行工作流的全量迁移,同时会在测试环境进行分阶段全方位测试或调度性能测试和压力测试。确定没有任何问题后,我们会在来年 1 月进行生产环境灰度测试,并计划在 3 月完成全量迁移。
-[](https://imgpp.com/image/Zb0z6)
+[](https://imgpp.com/image/Zb0z6)
### 3 对 DolphinScheduler 的期待
@@ -225,7 +225,7 @@ DolphinScheduler 2.0 整体的 UI 交互看起来更加简洁,可视化程度
未来,我们对 DolphinScheduler 最大的期待是希望 2.0 版本可以实现任务插件化。
-[](https://imgpp.com/image/Z8Oae)
+[](https://imgpp.com/image/Z8Oae)
目前,DP 平台已经基于 DolphinScheduler 2.0实现了告警组件插件化,可在后端定义表单信息,并在前端自适应展示。
diff --git a/site_config/home.jsx b/site_config/home.jsx
index ee82e40..4ed760c 100644
--- a/site_config/home.jsx
+++ b/site_config/home.jsx
@@ -69,7 +69,7 @@ export default {
link: '/zh-cn/blog/Apache_dolphinScheduler_2.0.2.html',
},
{
- img: 'https://imgpp.com/images/2022/01/10/164179986575876800e959cabce5b.md.png',
+ img: 'https://s1.imgpp.com/2022/01/10/164179986575876800e959cabce5b.png',
title: '恭喜 Apache DolphinScheduler 入选可信开源社区共同体(TWOS)预备成员!',
content: '近日,可信开源社区共同体正式宣布批准 6 位正式成员和 3 位预备成员加入。其中...',
dateStr: '2022-1-11',
@@ -556,7 +556,7 @@ export default {
link: '/en-us/blog/Apache_dolphinScheduler_2.0.2.html',
},
{
- img: 'https://imgpp.com/images/2022/01/10/1641804549068.md.png',
+ img: 'https://s1.imgpp.com/2022/01/10/1641804549068.png',
title: 'Congratulations! Apache DolphinScheduler Has Been Approved As A TWOS Candidate Member',
content: 'ecently, TWOS officially announced the approval of 6 full members and 3 candidate...',
dateStr: '2022-1-11',