[dolphinscheduler-website] branch asf-site updated: Automated deployment: ec04ffd7523f34c87a9a9b7b03730713f32efff1

github-bot Thu, 16 Dec 2021 18:47:01 -0800

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new fc7f364  Automated deployment: ec04ffd7523f34c87a9a9b7b03730713f32efff1
fc7f364 is described below

commit fc7f3642a22686a56e64dbf3c11b6ba9928d575e
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Fri Dec 17 02:45:09 2021 +0000

    Automated deployment: ec04ffd7523f34c87a9a9b7b03730713f32efff1
---
 en-us/blog/YouZan-case-study.html | 321 +++++++++++++++++++++++++++-----------
 en-us/blog/YouZan-case-study.json |   2 +-
 2 files changed, 229 insertions(+), 94 deletions(-)

diff --git a/en-us/blog/YouZan-case-study.html 
b/en-us/blog/YouZan-case-study.html
index d820d33..0bdc298 100644
--- a/en-us/blog/YouZan-case-study.html
+++ b/en-us/blog/YouZan-case-study.html
@@ -12,157 +12,292 @@
 </head>
 <body>
   <div id="root"><div class="blog-detail-page" data-reactroot=""><header 
class="header-container header-container-dark"><div class="header-body"><span 
class="mobile-menu-btn mobile-menu-btn-dark"></span><a 
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div 
class="search search-dark"><span class="icon-search"></span></div><span 
class="language-switch language-switch-dark">中</span><div 
class="header-menu"><div><ul class="ant-menu whiteClass ant-menu-light ant-m 
[...]
-<p><a href="https://imgpp.com/image/i2Fo0"><img 
src="https://imgpp.com/images/2021/12/16/1639383815755.md.png" 
alt="1639383815755.md.png"></a></p>
-<p>At the recent Apache DolphinScheduler Meetup 2021, Zheqi Song, the Director 
of Youzan Big Data Development Platform shared the design scheme and production 
environment practice of its scheduling system migration from Airflow to Apache 
DolphinScheduler.</p>
-<p>This post-90s young man from Hangzhou, Zhejiang Province joined Youzan in 
September 2019, where he is engaged in the research and development of data 
development platforms, scheduling systems, and data synchronization modules. 
When he first joined, Youzan used Airflow, which is also an Apache open source 
project, but after research and production environment testing, Youzan decided 
to switch to DolphinScheduler.</p>
-<p>How does the Youzan big data development platform use the scheduling 
system? Why did Youzan decide to switch to Apache DolphinScheduler? The message 
below will uncover the truth.</p>
+<div align=center>
+<img src="https://imgpp.com/images/2021/12/16/1639383815755.md.png"/>
+</div>
+<p>At the recent Apache DolphinScheduler Meetup 2021, Zheqi Song, the Director 
of Youzan Big Data Development Platform
+shared the design scheme and production environment practice of its scheduling 
system migration from Airflow to Apache
+DolphinScheduler.</p>
+<p>This post-90s young man from Hangzhou, Zhejiang Province joined Youzan in 
September 2019, where he is engaged in the
+research and development of data development platforms, scheduling systems, 
and data synchronization modules. When he
+first joined, Youzan used Airflow, which is also an Apache open source 
project, but after research and production
+environment testing, Youzan decided to switch to DolphinScheduler.</p>
+<p>How does the Youzan big data development platform use the scheduling system? Why did Youzan decide to switch to
+Apache DolphinScheduler? The account below answers both questions.</p>
 <h2>Youzan Big Data Development Platform（DP）</h2>
-<p>As a retail technology SaaS service provider, Youzan is aimed to help 
online merchants open stores, build data products and digital solutions through 
social marketing and expand the omnichannel retail business, and provide better 
SaaS capabilities for driving merchants' digital growth.</p>
+<p>As a retail technology SaaS provider, Youzan aims to help online merchants open stores, build data products and
+digital solutions through social marketing, and expand their omnichannel retail business, while providing better SaaS
+capabilities to drive merchants' digital growth.</p>
 <p>At present, Youzan has established a relatively complete digital product 
matrix with the support of the data center:</p>
-<p><a href="https://imgpp.com/image/i2gJb"><img 
src="https://imgpp.com/images/2021/12/16/1_Jjgx5qQfjo559_oaJP-DAQ.md.png" 
alt="1_Jjgx5qQfjo559_oaJP-DAQ.md.png"></a></p>
-<p>Youzan has established a big data development platform (hereinafter 
referred to as DP platform) to support the increasing demand for data 
processing services. This is a big data offline development platform that 
provides users with the environment, tools, and data needed for the big data 
tasks development.</p>
-<p><a href="https://imgpp.com/image/i2jiJ"><img 
src="https://imgpp.com/images/2021/12/16/1_G9znZGQ1XBhJva0tjWa6Bg.md.png" 
alt="1_G9znZGQ1XBhJva0tjWa6Bg.md.png"></a></p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_Jjgx5qQfjo559_oaJP-DAQ.md.png"/>
+</div>
+<p>Youzan has established a big data development platform (hereinafter referred to as the DP platform) to support the
+increasing demand for data processing services. It is an offline big data development platform that provides users
+with the environment, tools, and data needed to develop big data tasks.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_G9znZGQ1XBhJva0tjWa6Bg.md.png"/>
+</div>
 <p>Youzan Big Data Development Platform Architecture</p>
-<p>Youzan Big Data Development Platform is mainly composed of five modules: 
basic component layer, task component layer, scheduling layer, service layer, 
and monitoring layer. Among them, the service layer is mainly responsible for 
the job life cycle management, and the basic component layer and the task 
component layer mainly include the basic environment such as middleware and big 
data components that the big data development platform depends on. The service 
deployment of the DP platfo [...]
+<p>Youzan Big Data Development Platform is mainly composed of five modules: 
basic component layer, task component layer,
+scheduling layer, service layer, and monitoring layer. Among them, the service 
layer is mainly responsible for the job
+life cycle management, and the basic component layer and the task component 
layer mainly include the basic environment
+such as middleware and big data components that the big data development 
platform depends on. The service deployment of
+the DP platform mainly adopts the master-slave mode, and the master node 
supports HA. The scheduling layer is
+re-developed based on Airflow, and the monitoring layer performs comprehensive 
monitoring and early warning of the
+scheduling cluster.</p>
 <h3>1 Scheduling layer architecture design</h3>
-<p><a href="https://imgpp.com/image/i2nK7"><img 
src="https://imgpp.com/images/2021/12/16/1_UDNCmMrZtcswj62aqNXA1g.md.png" 
alt="1_UDNCmMrZtcswj62aqNXA1g.md.png"></a></p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_UDNCmMrZtcswj62aqNXA1g.md.png"/>
+</div>
 <p>Youzan Big Data Development Platform Scheduling Layer Architecture 
Design</p>
-<p>In 2017, our team investigated the mainstream scheduling systems, and 
finally adopted Airflow (1.7) as the task scheduling module of DP. In the 
design of architecture, we adopted the deployment plan of Airflow + Celery + 
Redis + MySQL based on actual business scenario demand, with Redis as the 
dispatch queue, and implemented distributed deployment of any number of workers 
through Celery.</p>
-<p>In the HA design of the scheduling node, it is well known that Airflow has 
a single point problem on the scheduled node. To achieve high availability of 
scheduling, the DP platform uses the Airflow Scheduler Failover Controller, an 
open-source component, and adds a Standby node that will periodically monitor 
the health of the Active node. Once the Active node is found to be unavailable, 
Standby is switched to Active to ensure the high availability of the 
schedule.</p>
+<p>In 2017, our team investigated the mainstream scheduling systems, and 
finally adopted Airflow (1.7) as the task
+scheduling module of DP. In the design of architecture, we adopted the 
deployment plan of Airflow + Celery + Redis +
+MySQL based on actual business scenario demand, with Redis as the dispatch 
queue, and implemented distributed deployment
+of any number of workers through Celery.</p>
+<p>In the HA design of the scheduling node, it is well known that Airflow has a single point of failure on the
+scheduler node. To achieve high availability of scheduling, the DP platform uses the Airflow Scheduler Failover
+Controller, an open-source component, and adds a Standby node that periodically monitors the health of the Active
+node. Once the Active node is found to be unavailable, Standby is switched to Active to keep scheduling highly
+available.</p>
 <h3>2 Worker nodes load balancing strategy</h3>
-<p>In addition, to use resources more effectively, the DP platform 
distinguishes task types based on CPU-intensive degree/memory-intensive degree 
and configures different slots for different celery queues to ensure that each 
machine's CPU/memory usage rate is maintained within a reasonable range.</p>
+<p>In addition, to use resources more effectively, the DP platform 
distinguishes task types based on CPU-intensive
+degree/memory-intensive degree and configures different slots for different 
celery queues to ensure that each machine's
+CPU/memory usage rate is maintained within a reasonable range.</p>
 <h2>Scheduling System Upgrade and Selection</h2>
-<p>Since the official launch of the Youzan Big Data Platform 1.0 in 2017, we 
have completed 100% of the data warehouse migration plan in 2018. In 2019, the 
daily scheduling task volume has reached 30,000+ and has grown to 60,000+ by 
2021. the platform’s daily scheduling task volume will be reached. With the 
rapid increase in the number of tasks, DP's scheduling system also faces many 
challenges and problems.</p>
+<p>Since the official launch of the Youzan Big Data Platform 1.0 in 2017, we completed 100% of the data warehouse
+migration plan in 2018. In 2019, the platform's daily scheduling task volume reached 30,000+, and it grew to 60,000+
+by 2021. With the rapid increase in the number of tasks, DP's scheduling system also faces many challenges and
+problems.</p>
 <h3>1 Pain points of Airflow</h3>
 <ol>
-<li>In-depth re-development is difficult, the commercial version is separated 
from the community, and costs relatively high to upgrade ;</li>
+<li>In-depth re-development is difficult, the commercial version is separated from the community, and upgrades are
+relatively costly;</li>
 <li>Because it is based on the Python technology stack, maintenance and iteration costs are higher;</li>
 <li>Performance issues:</li>
 </ol>
-<p><a href="https://imgpp.com/image/iR2cZ"><img 
src="https://imgpp.com/images/2021/12/16/1_U33OWzzfw2Dqn3ryCNbSvw.md.png" 
alt="1_U33OWzzfw2Dqn3ryCNbSvw.md.png"></a></p>
-<p>Airflow's schedule loop, as shown in the figure above, is essentially the 
loading and analysis of DAG and generates DAG round instances to perform task 
scheduling. Before Airflow 2.0, the DAG was scanned and parsed into the 
database by a single point. It leads to a large delay (over the scanning 
frequency, even to 60s-70s) for the scheduler loop to scan the Dag folder once 
the number of Dags was largely due to business growth. This seriously reduces 
the scheduling performance.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_U33OWzzfw2Dqn3ryCNbSvw.md.png"/>
+</div>
+<p>Airflow's scheduler loop, as shown in the figure above, essentially loads and parses DAGs and generates DAG run
+instances to perform task scheduling. Before Airflow 2.0, DAGs were scanned and parsed into the database by a single
+point. Once the number of DAGs grew large due to business growth, the scheduler loop took a long time to scan the DAG
+folder (longer than the scanning frequency, even reaching 60s-70s). This seriously reduced the scheduling
+performance.</p>
 <ol start="4">
 <li>Stability issues:</li>
 </ol>
-<p>The Airflow Scheduler Failover Controller is essentially run by a 
master-slave mode. The standby node judges whether to switch by monitoring 
whether the active process is alive or not. If it encounters a deadlock 
blocking the process before, it will be ignored, which will lead to scheduling 
failure. After similar problems occurred in the production environment, we 
found the problem after troubleshooting. Although Airflow version 1.10 has 
fixed this problem, this problem will exist in  [...]
+<p>The Airflow Scheduler Failover Controller essentially runs in master-slave mode. The standby node decides whether
+to switch by monitoring whether the active process is alive. If the active process is blocked by a deadlock, it still
+appears alive, so the standby node does not take over and scheduling fails. After similar problems occurred in the
+production environment, we identified the cause through troubleshooting. Although Airflow version 1.10 has fixed this
+problem, the same kind of problem remains possible in master-slave mode and cannot be ignored in the production
+environment.</p>
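+<p>The gap between a process-level liveness check and a progress-based check can be sketched as follows. This is only
+an illustration of the failure mode described above, not the Airflow Scheduler Failover Controller's actual code; the
+names <code>SchedulerStatus</code>, <code>last_heartbeat</code>, and the 60-second timeout are assumptions made for
+the example.</p>

```python
from dataclasses import dataclass


@dataclass
class SchedulerStatus:
    """Hypothetical snapshot of the active scheduler, as seen by the standby node."""
    process_alive: bool    # what a process-level check observes
    last_heartbeat: float  # epoch seconds of the last completed scheduling loop


def should_failover_process_only(status: SchedulerStatus) -> bool:
    # Switches only when the process has died; a deadlocked process still counts as healthy.
    return not status.process_alive


def should_failover_heartbeat(status: SchedulerStatus, now: float, timeout_s: float = 60.0) -> bool:
    # Also switches when the process is alive but has made no progress for too long.
    stalled = (now - status.last_heartbeat) > timeout_s
    return (not status.process_alive) or stalled


now = 1_000.0
deadlocked = SchedulerStatus(process_alive=True, last_heartbeat=now - 300.0)
healthy = SchedulerStatus(process_alive=True, last_heartbeat=now - 5.0)

print(should_failover_process_only(deadlocked))  # the deadlock goes unnoticed
print(should_failover_heartbeat(deadlocked, now))
print(should_failover_heartbeat(healthy, now))
```

+<p>In this sketch the deadlocked scheduler passes the process-only check, which is the situation that led to
+scheduling failures, while a heartbeat-style check would flag it.</p>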
 <p>Taking into account the above pain points, we decided to re-select the 
scheduling system for the DP platform.</p>
-<p>In the process of research and comparison, Apache DolphinScheduler entered 
our field of vision. Also to be Apache's top open-source scheduling component 
project, we have made a comprehensive comparison between the original 
scheduling system and DolphinScheduler from the perspectives of performance, 
deployment, functionality, stability, and availability, and community 
ecology.</p>
+<p>In the process of research and comparison, Apache DolphinScheduler came to our attention. Because it is also a
+top-level Apache open-source scheduling project, we made a comprehensive comparison between the original scheduling
+system and DolphinScheduler from the perspectives of performance, deployment, functionality, stability and
+availability, and community ecology.</p>
 <p>The results of the comparative analysis are shown below:</p>
-<p><a href="https://imgpp.com/image/iRJWj"><img 
src="https://imgpp.com/images/2021/12/16/1_Rbr05klPmQIc7WPFNeEH-w.md.png" 
alt="1_Rbr05klPmQIc7WPFNeEH-w.md.png"></a></p>
-<p><a href="https://imgpp.com/image/iRPvA"><img 
src="https://imgpp.com/images/2021/12/16/1_Ity1QoRL_Yu5aDVClY9AgA.md.png" 
alt="1_Ity1QoRL_Yu5aDVClY9AgA.md.png"></a>
-Airflow VS DolphinScheduler</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_Rbr05klPmQIc7WPFNeEH-w.md.png"/>
+</div>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_Ity1QoRL_Yu5aDVClY9AgA.md.png"/>
+</div>
+<p>Airflow VS DolphinScheduler</p>
 <h3>1 DolphinScheduler evaluation</h3>
-<p><a href="https://imgpp.com/image/iRUHk"><img 
src="https://imgpp.com/images/2021/12/16/1_o8c1Y1TFAOis3KozzJnvfA.md.png" 
alt="1_o8c1Y1TFAOis3KozzJnvfA.md.png"></a></p>
-<p>As shown in the figure above, after evaluating, we found that the 
throughput performance of DolphinScheduler is twice that of the original 
scheduling system under the same conditions. And we have heard that the 
performance of DolphinScheduler will greatly be improved after version 2.0, 
this news greatly excites us.</p>
-<p>In addition, at the deployment level, the Java technology stack adopted by 
DolphinScheduler is conducive to the standardized deployment process of ops, 
simplifies the release process, liberates operation and maintenance manpower, 
and supports Kubernetes and Docker deployment with stronger scalability.</p>
-<p>In terms of new features, DolphinScheduler has a more flexible 
task-dependent configuration, to which we attach much importance, and the 
granularity of time configuration is refined to the hour, day, week, and month. 
In addition, DolphinScheduler's scheduling management interface is easier to 
use and supports worker group isolation. As a distributed scheduling, the 
overall scheduling capability of DolphinScheduler grows linearly with the scale 
of the cluster, and with the release of n [...]
-<p>From the perspective of stability and availability, DolphinScheduler 
achieves high reliability and high scalability, the decentralized multi-Master 
multi-Worker design architecture supports dynamic online and offline services 
and has stronger self-fault tolerance and adjustment capabilities.</p>
-<p></p>
-<p>And also importantly, after months of communication, we found that the 
DolphinScheduler community is highly active, with frequent technical exchanges, 
detailed technical documents outputs, and fast version iteration.</p>
-<p></p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_o8c1Y1TFAOis3KozzJnvfA.md.png"/>
+</div>
+<p>As shown in the figure above, our evaluation found that the throughput of DolphinScheduler is twice that of the
+original scheduling system under the same conditions. We have also heard that the performance of DolphinScheduler
+will improve greatly after version 2.0, which excites us.</p>
+<p>In addition, at the deployment level, the Java technology stack adopted by 
DolphinScheduler is conducive to the
+standardized deployment process of ops, simplifies the release process, 
liberates operation and maintenance manpower,
+and supports Kubernetes and Docker deployment with stronger scalability.</p>
+<p>In terms of new features, DolphinScheduler has a more flexible task dependency configuration, to which we attach
+much importance, and the granularity of time configuration is refined to the hour, day, week, and month. In addition,
+DolphinScheduler's scheduling management interface is easier to use and supports worker group isolation. As a
+distributed scheduling system, the overall scheduling capability of DolphinScheduler grows linearly with the scale of
+the cluster, and with the release of the new task plug-in feature, task-type customization is also becoming an
+attractive feature.</p>
+<p>From the perspective of stability and availability, DolphinScheduler achieves high reliability and high
+scalability. Its decentralized multi-Master, multi-Worker architecture supports bringing services online and offline
+dynamically and has stronger self-fault-tolerance and adjustment capabilities.</p>
+<p>Just as importantly, after months of communication, we found that the DolphinScheduler community is highly active,
+with frequent technical exchanges, detailed technical documentation, and fast version iteration.</p>
 <p>In summary, we decided to switch to DolphinScheduler.</p>
 <h2>DolphinScheduler Migration Scheme Design</h2>
-<p>After deciding to migrate to DolphinScheduler, we sorted out the platform's 
requirements for the transformation of the new scheduling system.</p>
-<p></p>
+<p>After deciding to migrate to DolphinScheduler, we sorted out the platform's 
requirements for the transformation of the
+new scheduling system.</p>
 <p>In conclusion, the key requirements are as below:</p>
-<p></p>
 <ol>
-<li>Users are not aware of migration. There are 700-800 users on the platform, 
we hope that the user switching cost can be reduced;</li>
-<li>The scheduling system can be dynamically switched because the production 
environment requires stability above all else. The online grayscale test will 
be performed during the online period, we hope that the scheduling system can 
be dynamically switched based on the granularity of the workflow;</li>
-<li>The workflow configuration for testing and publishing needs to be 
isolated. Currently, we have two sets of configuration files for task testing 
and publishing that are maintained through GitHub. Online scheduling task 
configuration needs to ensure the accuracy and stability of the data, so two 
sets of environments are required for isolation.</li>
+<li>Users should not be aware of the migration. There are 700-800 users on the platform, and we hope to keep their
+switching cost low;</li>
+<li>The scheduling system can be dynamically switched, because the production environment requires stability above
+all else. An online grayscale test will be performed during the rollout, so we hope that the scheduling system can be
+dynamically switched at the granularity of the workflow;</li>
+<li>The workflow configuration for testing and publishing needs to be 
isolated. Currently, we have two sets of
+configuration files for task testing and publishing that are maintained 
through GitHub. Online scheduling task
+configuration needs to ensure the accuracy and stability of the data, so two 
sets of environments are required for
+isolation.</li>
 </ol>
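+<p>The second requirement, switching the scheduling system at workflow granularity, can be pictured with a small
+routing table. The sketch below is a hypothetical illustration and not the DP platform's implementation; the
+<code>SchedulerRouter</code> class and the workflow IDs are assumptions made for the example.</p>

```python
from enum import Enum


class Scheduler(Enum):
    AIRFLOW = "airflow"
    DOLPHINSCHEDULER = "dolphinscheduler"


class SchedulerRouter:
    """Hypothetical per-workflow routing table for a grayscale rollout."""

    def __init__(self, default: Scheduler = Scheduler.AIRFLOW) -> None:
        self._default = default
        self._overrides: dict[str, Scheduler] = {}

    def switch(self, workflow_id: str, target: Scheduler) -> None:
        # Move a single workflow to another scheduler without touching the rest.
        self._overrides[workflow_id] = target

    def rollback(self, workflow_id: str) -> None:
        # Return the workflow to the default scheduler if the grayscale test fails.
        self._overrides.pop(workflow_id, None)

    def resolve(self, workflow_id: str) -> Scheduler:
        return self._overrides.get(workflow_id, self._default)


router = SchedulerRouter()
router.switch("wf_orders_hourly", Scheduler.DOLPHINSCHEDULER)
print(router.resolve("wf_orders_hourly").value)
print(router.resolve("wf_users_daily").value)
```

+<p>Keeping the decision in one lookup means a workflow can be moved or rolled back individually, which is what a
+grayscale test needs.</p>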
 <p>In response to the above three points, we have redesigned the 
architecture.</p>
-<p></p>
 <h3>1 Architecture design</h3>
 <ol>
 <li>Keep the existing front-end interface and DP API;</li>
-<li>Refactoring the scheduling management interface, which was originally 
embedded in the Airflow interface, and will be rebuilt based on 
DolphinScheduler in the future;</li>
+<li>Refactoring the scheduling management interface, which was originally 
embedded in the Airflow interface, and will be
+rebuilt based on DolphinScheduler in the future;</li>
 <li>Task lifecycle management/scheduling management and other operations 
interact through the DolphinScheduler API;</li>
-<li>Use the Project mechanism to redundantly configure the workflow to achieve 
configuration isolation for testing and release.</li>
+<li>Use the Project mechanism to redundantly configure the workflow to achieve 
configuration isolation for testing and
+release.</li>
 </ol>
-<p><a href="https://imgpp.com/image/iRdIC"><img 
src="https://imgpp.com/images/2021/12/16/1_eusVhW4QAJ2uO-J96bqiFg.md.png" 
alt="1_eusVhW4QAJ2uO-J96bqiFg.md.png"></a>
-Refactoring Design</p>
-<p></p>
-<p>We entered the transformation phase after the architecture design is 
completed. We have transformed DolphinScheduler's workflow definition, task 
execution process, and workflow release process, and have made some key 
functions to complement it.</p>
-<p></p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_eusVhW4QAJ2uO-J96bqiFg.md.png"/>
+</div>
+<p>Refactoring Design</p>
+<p>We entered the transformation phase after the architecture design was completed. We transformed
+DolphinScheduler's workflow definition, task execution process, and workflow release process, and added some key
+functions to complement it.</p>
 <ul>
 <li>Workflow definition status combing</li>
 </ul>
-<p><a href="https://imgpp.com/image/iRhM6"><img 
src="https://imgpp.com/images/2021/12/16/-1.md.png" alt="-1.md.png"></a></p>
-<p>We first combed the definition status of the DolphinScheduler workflow. The 
definition and timing management of DolphinScheduler work will be divided into 
online and offline status, while the status of the two on the DP platform is 
unified, so in the task test and workflow release process, the process series 
from DP to DolphinScheduler needs to be modified accordingly.</p>
+<div align=center>
+<img src="https://imgpp.com/images/2021/12/16/-1.md.png"/>
+</div>
+<p>We first sorted out the definition status of the DolphinScheduler workflow. In DolphinScheduler, workflow
+definition and timing management each have separate online and offline states, while on the DP platform the two
+states are unified, so in the task test and workflow release process, the sequence of calls from DP to
+DolphinScheduler needs to be modified accordingly.</p>
 <ul>
 <li>Task execution process transformation</li>
 </ul>
-<p>Firstly, we have changed the task test process. After switching to 
DolphinScheduler, all interactions are based on the DolphinScheduler API. When 
the task test is started on DP, the corresponding workflow definition 
configuration will be generated on the DolphinScheduler. After going online, 
the task will be run and the DolphinScheduler log will be called to view the 
results and obtain log running information in real-time.</p>
-<p><a href="https://imgpp.com/image/iRhM6"><img 
src="https://imgpp.com/images/2021/12/16/-1.md.png" alt="-1.md.png"></a></p>
-<p><a href="https://imgpp.com/image/iRtJH"><img 
src="https://imgpp.com/images/2021/12/16/-3.md.png" alt="-3.md.png"></a></p>
-<ul>
-<li>Workflow release process transformation</li>
-</ul>
-<p>Secondly, for the workflow online process, after switching to 
DolphinScheduler, the main change is to synchronize the workflow definition 
configuration and timing configuration, as well as the online status.</p>
-<p><a href="https://imgpp.com/image/iRBNI"><img 
src="https://imgpp.com/images/2021/12/16/1_4-ikFp_jJ44-YWJcGNioOg.md.png" 
alt="1_4-ikFp_jJ44-YWJcGNioOg.md.png"></a></p>
-<p><a href="https://imgpp.com/image/iRwim"><img 
src="https://imgpp.com/images/2021/12/16/-5.md.png" alt="-5.md.png"></a></p>
-<p>The original data maintenance and configuration synchronization of the 
workflow is managed based on the DP master, and only when the task is online 
and running will it interact with the scheduling system. Based on these two 
core changes, the DP platform can dynamically switch systems under the 
workflow, and greatly facilitate the subsequent online grayscale test.</p>
+<p>Firstly, we changed the task test process. After switching to DolphinScheduler, all interactions are based on the
+DolphinScheduler API. When a task test is started on DP, the corresponding workflow definition configuration is
+generated on DolphinScheduler. After the definition goes online, the task is run, and the DolphinScheduler log is
+queried to view the results and obtain running log information in real time.</p>
+<div align=center>
+<img src="https://imgpp.com/images/2021/12/16/-1.md.png"/>
+</div>
+<div align=center>
+<img src="https://imgpp.com/images/2021/12/16/-3.md.png"/>
+</div>
+<ul>
+<li>Workflow release process transformation</li>
+</ul>
+<p>Secondly, for the workflow online process, after switching to 
DolphinScheduler, the main change is to synchronize the
+workflow definition configuration and timing configuration, as well as the 
online status.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_4-ikFp_jJ44-YWJcGNioOg.md.png"/>
+</div>
+<div align=center>
+<img src="https://imgpp.com/images/2021/12/16/-5.md.png"/>
+</div>
+<p>The original data maintenance and configuration synchronization of the workflow are managed by the DP master, and
+the workflow interacts with the scheduling system only when the task is online and running. Based on these two core
+changes, the DP platform can dynamically switch scheduling systems at the workflow level, which greatly facilitates
+the subsequent online grayscale test.</p>
 <h3>2 Function completion</h3>
-<p></p>
 <p>In addition, the DP platform has also complemented some functions. The 
first is the adaptation of task types.</p>
-<p></p>
 <ul>
 <li>Task type adaptation</li>
 </ul>
-<p>Currently, the task types supported by the DolphinScheduler platform mainly 
include data synchronization and data calculation tasks, such as Hive SQL 
tasks, DataX tasks, and Spark tasks. Because the original data information of 
the task is maintained on the DP, the docking scheme of the DP platform is to 
build a task configuration mapping module in the DP master, map the task 
information maintained by the DP to the task on DP, and then use the API call 
of DolphinScheduler to transfer  [...]
-<p><a href="https://imgpp.com/image/iROc4"><img 
src="https://imgpp.com/images/2021/12/16/1_A76iOa5LKyPiu-NoopmYrA.md.png" 
alt="1_A76iOa5LKyPiu-NoopmYrA.md.png"></a></p>
-<p>Because some of the task types are already supported by DolphinScheduler, 
it is only necessary to customize the corresponding task modules of 
DolphinScheduler to meet the actual usage scenario needs of the DP platform. 
For the task types not supported by DolphinScheduler, such as Kylin tasks, 
algorithm training tasks, DataY tasks, etc., the DP platform also plans to 
complete it with the plug-in capabilities of DolphinScheduler 2.0.</p>
+<p>Currently, the task types supported by the DolphinScheduler platform mainly include data synchronization and data
+calculation tasks, such as Hive SQL tasks, DataX tasks, and Spark tasks. Because the original data information of the
+task is maintained on DP, the docking scheme of the DP platform is to build a task configuration mapping module in
+the DP master, map the task information maintained by DP to the corresponding task on DolphinScheduler, and then use
+DolphinScheduler API calls to transfer the task configuration information.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_A76iOa5LKyPiu-NoopmYrA.md.png"/>
+</div>
+<p>Because some of the task types are already supported by DolphinScheduler, it is only necessary to customize the
+corresponding DolphinScheduler task modules to meet the actual usage scenarios of the DP platform. For the task types
+not supported by DolphinScheduler, such as Kylin tasks, algorithm training tasks, and DataY tasks, the DP platform
+plans to add them using the plug-in capabilities of DolphinScheduler 2.0.</p>
 <h3>3 Transformation schedule</h3>
-<p>Because SQL tasks and synchronization tasks on the DP platform account for 
about 80% of the total tasks, the transformation focuses on these task types. 
At present, the adaptation and transformation of Hive SQL tasks, DataX tasks, 
and script tasks adaptation have been completed.</p>
-<p><a href="https://imgpp.com/image/iRYY8"><img 
src="https://imgpp.com/images/2021/12/16/1_y7HUfYyLs9NxnTzENKGSCA.md.png" 
alt="1_y7HUfYyLs9NxnTzENKGSCA.md.png"></a></p>
-<h3>4 Function complement</h3>
+<p>Because SQL tasks and synchronization tasks on the DP platform account for about 80% of the total tasks, the
+transformation focuses on these task types. At present, the adaptation and transformation of Hive SQL tasks, DataX
+tasks, and script tasks have been completed.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_y7HUfYyLs9NxnTzENKGSCA.md.png"/>
+</div>
+<h3>4 Function complement</h3>
 <ul>
 <li>Catchup mechanism realizes automatic replenishment</li>
 </ul>
-<p>DP also needs a core capability in the actual production environment, that 
is, Catchup-based automatic replenishment and global replenishment 
capabilities.</p>
-<p>The catchup mechanism will play a role when the scheduling system is 
abnormal or resources is insufficient, causing some tasks to miss the currently 
scheduled trigger time. When the scheduling is resumed, Catchup will 
automatically fill in the untriggered scheduling execution plan.</p>
+<p>DP also needs a core capability in the actual production environment, that 
is, Catchup-based automatic replenishment and
+global replenishment capabilities.</p>
+<p>The Catchup mechanism comes into play when the scheduling system is abnormal or resources are insufficient,
+causing some tasks to miss their scheduled trigger time. When scheduling is resumed, Catchup automatically fills in
+the untriggered scheduling execution plan.</p>
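+<p>The core of such a mechanism is working out which trigger times were skipped. The sketch below shows one way to
+compute them for a fixed-interval schedule; it is an illustrative assumption, not the DP platform's or
+DolphinScheduler's implementation, and the function name <code>missed_triggers</code> and the example date are
+hypothetical.</p>

```python
from datetime import datetime, timedelta


def missed_triggers(last_triggered: datetime,
                    resumed_at: datetime,
                    interval: timedelta = timedelta(hours=1)) -> list[datetime]:
    """Return the trigger times strictly between the last successful trigger and the resume time."""
    missed = []
    t = last_triggered + interval
    while t < resumed_at:
        missed.append(t)
        t += interval
    return missed


# An hourly workflow last ran at 06:00 and scheduling resumes at 09:00.
last_ok = datetime(2021, 12, 16, 6, 0)
resume = datetime(2021, 12, 16, 9, 0)
for t in missed_triggers(last_ok, resume):
    print(t.strftime("%H:%M"))
```

+<p>Here the 07:00 and 08:00 runs are the ones to replenish, while the 09:00 run is triggered by the regular
+schedule once it resumes.</p>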
 <p>The following three pictures show the instance of an hour-level workflow 
scheduling execution.</p>
-<p>In Figure 1, the workflow is called up on time at 6 o'clock and tuned up 
once an hour. You can see that the task is called up on time at 6 o'clock and 
the task execution is completed. The current state is also normal.</p>
-<p><a href="https://imgpp.com/image/iRk6U"><img 
src="https://imgpp.com/images/2021/12/16/1_MvQGZ-FKKLMvKrlWihXHgg.md.png" 
alt="1_MvQGZ-FKKLMvKrlWihXHgg.md.png"></a></p>
+<p>In Figure 1, the workflow is called up on time at 6 o'clock and tuned up 
once an hour. You can see that the task is
+called up on time at 6 o'clock and the task execution is completed. The 
current state is also normal.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_MvQGZ-FKKLMvKrlWihXHgg.md.png"/>
+</div>
 <p>figure 1</p>
-<p>Figure 2 shows that the scheduling system was abnormal at 8 o'clock, 
causing the workflow not to be activated at 7 o'clock and 8 o'clock.</p>
-<p><a href="https://imgpp.com/image/iRGHe"><img 
src="https://imgpp.com/images/2021/12/16/1_1WxLOtd1Oh2YERmtGcRb0Q.md.png" 
alt="1_1WxLOtd1Oh2YERmtGcRb0Q.md.png"></a>
- figure 2</p>
-<p></p>
-<p>Figure 3 shows that when the scheduling is resumed at 9 o'clock, thanks to 
the Catchup mechanism, the scheduling system can automatically replenish the 
previously lost execution plan to realize the automatic replenishment of the 
scheduling.</p>
-<p><a href="https://imgpp.com/image/iRSXD"><img 
src="https://imgpp.com/images/2021/12/16/126ec1039f7aa614c.md.png" 
alt="126ec1039f7aa614c.md.png"></a>
-Figure 3</p>
-<p></p>
-<p>This mechanism is particularly effective when the amount of tasks is large. 
When the scheduled node is abnormal or the core task accumulation causes the 
workflow to miss the scheduled trigger time, due to the system's fault-tolerant 
mechanism can support automatic replenishment of scheduled tasks, there is no 
need to replenish and re-run manually.</p>
+<p>Figure 2 shows that the scheduling system was abnormal at 8 o'clock, 
causing the workflow not to be activated at 7
+o'clock and 8 o'clock.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_1WxLOtd1Oh2YERmtGcRb0Q.md.png"/>
+</div>
+figure 2
+<p>Figure 3 shows that when the scheduling is resumed at 9 o'clock, thanks to 
the Catchup mechanism, the scheduling system
+can automatically replenish the previously lost execution plan to realize the 
automatic replenishment of the scheduling.</p>
+<div align=center>
+<img src="https://imgpp.com/images/2021/12/16/126ec1039f7aa614c.md.png"/>
+</div>
+<p>Figure 3</p>
+<p>This mechanism is particularly effective when the amount of tasks is large. 
When the scheduled node is abnormal or the
+core task accumulation causes the workflow to miss the scheduled trigger time, 
due to the system's fault-tolerant
+mechanism can support automatic replenishment of scheduled tasks, there is no 
need to replenish and re-run manually.</p>
 <p>At the same time, this mechanism is also applied to DP's global 
complement.</p>
 <ul>
 <li>Global Complement across Dags</li>
 </ul>
-<p><a href="https://imgpp.com/image/iRZa2"><img 
src="https://imgpp.com/images/2021/12/16/1_eVyyABTQCLeSGzbbuizfDA.md.png" 
alt="1_eVyyABTQCLeSGzbbuizfDA.md.png"></a></p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_eVyyABTQCLeSGzbbuizfDA.md.png"/>
+</div>
 <p>DP platform cross-Dag global complement process</p>
-<p>The main use scenario of global complements in Youzan is when there is an 
abnormality in the output of the core upstream table, which results in abnormal 
data display in downstream businesses. In this case, the system generally needs 
to quickly rerun all task instances under the entire data link.</p>
-<p>Based on the function of Clear, the DP platform is currently able to obtain 
certain nodes and all downstream instances under the current scheduling cycle 
through analysis of the original data, and then to filter some instances that 
do not need to be rerun through the rule pruning strategy. After obtaining 
these lists, start the clear downstream clear task instance function, and then 
use Catchup to automatically fill up.</p>
+<p>The main use scenario of global complements in Youzan is when there is an 
abnormality in the output of the core upstream
+table, which results in abnormal data display in downstream businesses. In 
this case, the system generally needs to
+quickly rerun all task instances under the entire data link.</p>
+<p>Based on the function of Clear, the DP platform is currently able to obtain 
certain nodes and all downstream instances
+under the current scheduling cycle through analysis of the original data, and 
then to filter some instances that do not
+need to be rerun through the rule pruning strategy. After obtaining these 
lists, start the clear downstream clear task
+instance function, and then use Catchup to automatically fill up.</p>
 <p>This process realizes the global rerun of the upstream core through Clear, 
which can liberate manual operations.</p>
-<p>Because the cross-Dag global complement capability is important in a 
production environment, we plan to complement it in DolphinScheduler.</p>
+<p>Because the cross-Dag global complement capability is important in a 
production environment, we plan to complement it in
+DolphinScheduler.</p>
 <h2>Current Status &amp; Planning &amp; Outlook</h2>
 <h3>1 DolphinScheduler migration status</h3>
-<p></p>
-<p>The DP platform has deployed part of the DolphinScheduler service in the 
test environment and migrated part of the workflow.</p>
-<p>After docking with the DolphinScheduler API system, the DP platform 
uniformly uses the admin user at the user level. Because its user system is 
directly maintained on the DP master, all workflow information will be divided 
into the test environment and the formal environment.</p>
-<p></p>
-<p><a href="https://imgpp.com/image/iRi0N"><img 
src="https://imgpp.com/images/2021/12/16/1_bXwtKI2HJzQuHCMW5y3hgg.md.png" 
alt="1_bXwtKI2HJzQuHCMW5y3hgg.md.png"></a></p>
+<p>The DP platform has deployed part of the DolphinScheduler service in the 
test environment and migrated part of the
+workflow.</p>
+<p>After docking with the DolphinScheduler API system, the DP platform 
uniformly uses the admin user at the user level.
+Because its user system is directly maintained on the DP master, all workflow 
information will be divided into the test
+environment and the formal environment.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_bXwtKI2HJzQuHCMW5y3hgg.md.png"/>
+</div>
 <p>DolphinScheduler 2.0 workflow task node display</p>
-<p>The overall UI interaction of DolphinScheduler 2.0 looks more concise and 
more visualized and we plan to directly upgrade to version 2.0.</p>
+<p>The overall UI interaction of DolphinScheduler 2.0 looks more concise and 
more visualized and we plan to directly
+upgrade to version 2.0.</p>
 <h3>2 Access planning</h3>
-<p></p>
-<p>At present, the DP platform is still in the grayscale test of 
DolphinScheduler migration., and is planned to perform a full migration of the 
workflow in December this year. At the same time, a phased full-scale test of 
performance and stress will be carried out in the test environment. If no 
problems occur, we will conduct a grayscale test of the production environment 
in January 2022, and plan to complete the full migration in March.</p>
-<p><a href="https://imgpp.com/image/iR9PL"><img 
src="https://imgpp.com/images/2021/12/16/1_jv3ScivmLop7GYjKIECaiw.md.png" 
alt="1_jv3ScivmLop7GYjKIECaiw.md.png"></a></p>
+<p>At present, the DP platform is still in the grayscale test of 
DolphinScheduler migration., and is planned to perform a
+full migration of the workflow in December this year. At the same time, a 
phased full-scale test of performance and
+stress will be carried out in the test environment. If no problems occur, we 
will conduct a grayscale test of the
+production environment in January 2022, and plan to complete the full 
migration in March.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_jv3ScivmLop7GYjKIECaiw.md.png"/>
+</div>
 <h3>3 Expectations for DolphinScheduler</h3>
-<p>In the future, we strongly looking forward to the plug-in tasks feature in 
DolphinScheduler, and have implemented plug-in alarm components based on 
DolphinScheduler 2.0, by which the Form information can be defined on the 
backend and displayed adaptively on the frontend.</p>
-<p><a href="https://imgpp.com/image/iRbic"><img 
src="https://imgpp.com/images/2021/12/16/1_3jP2KQDtFy71ciDoUyW3eg.md.png" 
alt="1_3jP2KQDtFy71ciDoUyW3eg.md.png"></a></p>
+<p>In the future, we strongly looking forward to the plug-in tasks feature in 
DolphinScheduler, and have implemented
+plug-in alarm components based on DolphinScheduler 2.0, by which the Form 
information can be defined on the backend and
+displayed adaptively on the frontend.</p>
+<div align=center>
+<img 
src="https://imgpp.com/images/2021/12/16/1_3jP2KQDtFy71ciDoUyW3eg.md.png"/>
+</div>
 <p>&quot;</p>
-<p>I hope that DolphinScheduler's optimization pace of plug-in feature can be 
faster, to better quickly adapt to our customized task types.</p>
+<p>I hope that DolphinScheduler's optimization pace of plug-in feature can be 
faster, to better quickly adapt to our
+customized task types.</p>
 <p>——Zheqi Song, Head of Youzan Big Data Development Platform</p>
 <p>&quot;</p>
 </section><footer class="footer-container"><div 
class="footer-body"><div><h3>About us</h3><h4>Do you need feedback? Please 
contact us through the following ways.</h4></div><div 
class="contact-container"><ul><li><a 
href="/en-us/community/development/subscribe.html"><img class="img-base" 
src="/img/emailgray.png"/><img class="img-change" 
src="/img/emailblue.png"/><p>Email List</p></a></li><li><a 
href="https://twitter.com/dolphinschedule"><img class="img-base" 
src="/img/twittergray.png"/><im [...]
diff --git a/en-us/blog/YouZan-case-study.json 
b/en-us/blog/YouZan-case-study.json
index 279c675..7122b15 100644
--- a/en-us/blog/YouZan-case-study.json
+++ b/en-us/blog/YouZan-case-study.json
@@ -1,6 +1,6 @@
 {
   "filename": "YouZan-case-study.md",
-  "__html": "<h1>From Airflow to Apache DolphinScheduler, the Roadmap of 
Scheduling System On Youzan Big Data Development Platform</h1>\n<p><a 
href=\"https://imgpp.com/image/i2Fo0\"><img 
src=\"https://imgpp.com/images/2021/12/16/1639383815755.md.png\" 
alt=\"1639383815755.md.png\"></a></p>\n<p>At the recent Apache DolphinScheduler 
Meetup 2021, Zheqi Song, the Director of Youzan Big Data Development Platform 
shared the design scheme and production environment practice of its scheduling 
sys [...]
+  "__html": "<h1>From Airflow to Apache DolphinScheduler, the Roadmap of 
Scheduling System On Youzan Big Data Development Platform</h1>\n<div 
align=center>\n<img 
src=\"https://imgpp.com/images/2021/12/16/1639383815755.md.png\"/>\n</div>\n<p>At
 the recent Apache DolphinScheduler Meetup 2021, Zheqi Song, the Director of 
Youzan Big Data Development Platform\nshared the design scheme and production 
environment practice of its scheduling system migration from Airflow to 
Apache\nDolphinSchedul [...]
   "link": "/dist/en-us/blog/YouZan-case-study.html",
   "meta": {}
 }
\ No newline at end of file
