This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 6354dab  Automated deployment: d423ef4bf7b9a7a6f48cf5bab5d6a100d42122a1
6354dab is described below

commit 6354dab8d4dbf5072c8f4998444ba15ca6eec3e1
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Dec 27 04:05:58 2021 +0000

    Automated deployment: d423ef4bf7b9a7a6f48cf5bab5d6a100d42122a1
---
 en-us/docs/dev/user_doc/architecture/design.html |  21 ++++++++++++---------
 en-us/docs/dev/user_doc/architecture/design.json |   2 +-
 img/failover-master.jpg                          | Bin 0 -> 172842 bytes
 img/failover-worker.jpg                          | Bin 0 -> 111717 bytes
 zh-cn/docs/dev/user_doc/architecture/design.html |  20 ++++++++++++--------
 zh-cn/docs/dev/user_doc/architecture/design.json |   2 +-
 6 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/en-us/docs/dev/user_doc/architecture/design.html b/en-us/docs/dev/user_doc/architecture/design.html
index 9a7586f..c4a2419 100644
--- a/en-us/docs/dev/user_doc/architecture/design.html
+++ b/en-us/docs/dev/user_doc/architecture/design.html
@@ -180,19 +180,23 @@ In the above figure, MainFlowThread waits for the end of SubFlowThread1, SubFlow
  </p>
 Here, the Master monitors the directories of the other Masters and Workers. When it detects a remove event, it performs process-instance or task-instance fault tolerance according to the specific business logic.
 <ul>
-<li>Master fault tolerance flowchart:</li>
+<li>Master fault tolerance:</li>
 </ul>
- <p align="center">
-   <img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_master.png" alt="Master fault tolerance flowchart"  width="40%" />
+<p align="center">
+   <img src="/img/failover-master.jpg" alt="failover-master"  width="50%" />
  </p>
-After the fault tolerance of ZooKeeper Master is completed, it is re-scheduled by the Scheduler thread in DolphinScheduler, traverses the DAG to find the "running" and "submit successful" tasks, monitors the status of its task instances for the "running" tasks, and "commits successful" tasks It is necessary to determine whether the task queue already exists. If it exists, the status of the task instance is also monitored. If it does not exist, resubmit the task instance.
+<p>Fault tolerance range: From the host perspective, the Master's fault tolerance range covers its own host plus any node host that no longer exists in the registry, and the entire fault tolerance process runs under a lock;</p>
+<p>Fault-tolerant content: The Master's fault tolerance covers both process instances and task instances. Before applying fault tolerance, it compares each instance's start time with the server's start-up time, and skips fault tolerance for any instance that started after the server started;</p>
+<p>Fault-tolerant post-processing: After ZooKeeper Master fault tolerance completes, the Scheduler thread in DolphinScheduler reschedules the process: it traverses the DAG to find tasks in the &quot;running&quot; and &quot;submitted successfully&quot; states, monitors the task instance status of each &quot;running&quot; task, and for each &quot;submitted successfully&quot; task checks whether it already exists in the task queue. If it exists, the status of the task instance is also mo [...]
 <ul>
-<li>Worker fault tolerance flowchart:</li>
+<li>Worker fault tolerance:</li>
 </ul>
- <p align="center">
-   <img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_worker.png" alt="Worker fault tolerance flow chart"  width="40%" />
+<p align="center">
+   <img src="/img/failover-worker.jpg" alt="failover-worker"  width="50%" />
  </p>
-<p>Once the Master Scheduler thread finds that the task instance is in the &quot;fault-tolerant&quot; state, it takes over the task and resubmits it.</p>
+<p>Fault tolerance range: From the process instance perspective, each Master is responsible only for fault tolerance of its own process instances; a lock is taken only in <code>handleDeadServer</code>;</p>
+<p>Fault-tolerant content: When a remove event for a Worker node occurs, the Master performs fault tolerance on task instances only. Before doing so, it compares each instance's start time with the server's start-up time, and skips fault tolerance for any instance that started after the server started;</p>
+<p>Fault-tolerant post-processing: Once the Master Scheduler thread finds a task instance in the &quot;fault-tolerant&quot; state, it takes over the task and resubmits it.</p>
 <p>Note: Because of &quot;network jitter&quot;, a node may briefly lose its heartbeat with ZooKeeper, which triggers a remove event for that node. We handle this case in the simplest way: as soon as a node's connection to ZooKeeper times out, the Master or Worker service on that node is stopped.</p>
 <h6>2. Task failure retry</h6>
 <p>Here we must first distinguish the concepts of task failure retry, process failure recovery, and process failure rerun:</p>
@@ -325,7 +329,6 @@ public class TaskLogFilter extends Filter&lt;ILoggingEvent&gt; {
 ### Summary
 From the perspective of scheduling, this article has given a preliminary introduction to the architecture principles and implementation ideas of DolphinScheduler, a big data distributed workflow scheduling system. To be continued.
 
-
 </code></pre>
 </div></section><footer class="footer-container"><div class="footer-body"><div><h3>About us</h3><h4>Do you need feedback? Please contact us through the following ways.</h4></div><div class="contact-container"><ul><li><a href="/en-us/community/development/subscribe.html"><img class="img-base" src="/img/emailgray.png"/><img class="img-change" src="/img/emailblue.png"/><p>Email List</p></a></li><li><a href="https://twitter.com/dolphinschedule"><img class="img-base" src="/img/twittergray.png [...]
   <script src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
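The Master fault tolerance range and the start-time skip rule described in this hunk can be illustrated with a small Java sketch. It assumes a registry snapshot that maps each live host to its start-up time, and treats a host missing from that snapshot as having no known start-up time, so all of its instances are recovered. `MasterFailoverSketch`, `ProcessInstance` and `failover` are hypothetical names, and the in-process `ReentrantLock` only stands in for whatever lock guards the failover in a real cluster; this is not DolphinScheduler's code. Records require Java 16 or later.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the Master fault-tolerance rules; the class, record
// and method names are illustrative and are not DolphinScheduler's API.
class MasterFailoverSketch {

    // Hypothetical view of a process instance: the Master host that owns it
    // and the time it started.
    record ProcessInstance(int id, String host, Instant startTime) {}

    // Local stand-in for the lock that guards the whole Master failover pass;
    // a cluster would need a lock held in the registry instead.
    private static final ReentrantLock FAILOVER_LOCK = new ReentrantLock();

    // Scope rule: this Master's own host, plus any host missing from the registry.
    static boolean inScope(String host, String selfHost, Map<String, Instant> registry) {
        return host.equals(selfHost) || !registry.containsKey(host);
    }

    // Returns the ids of the process instances that need fault tolerance.
    // `registry` maps each live host to its start-up time (assumption).
    static List<Integer> failover(String selfHost, Map<String, Instant> registry,
                                  List<ProcessInstance> instances) {
        FAILOVER_LOCK.lock();
        try {
            List<Integer> toRecover = new ArrayList<>();
            for (ProcessInstance pi : instances) {
                if (!inScope(pi.host(), selfHost, registry)) {
                    continue; // owned by another live Master
                }
                // A host absent from the registry has no known start-up time,
                // so all of its instances are recovered (assumption).
                Instant serverStart = registry.get(pi.host());
                if (serverStart != null && pi.startTime().isAfter(serverStart)) {
                    continue; // started after the server came up: skip
                }
                toRecover.add(pi.id());
            }
            return toRecover;
        } finally {
            FAILOVER_LOCK.unlock();
        }
    }
}
```

With `m1` as the current Master, an instance on `m1` that started before `m1` came up is recovered, one that started afterwards is skipped, instances owned by another live Master are left alone, and instances on a host that is no longer registered are recovered.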
diff --git a/en-us/docs/dev/user_doc/architecture/design.json b/en-us/docs/dev/user_doc/architecture/design.json
index 5e7b231..2d53d95 100644
--- a/en-us/docs/dev/user_doc/architecture/design.json
+++ b/en-us/docs/dev/user_doc/architecture/design.json
@@ -1,6 +1,6 @@
 {
   "filename": "design.md",
-  "__html": "<h2>System Architecture Design</h2>\n<p>Before explaining the architecture of the scheduling system, let's first understand the commonly used terms of the scheduling system</p>\n<h3>1.Glossary</h3>\n<p><strong>DAG:</strong> The full name is Directed Acyclic Graph, referred to as DAG. Tasks in the workflow are assembled in the form of a directed acyclic graph, and topological traversal is performed from nodes with zero in-degree until there are no subsequent nodes [...]
+  "__html": "<h2>System Architecture Design</h2>\n<p>Before explaining the architecture of the scheduling system, let's first understand the commonly used terms of the scheduling system</p>\n<h3>1.Glossary</h3>\n<p><strong>DAG:</strong> The full name is Directed Acyclic Graph, referred to as DAG. Tasks in the workflow are assembled in the form of a directed acyclic graph, and topological traversal is performed from nodes with zero in-degree until there are no subsequent nodes [...]
   "link": "/dist/en-us/docs/dev/user_doc/architecture/design.html",
   "meta": {}
 }
\ No newline at end of file
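The post-processing step of Master fault tolerance (traverse the DAG, monitor "running" tasks, and check the task queue for "submitted successfully" tasks) can be sketched as follows. The traversal uses Kahn's algorithm starting from the zero in-degree nodes, as the glossary describes. `RecoveryPlanSketch`, its enums, and the `Set<String>` standing in for the task queue are hypothetical; the real scheduler works on task instances and its own queue implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the post-failover pass; names are illustrative only.
class RecoveryPlanSketch {

    enum TaskState { RUNNING, SUBMITTED_SUCCESS, SUCCESS }

    enum Action { MONITOR, RESUBMIT }

    // Topological order via Kahn's algorithm, starting from zero in-degree nodes.
    static List<String> topologicalOrder(Map<String, List<String>> dag) {
        Map<String, Integer> inDegree = new HashMap<>();
        for (Map.Entry<String, List<String>> e : dag.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String succ : e.getValue()) {
                inDegree.merge(succ, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String node : new TreeSet<>(inDegree.keySet())) { // sorted for determinism
            if (inDegree.get(node) == 0) {
                ready.add(node);
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String succ : dag.getOrDefault(node, List.of())) {
                if (inDegree.merge(succ, -1, Integer::sum) == 0) {
                    ready.add(succ); // all predecessors visited
                }
            }
        }
        return order;
    }

    // Decides, in DAG order, what to do with each task after the failover.
    static Map<String, Action> plan(Map<String, List<String>> dag,
                                    Map<String, TaskState> states, Set<String> taskQueue) {
        Map<String, Action> actions = new LinkedHashMap<>();
        for (String task : topologicalOrder(dag)) {
            TaskState state = states.get(task);
            if (state == TaskState.RUNNING) {
                actions.put(task, Action.MONITOR); // watch its task instance
            } else if (state == TaskState.SUBMITTED_SUCCESS) {
                // Already in the queue: monitor it; otherwise resubmit it.
                actions.put(task, taskQueue.contains(task) ? Action.MONITOR : Action.RESUBMIT);
            }
        }
        return actions;
    }
}
```

Tasks in any other state (finished, or not yet started) get no entry in the plan; how the real scheduler treats them is outside what the text describes.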
diff --git a/img/failover-master.jpg b/img/failover-master.jpg
new file mode 100644
index 0000000..5776781
Binary files /dev/null and b/img/failover-master.jpg differ
diff --git a/img/failover-worker.jpg b/img/failover-worker.jpg
new file mode 100644
index 0000000..71f5936
Binary files /dev/null and b/img/failover-worker.jpg differ
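For Worker fault tolerance, here is a minimal Java sketch under stated assumptions: each task instance records the Master that owns its process instance, a Master only marks task instances of its own process instances (presumably why the marking itself needs no lock), and only the `handleDeadServer` step runs under a lock. The method name comes from the text, but its body here (recording the dead host in a set) is an assumption, as are `WorkerFailoverSketch`, `TaskInstance` and the `NEED_FAULT_TOLERANCE` state name.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Worker fault tolerance; names are illustrative only.
class WorkerFailoverSketch {

    enum TaskState { RUNNING, NEED_FAULT_TOLERANCE, SUCCESS }

    // Hypothetical task instance: the Worker that ran it, the Master that owns
    // its process instance, its start time and its current state.
    static final class TaskInstance {
        final int id;
        final String workerHost;
        final String masterHost;
        final Instant startTime;
        TaskState state;

        TaskInstance(int id, String workerHost, String masterHost, Instant startTime, TaskState state) {
            this.id = id;
            this.workerHost = workerHost;
            this.masterHost = masterHost;
            this.startTime = startTime;
            this.state = state;
        }
    }

    private static final Set<String> DEAD_SERVERS = new HashSet<>();

    // Modelled on the handleDeadServer step named in the text; recording the
    // host in a set is an assumption. This is the only locked step.
    static synchronized boolean handleDeadServer(String host) {
        return DEAD_SERVERS.add(host); // false if the host was already recorded
    }

    // Marks this Master's unfinished task instances that ran on the dead Worker.
    // workerStartTime may be null when the Worker's start-up time is unknown.
    static List<Integer> failover(String selfMaster, String deadWorker,
                                  Instant workerStartTime, List<TaskInstance> tasks) {
        handleDeadServer(deadWorker);
        List<Integer> marked = new ArrayList<>();
        for (TaskInstance t : tasks) {
            // Only task instances of this Master's own process instances; no other
            // Master touches them, so this loop runs without a lock.
            if (!t.masterHost.equals(selfMaster) || !t.workerHost.equals(deadWorker)) {
                continue;
            }
            if (t.state != TaskState.RUNNING) {
                continue; // finished tasks need no fault tolerance
            }
            if (workerStartTime != null && t.startTime.isAfter(workerStartTime)) {
                continue; // started after the Worker came up: skip
            }
            t.state = TaskState.NEED_FAULT_TOLERANCE;
            marked.add(t.id);
        }
        return marked;
    }
}
```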
diff --git a/zh-cn/docs/dev/user_doc/architecture/design.html b/zh-cn/docs/dev/user_doc/architecture/design.html
index 3dba759..50bf6f5 100644
--- a/zh-cn/docs/dev/user_doc/architecture/design.html
+++ b/zh-cn/docs/dev/user_doc/architecture/design.html
@@ -181,19 +181,23 @@ Server provides a listening service based on netty. Worker</p>
  </p>
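 Here, the Master monitors the directories of the other Masters and Workers. When it detects a remove event, it performs process-instance or task-instance fault tolerance according to the specific business logic.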
 <ul>
-<li>Master fault tolerance flowchart:</li>
+<li>Master fault tolerance process:</li>
 </ul>
- <p align="center">
-   <img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_master.png" alt="Master fault tolerance flowchart"  width="40%" />
+<p align="center">
+   <img src="/img/failover-master.jpg" alt="Fault tolerance process"  width="50%" />
  </p>
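-After ZooKeeper Master fault tolerance completes, the Scheduler thread in DolphinScheduler reschedules: it traverses the DAG to find the "running" and "submitted successfully" tasks, monitors the task instance status of "running" tasks, and for "submitted successfully" tasks checks whether they already exist in the Task Queue. If they exist, it likewise monitors the task instance status; if not, it resubmits the task instance.
+<p>Fault tolerance range: From the host perspective, the Master's fault tolerance range covers its own host plus any node host that does not exist in the registry, and the entire fault tolerance process runs under a lock;</p>
+<p>Fault tolerance content: The Master's fault tolerance covers process instances and task instances. Before fault tolerance, it compares each instance's start time with the start-up time of the server node, and skips fault tolerance for instances that started after the server start-up time;</p>
+<p>Fault tolerance post-processing: After ZooKeeper Master fault tolerance completes, the Scheduler thread in DolphinScheduler reschedules: it traverses the DAG to find the "running" and "submitted successfully" tasks, monitors the task instance status of "running" tasks, and for "submitted successfully" tasks checks whether they already exist in the Task Queue. If they exist, it likewise monitors the task instance status; if not, it resubmits the task instance.</p>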
 <ul>
-<li>Worker fault tolerance flowchart:</li>
+<li>Worker fault tolerance process:</li>
 </ul>
- <p align="center">
-   <img src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_worker.png" alt="Worker fault tolerance flowchart"  width="40%" />
+<p align="center">
+   <img src="/img/failover-worker.jpg" alt="Fault tolerance process"  width="50%" />
  </p>
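-<p>Once the Master Scheduler thread finds a task instance in the "needs fault tolerance" state, it takes over the task and resubmits it.</p>
+<p>Fault tolerance range: From the process instance perspective, each Master is responsible only for fault tolerance of its own process instances; a lock is taken only in <code>handleDeadServer</code>;</p>
+<p>Fault tolerance content: When a remove event for a Worker node occurs, the Master performs fault tolerance on task instances only. Before fault tolerance, it compares each instance's start time with the start-up time of the server node, and skips fault tolerance for instances that started after the server start-up time;</p>
+<p>Fault tolerance post-processing: Once the Master Scheduler thread finds a task instance in the "needs fault tolerance" state, it takes over the task and resubmits it.</p>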
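 <p>Note: Because of "network jitter", a node may briefly lose its heartbeat with ZooKeeper, which triggers a remove event for that node. We handle this case in the simplest way: as soon as a node's connection to ZooKeeper times out, the Master or Worker service is stopped directly.</p>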
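 <h6>2. Task failure retry</h6>
 <p>Here we must first distinguish the concepts of task failure retry, process failure recovery, and process failure rerun:</p>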
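The Worker post-processing step, where the Master Scheduler thread takes over task instances in the fault-tolerant state and resubmits them, might look like this as one pass of a scheduler loop. The class name, the state names (including moving a resubmitted task to `SUBMITTED_SUCCESS`) and the use of a plain `Queue<Integer>` as the task queue are hypothetical, chosen only to make the takeover step concrete.

```java
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the scheduler takeover step; names are illustrative only.
class TakeoverSketch {

    enum TaskState { RUNNING, NEED_FAULT_TOLERANCE, SUBMITTED_SUCCESS }

    // Hypothetical task instance with a mutable state.
    static final class TaskInstance {
        final int id;
        TaskState state;

        TaskInstance(int id, TaskState state) {
            this.id = id;
            this.state = state;
        }
    }

    // One pass of the scheduler loop: take over every task instance flagged for
    // fault tolerance and resubmit it to the task queue. Returns how many were
    // resubmitted; a second pass over the same list finds nothing new.
    static int takeOver(List<TaskInstance> tasks, Queue<Integer> taskQueue) {
        int resubmitted = 0;
        for (TaskInstance t : tasks) {
            if (t.state == TaskState.NEED_FAULT_TOLERANCE) {
                t.state = TaskState.SUBMITTED_SUCCESS; // assumed post-resubmit state
                taskQueue.add(t.id);
                resubmitted++;
            }
        }
        return resubmitted;
    }
}
```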
diff --git a/zh-cn/docs/dev/user_doc/architecture/design.json b/zh-cn/docs/dev/user_doc/architecture/design.json
index 36f2936..3a3984f 100644
--- a/zh-cn/docs/dev/user_doc/architecture/design.json
+++ b/zh-cn/docs/dev/user_doc/architecture/design.json
@@ -1,6 +1,6 @@
 {
   "filename": "design.md",
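-  "__html": "<h2>System Architecture Design</h2>\n<p>Before explaining the architecture of the scheduling system, let's first get to know the terms commonly used in scheduling systems</p>\n<h3>1.Glossary</h3>\n<p><strong>DAG:</strong> Full name Directed Acyclic Graph, abbreviated as DAG. The tasks in a workflow are assembled in the form of a directed acyclic graph, and topological traversal is performed starting from the nodes with zero in-degree until there are no successor nodes. An example is shown below:</p>\n<p align=\"center\">\n  <img src=\"/img/dag_examples_cn.jpg\" alt=\"DAG example\"  width=\"60%\" />\n  <p align=\"center\">\n        <em>DAG example</em>\n  </p>\n</p>\n<p><strong>Process definition</strong>: A visual <strong>DAG</strong> formed by dragging task nodes and establishing associations between them</p>\n<p><strong>Process instance</strong>: A process instance is an instantiation of a process definition, and can be generated by a manual start or by scheduled triggering, [...]
+  "__html": "<h2>System Architecture Design</h2>\n<p>Before explaining the architecture of the scheduling system, let's first get to know the terms commonly used in scheduling systems</p>\n<h3>1.Glossary</h3>\n<p><strong>DAG:</strong> Full name Directed Acyclic Graph, abbreviated as DAG. The tasks in a workflow are assembled in the form of a directed acyclic graph, and topological traversal is performed starting from the nodes with zero in-degree until there are no successor nodes. An example is shown below:</p>\n<p align=\"center\">\n  <img src=\"/img/dag_examples_cn.jpg\" alt=\"DAG example\"  width=\"60%\" />\n  <p align=\"center\">\n        <em>DAG example</em>\n  </p>\n</p>\n<p><strong>Process definition</strong>: A visual <strong>DAG</strong> formed by dragging task nodes and establishing associations between them</p>\n<p><strong>Process instance</strong>: A process instance is an instantiation of a process definition, and can be generated by a manual start or by scheduled triggering, [...]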
   "link": "/dist/zh-cn/docs/dev/user_doc/architecture/design.html",
   "meta": {}
 }
\ No newline at end of file
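The "stop the service on a ZooKeeper timeout" policy for network jitter can be sketched as a guard that compares the time since the last acknowledged heartbeat with the session timeout. The clock is passed in explicitly so the behavior is deterministic; a real node would more likely react to its ZooKeeper client's connection-loss or session-expiry notification than poll. `HeartbeatGuardSketch`, `HeartbeatGuard`, `Service` and the timeout values used with them are illustrative assumptions, not DolphinScheduler code.

```java
// Hypothetical sketch of the "stop on registry timeout" policy; names are
// illustrative, and time is passed in explicitly to keep it deterministic.
class HeartbeatGuardSketch {

    // Hypothetical handle on the Master or Worker service.
    interface Service {
        void stop(String reason);
    }

    static final class HeartbeatGuard {
        private final long sessionTimeoutMillis;
        private final Service service;
        private long lastAckMillis;
        private boolean stopped;

        HeartbeatGuard(long sessionTimeoutMillis, Service service, long nowMillis) {
            this.sessionTimeoutMillis = sessionTimeoutMillis;
            this.service = service;
            this.lastAckMillis = nowMillis;
        }

        // Called whenever the registry acknowledges a heartbeat.
        void onHeartbeatAck(long nowMillis) {
            if (!stopped) {
                lastAckMillis = nowMillis;
            }
        }

        // Called periodically; stops the service once the session has timed out,
        // even if the gap was caused by short-lived network jitter.
        void check(long nowMillis) {
            if (!stopped && nowMillis - lastAckMillis > sessionTimeoutMillis) {
                stopped = true;
                service.stop("registry session timed out");
            }
        }

        boolean isStopped() {
            return stopped;
        }
    }
}
```

The guard stops the service at most once and never tries to reconnect, which matches the "simplest way" the note describes: any timeout is treated as fatal for that node.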
