This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new d102ab7 Automated deployment: 1315037f5f5c8443d67c9ad96c4f19ad1e933155
d102ab7 is described below
commit d102ab7216ebbb8cb01e3fc4750d54162763d964
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Tue Mar 8 02:43:12 2022 +0000
Automated deployment: 1315037f5f5c8443d67c9ad96c4f19ad1e933155
---
.../About_DolphinScheduler.html | 16 +-
.../About_DolphinScheduler.json | 2 +-
en-us/docs/dev/user_doc/architecture/cache.html | 16 +-
en-us/docs/dev/user_doc/architecture/cache.json | 2 +-
.../dev/user_doc/architecture/configuration.html | 88 ++++-----
.../dev/user_doc/architecture/configuration.json | 2 +-
en-us/docs/dev/user_doc/architecture/design.html | 179 +++++++++---------
en-us/docs/dev/user_doc/architecture/design.json | 2 +-
.../dev/user_doc/architecture/load-balance.html | 34 ++--
.../dev/user_doc/architecture/load-balance.json | 2 +-
en-us/docs/dev/user_doc/architecture/metadata.html | 204 +++++++++++++++------
en-us/docs/dev/user_doc/architecture/metadata.json | 2 +-
.../dev/user_doc/architecture/task-structure.html | 38 ++--
.../dev/user_doc/architecture/task-structure.json | 2 +-
14 files changed, 335 insertions(+), 254 deletions(-)
diff --git
a/en-us/docs/dev/user_doc/About_DolphinScheduler/About_DolphinScheduler.html
b/en-us/docs/dev/user_doc/About_DolphinScheduler/About_DolphinScheduler.html
index 1a378ef..b1c7292 100644
--- a/en-us/docs/dev/user_doc/About_DolphinScheduler/About_DolphinScheduler.html
+++ b/en-us/docs/dev/user_doc/About_DolphinScheduler/About_DolphinScheduler.html
@@ -11,22 +11,22 @@
</head>
<body>
<div id="root"><div class="md2html docs-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><span
class="mobile-menu-btn mobile-menu-btn-dark"></span><a
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">中</span><div
class="header-menu"><div><ul class="ant-menu whiteClass ant-menu-light ant-
[...]
-<p>Apache DolphinScheduler is a cloud-native visual Big Data workflow
scheduler system, committed to “solving complex big-data task dependencies and
triggering relationships in data OPS orchestration so that various types of big
data tasks can be used out of the box”.</p>
-<h2>High Reliability</h2>
+<p>Apache DolphinScheduler is a distributed, easily extensible, open-source
system for visual DAG workflow task scheduling. It addresses the intricate
dependencies of data R&D ETL and the inability to monitor the health status of
tasks. DolphinScheduler assembles tasks as a streaming DAG, so it can monitor
task execution status in time, and it supports operations such as retry,
failure recovery from specified nodes, pause, resume, and kill.</p>
+<h2>Simple to Use</h2>
<ul>
-<li>Decentralized multi-master and multi-worker, HA is supported by itself,
overload processing</li>
+<li>DolphinScheduler provides DAG monitoring user interfaces, and users can
customize a DAG by dragging and dropping. All process definitions are
visualized, and it supports rich third-party system APIs and one-click
deployment.</li>
</ul>
-<h2>User-Friendly</h2>
+<h2>High Reliability</h2>
<ul>
-<li>All process definition operations are visualized, Visualization process
defines key information at a glance, One-click deployment</li>
+<li>Decentralized multi-master and multi-worker design, supports HA, and
selects queues to avoid overload.</li>
</ul>
<h2>Rich Scenarios</h2>
<ul>
-<li>Support multi-tenant. Support many task types e.g., spark,flink,hive, mr,
shell, python, sub_process</li>
+<li>Supports features such as multi-tenancy and suspend and resume operations
to cope with big data scenarios. Supports many task types, such as Spark,
Flink, Hive, MR, shell, python, and sub_process.</li>
</ul>
-<h2>High Expansibility</h2>
+<h2>High Scalability</h2>
<ul>
-<li>Support custom task types, Distributed scheduling, and the overall
scheduling capability will increase linearly with the scale of the cluster</li>
+<li>Supports customized task types and distributed scheduling; the overall
scheduling capability increases linearly with the scale of the cluster.</li>
</ul>
</div></section><footer class="footer-container"><div
class="footer-body"><div><h3>About us</h3><h4>Do you need feedback? Please
contact us through the following ways.</h4></div><div
class="contact-container"><ul><li><a
href="/en-us/community/development/subscribe.html"><img class="img-base"
src="/img/emailgray.png"/><img class="img-change"
src="/img/emailblue.png"/><p>Email List</p></a></li><li><a
href="https://twitter.com/dolphinschedule"><img class="img-base"
src="/img/twittergray.png [...]
<script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
diff --git
a/en-us/docs/dev/user_doc/About_DolphinScheduler/About_DolphinScheduler.json
b/en-us/docs/dev/user_doc/About_DolphinScheduler/About_DolphinScheduler.json
index bf84fd7..a93dd46 100644
--- a/en-us/docs/dev/user_doc/About_DolphinScheduler/About_DolphinScheduler.json
+++ b/en-us/docs/dev/user_doc/About_DolphinScheduler/About_DolphinScheduler.json
@@ -1,6 +1,6 @@
{
"filename": "About_DolphinScheduler.md",
- "__html": "<h1>About DolphinScheduler</h1>\n<p>Apache DolphinScheduler is a
cloud-native visual Big Data workflow scheduler system, committed to “solving
complex big-data task dependencies and triggering relationships in data OPS
orchestration so that various types of big data tasks can be used out of the
box”.</p>\n<h2>High Reliability</h2>\n<ul>\n<li>Decentralized multi-master and
multi-worker, HA is supported by itself, overload
processing</li>\n</ul>\n<h2>User-Friendly</h2>\n<ul>\n [...]
+ "__html": "<h1>About DolphinScheduler</h1>\n<p>Apache DolphinScheduler is a
distributed, easily extensible, open-source system for visual DAG workflow task
scheduling. It addresses the intricate dependencies of data R&D ETL and the
inability to monitor the health status of tasks. DolphinScheduler assembles
tasks as a streaming DAG, so it can monitor task execution status in time, and
it supports operations such as retry, failure recovery from specified nodes,
pause, resume, and kill [...]
"link":
"/dist/en-us/docs/dev/user_doc/About_DolphinScheduler/About_DolphinScheduler.html",
"meta": {}
}
\ No newline at end of file
diff --git a/en-us/docs/dev/user_doc/architecture/cache.html
b/en-us/docs/dev/user_doc/architecture/cache.html
index fe14a33..b5fcc1ed 100644
--- a/en-us/docs/dev/user_doc/architecture/cache.html
+++ b/en-us/docs/dev/user_doc/architecture/cache.html
@@ -12,8 +12,8 @@
<body>
<div id="root"><div class="md2html docs-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><span
class="mobile-menu-btn mobile-menu-btn-dark"></span><a
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">中</span><div
class="header-menu"><div><ul class="ant-menu whiteClass ant-menu-light ant-
[...]
<h2>Purpose</h2>
-<p>Due to the master-server scheduling process, there will be a large number
of database read operations, such as <code>tenant</code>, <code>user</code>,
<code>processDefinition</code>, etc. On the one hand, it will put a lot of
pressure on the DB, and on the other hand, it will slow down the entire core
scheduling process.</p>
-<p>Considering that this part of the business data is a scenario where more
reads and less writes are performed, a cache module is introduced to reduce the
DB read pressure and speed up the core scheduling process;</p>
+<p>The master-server scheduling process performs a large number of database
read operations, for example on tables such as <code>tenant</code>,
<code>user</code>, and <code>processDefinition</code>. These operations put
read pressure on the DB and slow down the entire core scheduling process.</p>
+<p>Because this part of the business data is a high-read, low-write scenario,
a cache module is introduced to reduce the DB read pressure and speed up the
core scheduling process.</p>
<h2>Cache Settings</h2>
<pre><code class="language-yaml"><span class="hljs-attr">spring:</span>
<span class="hljs-attr">cache:</span>
@@ -28,14 +28,14 @@
<span class="hljs-attr">caffeine:</span>
<span class="hljs-attr">spec:</span> <span
class="hljs-string">maximumSize=100,expireAfterWrite=300s,recordStats</span>
</code></pre>
-<p>The cache-module use <a
href="https://spring.io/guides/gs/caching/">spring-cache</a>, so you can set
cache config in the spring application.yaml directly. Default disable cache,
and you can enable it by <code>type: caffeine</code>.</p>
-<p>With the config of <a
href="https://github.com/ben-manes/caffeine">caffeine</a>, you can set the
cache size, expire time, etc.</p>
+<p>The cache module uses <a
href="https://spring.io/guides/gs/caching/">spring-cache</a>, so you can set
cache options, such as the cache type and whether the cache is enabled
(<code>none</code>, the default, disables it), directly in the Spring
<code>application.yaml</code>.</p>
+<p>Currently, the <a
href="https://github.com/ben-manes/caffeine">caffeine</a> configuration is
implemented, so you can set cache options such as cache size and expiration
time.</p>
<h2>Cache Read</h2>
-<p>The cache adopts the annotation <code>@Cacheable</code> of spring-cache and
is configured in the mapper layer. For example: <code>TenantMapper</code>.</p>
+<p>The cache module adopts the <code>@Cacheable</code> annotation from
spring-cache, and you can apply it in the related mapper layer. See
<code>TenantMapper</code> for an example.</p>
<h2>Cache Evict</h2>
-<p>The business data update comes from the api-server, and the cache end is in
the master-server. So it is necessary to monitor the data update of the
api-server (aspect intercept <code>@CacheEvict</code>), and the master-server
will be notified when the cache eviction is required.</p>
-<p>It should be noted that the final strategy for cache update comes from the
user's expiration strategy configuration in caffeine, so please configure it in
conjunction with the business;</p>
-<p>The sequence diagram is shown in the following figure:</p>
+<p>The business data updates come from the api-server, while the cache lives
in the master-server. It is therefore necessary to monitor data updates from
the api-server (using an aspect pointcut that intercepts
<code>@CacheEvict</code>) and to notify the master-server with a
<code>cacheEvictCommand</code> when a cache eviction is required.</p>
+<p>Note: the final strategy for cache update comes from the expiration
strategy configuration in caffeine, so configure it to suit your business
scenarios.</p>
+<p>The sequence diagram is shown below:</p>
<img src="/img/cache-evict.png" alt="cache-evict" style="zoom: 67%;"
/></div></section><footer class="footer-container"><div
class="footer-body"><div><h3>About us</h3><h4>Do you need feedback? Please
contact us through the following ways.</h4></div><div
class="contact-container"><ul><li><a
href="/en-us/community/development/subscribe.html"><img class="img-base"
src="/img/emailgray.png"/><img class="img-change"
src="/img/emailblue.png"/><p>Email List</p></a></li><li><a href="https://twitt
[...]
<script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
<script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
diff --git a/en-us/docs/dev/user_doc/architecture/cache.json
b/en-us/docs/dev/user_doc/architecture/cache.json
index 1518904..8901f23 100644
--- a/en-us/docs/dev/user_doc/architecture/cache.json
+++ b/en-us/docs/dev/user_doc/architecture/cache.json
@@ -1,6 +1,6 @@
{
"filename": "cache.md",
- "__html": "<h1>Cache</h1>\n<h2>Purpose</h2>\n<p>Due to the master-server
scheduling process, there will be a large number of database read operations,
such as <code>tenant</code>, <code>user</code>, <code>processDefinition</code>,
etc. On the one hand, it will put a lot of pressure on the DB, and on the other
hand, it will slow down the entire core scheduling process.</p>\n<p>Considering
that this part of the business data is a scenario where more reads and less
writes are performed, a [...]
+ "__html": "<h1>Cache</h1>\n<h2>Purpose</h2>\n<p>The master-server scheduling
process performs a large number of database read operations, for example on
tables such as <code>tenant</code>, <code>user</code>, and
<code>processDefinition</code>. These operations put read pressure on the DB
and slow down the entire core scheduling process.</p>\n<p>Because this part of
the business data is a high-read, low-write scenario, a cache module is
introduced to reduce the DB read pressure and spe [...]
"link": "/dist/en-us/docs/dev/user_doc/architecture/cache.html",
"meta": {}
}
\ No newline at end of file
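
For readers of the cache.md change above: the behaviour configured by the
caffeine spec (maximumSize=100, expireAfterWrite=300s), combined with explicit
eviction, can be illustrated with a minimal Python sketch. This is not
DolphinScheduler or Caffeine code. The class TtlCache, its get and evict
methods, and the loader callback are hypothetical names, and dropping the
oldest-written entry when the cache is full is a simplification of Caffeine's
actual size-based policy.

```python
import time
from collections import OrderedDict


class TtlCache:
    """Tiny read-through cache: bounded size, entries expire a fixed time after write.

    Hypothetical illustration only; not the DolphinScheduler or Caffeine API.
    """

    def __init__(self, maximum_size=100, expire_after_write=300.0, clock=time.monotonic):
        self.maximum_size = maximum_size
        self.expire_after_write = expire_after_write  # seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # key -> (value, written_at)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        entry = self._entries.get(key)
        if entry is not None and self.clock() - entry[1] < self.expire_after_write:
            return entry[0]
        value = loader(key)  # stands in for the database read
        self._entries[key] = (value, self.clock())
        self._entries.move_to_end(key)
        while len(self._entries) > self.maximum_size:
            self._entries.popitem(last=False)  # drop the oldest-written entry
        return value

    def evict(self, key):
        """Drop one key, as a cache-eviction notification would request."""
        self._entries.pop(key, None)
```

In this sketch, get plays the role of a mapper read annotated with @Cacheable,
and evict plays the role of the master-server reacting to a cacheEvictCommand.
Until an eviction arrives, a stale value can be served for at most
expire_after_write seconds, which is why the expiration setting should match
the business scenario.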
diff --git a/en-us/docs/dev/user_doc/architecture/configuration.html
b/en-us/docs/dev/user_doc/architecture/configuration.html
index e2b5532..7e2f26e 100644
--- a/en-us/docs/dev/user_doc/architecture/configuration.html
+++ b/en-us/docs/dev/user_doc/architecture/configuration.html
@@ -15,23 +15,25 @@
<h2>Preface</h2>
<p>This document explains the DolphinScheduler application configurations
according to DolphinScheduler-1.3.x versions.</p>
<h2>Directory Structure</h2>
-<p>Currently, all the configuration files are under [conf ] directory. Please
check the following simplified DolphinScheduler installation directories to
have a direct view about the position [conf] directory in and configuration
files inside. This document only describes DolphinScheduler configurations and
other modules are not going into.</p>
+<p>Currently, all the configuration files are under the [conf] directory.
+Check the following simplified DolphinScheduler installation directory tree to
see where the [conf] directory sits and which configuration files it
contains.
+This document only describes DolphinScheduler configurations and does not
cover other topics.</p>
<p>[Note: the DolphinScheduler (hereinafter called the ‘DS’) .]</p>
<pre><code>├─bin DS application commands
directory
-│ ├─dolphinscheduler-daemon.sh startup/shutdown DS application
-│ ├─start-all.sh A startup all DS services with
configurations
+│ ├─dolphinscheduler-daemon.sh startup or shutdown DS application
+│ ├─start-all.sh startup all DS services with
configurations
│ ├─stop-all.sh shutdown all DS services with
configurations
├─conf configurations directory
│ ├─application-api.properties API-service config properties
│ ├─datasource.properties datasource config properties
│ ├─zookeeper.properties ZooKeeper config properties
-│ ├─master.properties master config properties
-│ ├─worker.properties worker config properties
+│ ├─master.properties master-service config properties
+│ ├─worker.properties worker-service config properties
│ ├─quartz.properties quartz config properties
-│ ├─common.properties common-service[storage] config
properties
+│ ├─common.properties common-service [storage] config
properties
│ ├─alert.properties alert-service config properties
│ ├─config environment variables config directory
-│ ├─install_config.conf DS environment variables
configuration script[install/start DS]
+│ ├─install_config.conf DS environment variables
configuration script [install or start DS]
│ ├─env load environment variables configs
script directory
│ ├─dolphinscheduler_env.sh load environment variables configs
[eg: JAVA_HOME,HADOOP_HOME, HIVE_HOME ...]
│ ├─org mybatis mapper files directory
@@ -40,13 +42,13 @@
│ ├─logback-master.xml master-service log config
│ ├─logback-worker.xml worker-service log config
│ ├─logback-alert.xml alert-service log config
-├─sql DS metadata to create/upgrade .sql
directory
+├─sql .sql files to create or upgrade DS
metadata
│ ├─create create SQL scripts directory
│ ├─upgrade upgrade SQL scripts directory
-│ ├─dolphinscheduler_postgre.sql postgre database init script
-│ ├─dolphinscheduler_mysql.sql mysql database init script
+│ ├─dolphinscheduler_postgre.sql PostgreSQL database init script
+│ ├─dolphinscheduler_mysql.sql MySQL database init script
│ ├─soft_version current DS version-id file
-├─script DS services deployment, database
create/upgrade scripts directory
+├─script DS services deployment, database create or
upgrade scripts directory
│ ├─create-dolphinscheduler.sh DS database init script
│ ├─upgrade-dolphinscheduler.sh DS database upgrade script
│ ├─monitor-server.sh DS monitor-server start script
@@ -68,7 +70,7 @@
<tbody>
<tr>
<td>1</td>
-<td>startup/shutdown DS application</td>
+<td>startup or shutdown DS application</td>
<td><a
href="http://dolphinscheduler-daemon.sh">dolphinscheduler-daemon.sh</a></td>
</tr>
<tr>
@@ -93,12 +95,12 @@
</tr>
<tr>
<td>6</td>
-<td>master config properties</td>
+<td>master-service config properties</td>
<td>master.properties</td>
</tr>
<tr>
<td>7</td>
-<td>worker config properties</td>
+<td>worker-service config properties</td>
<td>worker.properties</td>
</tr>
<tr>
@@ -128,10 +130,10 @@
</tr>
</tbody>
</table>
-<h3><a href="http://dolphinscheduler-daemon.sh">dolphinscheduler-daemon.sh</a>
[startup/shutdown DS application]</h3>
+<h3><a href="http://dolphinscheduler-daemon.sh">dolphinscheduler-daemon.sh</a>
[startup or shutdown DS application]</h3>
<p><a href="http://dolphinscheduler-daemon.sh">dolphinscheduler-daemon.sh</a>
is responsible for DS startup and shutdown.
-Essentially, <a
href="http://start-all.sh/stop-all.sh">start-all.sh/stop-all.sh</a>
startup/shutdown the cluster via <a
href="http://dolphinscheduler-daemon.sh">dolphinscheduler-daemon.sh</a>.
-Currently, DS just makes a basic config, please config further JVM options
based on your practical situation of resources.</p>
+Essentially, <a href="http://start-all.sh">start-all.sh</a> and <a
href="http://stop-all.sh">stop-all.sh</a> start up and shut down the cluster via
<a href="http://dolphinscheduler-daemon.sh">dolphinscheduler-daemon.sh</a>.
+Currently, DS only provides a basic config; remember to configure further JVM
options based on your available resources.</p>
<p>Default simplified parameters are:</p>
<pre><code class="language-bash"><span class="hljs-built_in">export</span>
DOLPHINSCHEDULER_OPTS=<span class="hljs-string">"
-server
@@ -182,7 +184,7 @@ Currently, DS just makes a basic config, please config
further JVM options bas
<tr>
<td>spring.datasource.initialSize</td>
<td>5</td>
-<td>initail connection pool size number</td>
+<td>initial connection pool size number</td>
</tr>
<tr>
<td>spring.datasource.minIdle</td>
@@ -197,7 +199,7 @@ Currently, DS just makes a basic config, please config
further JVM options bas
<tr>
<td>spring.datasource.maxWait</td>
<td>60000</td>
-<td>max wait mili-seconds</td>
+<td>max wait milliseconds</td>
</tr>
<tr>
<td>spring.datasource.timeBetweenEvictionRunsMillis</td>
@@ -252,7 +254,7 @@ Currently, DS just makes a basic config, please config
further JVM options bas
<tr>
<td>spring.datasource.poolPreparedStatements</td>
<td>true</td>
-<td>Open PSCache</td>
+<td>open PSCache</td>
</tr>
<tr>
<td>spring.datasource.maxPoolPreparedStatementPerConnectionSize</td>
@@ -309,7 +311,7 @@ Currently, DS just makes a basic config, please config
further JVM options bas
</tbody>
</table>
<h3>common.properties [hadoop、s3、yarn config properties]</h3>
-<p>Currently, common.properties mainly configures hadoop/s3a related
configurations.</p>
+<p>Currently, common.properties mainly holds Hadoop and s3a related
configurations.</p>
<table>
<thead>
<tr>
@@ -372,7 +374,7 @@ Currently, DS just makes a basic config, please config
further JVM options bas
<tr>
<td>fs.defaultFS</td>
<td>hdfs://mycluster:8020</td>
-<td>If resource.storage.type=S3, then the request url would be similar to
's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop
supports HA, please copy core-site.xml and hdfs-site.xml into 'conf'
directory</td>
+<td>If resource.storage.type=S3, then the request url would be similar to
's3a://dolphinscheduler'. Otherwise if resource.storage.type=HDFS and hadoop
supports HA, copy core-site.xml and hdfs-site.xml into 'conf' directory</td>
</tr>
<tr>
<td>fs.s3a.endpoint</td>
@@ -397,7 +399,7 @@ Currently, DS just makes a basic config, please config
further JVM options bas
<tr>
<td>yarn.application.status.address</td>
<td><a
href="http://ds1:8088/ws/v1/cluster/apps/%25s">http://ds1:8088/ws/v1/cluster/apps/%s</a></td>
-<td>keep default if resourcemanager supports HA or not use resourcemanager. Or
replace ds1 with corresponding hostname if resourcemanager in standalone
mode</td>
+<td>keep the default if ResourceManager supports HA or ResourceManager is not
used; replace ds1 with the corresponding hostname if ResourceManager runs in
standalone mode</td>
</tr>
<tr>
<td>dolphinscheduler.env.path</td>
@@ -491,22 +493,22 @@ Currently, DS just makes a basic config, please config
further JVM options bas
<tr>
<td>master.exec.threads</td>
<td>100</td>
-<td>master execute thread number to limit process instances in parallel</td>
+<td>master-service execute thread number, used to limit the number of process
instances in parallel</td>
</tr>
<tr>
<td>master.exec.task.num</td>
<td>20</td>
-<td>master execute task number in parallel per process instance</td>
+<td>defines the number of parallel tasks for each process instance of the
master-service</td>
</tr>
<tr>
<td>master.dispatch.task.num</td>
<td>3</td>
-<td>master dispatch task number per batch</td>
+<td>defines the number of dispatch tasks for each batch of the
master-service</td>
</tr>
<tr>
<td>master.host.selector</td>
<td>LowerWeight</td>
-<td>master host selector to select a suitable worker, default value:
LowerWeight. Optional values include Random, RoundRobin, LowerWeight</td>
+<td>master host selector, used to select a suitable worker to run the task;
optional values: Random, RoundRobin, LowerWeight</td>
</tr>
<tr>
<td>master.heartbeat.interval</td>
@@ -548,17 +550,17 @@ Currently, DS just makes a basic config, please config
further JVM options bas
<tr>
<td>worker.listen.port</td>
<td>1234</td>
-<td>worker listen port</td>
+<td>worker-service listen port</td>
</tr>
<tr>
<td>worker.exec.threads</td>
<td>100</td>
-<td>worker execute thread number to limit task instances in parallel</td>
+<td>worker-service execute thread number, used to limit the number of task
instances in parallel</td>
</tr>
<tr>
<td>worker.heartbeat.interval</td>
<td>10</td>
-<td>worker heartbeat interval, the unit is second</td>
+<td>worker-service heartbeat interval, the unit is second</td>
</tr>
<tr>
<td>worker.max.cpuload.avg</td>
@@ -573,7 +575,7 @@ Currently, DS just makes a basic config, please config
further JVM options bas
<tr>
<td>worker.groups</td>
<td>default</td>
-<td>worker groups separated by comma, like 'worker.groups=default,test' <br>
worker will join corresponding group according to this config when startup</td>
+<td>worker groups separated by comma, e.g., 'worker.groups=default,test' <br>
worker will join corresponding group according to this config when startup</td>
</tr>
</tbody>
</table>
@@ -700,7 +702,7 @@ Currently, DS just makes a basic config, please config
further JVM options bas
</tbody>
</table>
<h3>quartz.properties [quartz config properties]</h3>
-<p>This part describes quartz configs and please configure them based on your
practical situation and resources.</p>
+<p>This part describes quartz configs; configure them based on your
practical situation and resources.</p>
<table>
<thead>
<tr>
@@ -802,20 +804,20 @@ Currently, DS just makes a basic config, please config
further JVM options bas
</tr>
</tbody>
</table>
-<h3>install_config.conf [DS environment variables configuration
script[install/start DS]]</h3>
+<h3>install_config.conf [DS environment variables configuration script [install
or start DS]]</h3>
<p>install_config.conf is a bit complicated and is mainly used in the
following two places.</p>
<ul>
-<li>DS Cluster Auto Installation</li>
+<li>DS Cluster Auto Installation.</li>
</ul>
<blockquote>
<p>System will load configs in the install_config.conf and auto-configure
files below, based on the file content when executing '<a
href="http://install.sh">install.sh</a>'.
-Files such as <a
href="http://dolphinscheduler-daemon.sh">dolphinscheduler-daemon.sh</a>、datasource.properties、zookeeper.properties、common.properties、application-api.properties、master.properties、worker.properties、alert.properties、quartz.properties
and etc.</p>
+Files such as <a
href="http://dolphinscheduler-daemon.sh">dolphinscheduler-daemon.sh</a>,
datasource.properties, zookeeper.properties, common.properties,
application-api.properties, master.properties, worker.properties,
alert.properties, quartz.properties, etc.</p>
</blockquote>
<ul>
-<li>Startup/Shutdown DS Cluster</li>
+<li>Startup and Shutdown DS Cluster.</li>
</ul>
<blockquote>
-<p>The system will load masters, workers, alertServer, apiServers and other
parameters inside the file to startup/shutdown DS cluster.</p>
+<p>The system will load masters, workers, alertServer, apiServers and other
parameters inside the file to start up or shut down the DS cluster.</p>
</blockquote>
<h4>File Content as Follows:</h4>
<pre><code class="language-bash">
@@ -845,7 +847,7 @@ zkQuorum=<span
class="hljs-string">"192.168.xx.xx:2181,192.168.xx.xx:2181,1
installPath=<span
class="hljs-string">"/data1_1T/dolphinscheduler"</span>
<span class="hljs-comment"># Deployment user</span>
-<span class="hljs-comment"># Note: Deployment user needs 'sudo'
privilege and has rights to operate HDFS</span>
+<span class="hljs-comment"># Note: Deployment user needs 'sudo'
privilege and rights to operate HDFS.</span>
<span class="hljs-comment"># Root directory must be created by the same
user if using HDFS, otherwise permission related issues will be raised.</span>
deployUser=<span class="hljs-string">"dolphinscheduler"</span>
@@ -866,16 +868,16 @@ mailUser=<span
class="hljs-string">"xxxxxxxxxx"</span>
<span class="hljs-comment"># Mail password</span>
mailPassword=<span class="hljs-string">"xxxxxxxxxx"</span>
-<span class="hljs-comment"># Mail supports TLS set true if not set false</span>
+<span class="hljs-comment"># Whether mail supports TLS</span>
starttlsEnable=<span class="hljs-string">"true"</span>
-<span class="hljs-comment"># Mail supports SSL set true if not set false.
Note: starttlsEnable and sslEnable cannot both set true</span>
+<span class="hljs-comment"># Whether mail supports SSL. Note: starttlsEnable
and sslEnable cannot both be set to true.</span>
sslEnable=<span class="hljs-string">"false"</span>
<span class="hljs-comment"># Mail server host, same as mailServerHost</span>
sslTrust=<span class="hljs-string">"smtp.exmail.qq.com"</span>
-<span class="hljs-comment"># Specify which resource upload function to use for
resources storage such as sql files. And supported options are HDFS, S3 and
NONE. HDFS for upload to HDFS and NONE for not using this function.</span>
+<span class="hljs-comment"># Specify which resource upload function to use for
storing resources such as sql files. Supported options are HDFS, S3 and NONE:
HDFS uploads to HDFS, and NONE disables this function.</span>
resourceStorageType=<span class="hljs-string">"NONE"</span>
<span class="hljs-comment"># if S3, write S3 address. HA, for example:
s3a://dolphinscheduler,</span>
@@ -903,7 +905,7 @@ hdfsRootUser=<span
class="hljs-string">"hdfs"</span>
<span class="hljs-comment"># Followings are Kerberos configs</span>
-<span class="hljs-comment"># Spicify Kerberos enable or not</span>
+<span class="hljs-comment"># Specify whether Kerberos is enabled</span>
kerberosStartUp=<span class="hljs-string">"false"</span>
<span class="hljs-comment"># Kdc krb5 config file path</span>
@@ -941,7 +943,7 @@ apiServers=<span class="hljs-string">"ds1"</span>
</code></pre>
<h3>dolphinscheduler_env.sh [load environment variables configs]</h3>
<p>When using shell to commit tasks, DS will load environment variables inside
dolphinscheduler_env.sh into the host.
-Types of tasks involved are: Shell task、Python task、Spark task、Flink
task、Datax task and etc.</p>
+Types of tasks involved are: Shell, Python, Spark, Flink, DataX, etc.</p>
<pre><code class="language-bash"><span class="hljs-built_in">export</span>
HADOOP_HOME=/opt/soft/hadoop
<span class="hljs-built_in">export</span>
HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
<span class="hljs-built_in">export</span> SPARK_HOME1=/opt/soft/spark1
diff --git a/en-us/docs/dev/user_doc/architecture/configuration.json
b/en-us/docs/dev/user_doc/architecture/configuration.json
index 5fd2955..4fa12a7 100644
--- a/en-us/docs/dev/user_doc/architecture/configuration.json
+++ b/en-us/docs/dev/user_doc/architecture/configuration.json
@@ -1,6 +1,6 @@
{
"filename": "configuration.md",
- "__html": "<!-- markdown-link-check-disable
-->\n<h1>Configuration</h1>\n<h2>Preface</h2>\n<p>This document explains the
DolphinScheduler application configurations according to DolphinScheduler-1.3.x
versions.</p>\n<h2>Directory Structure</h2>\n<p>Currently, all the
configuration files are under [conf ] directory. Please check the following
simplified DolphinScheduler installation directories to have a direct view
about the position [conf] directory in and configuration files inside. [...]
+ "__html": "<!-- markdown-link-check-disable
-->\n<h1>Configuration</h1>\n<h2>Preface</h2>\n<p>This document explains the
DolphinScheduler application configurations according to DolphinScheduler-1.3.x
versions.</p>\n<h2>Directory Structure</h2>\n<p>Currently, all the
configuration files are under the [conf] directory.\nCheck the following
simplified DolphinScheduler installation directory tree to see where the [conf]
directory sits and which configuration files it contains.\nThis
[...]
"link": "/dist/en-us/docs/dev/user_doc/architecture/configuration.html",
"meta": {}
}
\ No newline at end of file
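
The master.host.selector values that appear in the configuration.md change
above (Random, RoundRobin, LowerWeight) can be pictured with a small Python
sketch. The function make_selector and the host-to-weight mapping are
hypothetical. The real selector lives in the master-service, and this commit
does not describe how it computes weights, so "weight" is assumed here to be a
load figure where lower means less busy.

```python
import itertools
import random


def make_selector(name, workers):
    """Build a zero-argument function that picks a worker host.

    workers maps host name -> load weight (hypothetical metric, lower is better).
    name is one of "Random", "RoundRobin", "LowerWeight".
    """
    hosts = sorted(workers)  # fixed order keeps round-robin and tie-breaks deterministic
    if not hosts:
        raise ValueError("no workers available")
    if name == "Random":
        return lambda: random.choice(hosts)
    if name == "RoundRobin":
        cycle = itertools.cycle(hosts)
        return lambda: next(cycle)
    if name == "LowerWeight":
        # Reads the mapping at call time, so updated weights are taken into account.
        return lambda: min(hosts, key=lambda host: workers[host])
    raise ValueError(f"unknown selector: {name}")
```

Random spreads tasks without any state, RoundRobin guarantees an even count
per worker, and LowerWeight prefers the least loaded worker at the cost of
needing fresh load figures.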
diff --git a/en-us/docs/dev/user_doc/architecture/design.html
b/en-us/docs/dev/user_doc/architecture/design.html
index 7f556d1..98f4e5b 100644
--- a/en-us/docs/dev/user_doc/architecture/design.html
+++ b/en-us/docs/dev/user_doc/architecture/design.html
@@ -11,26 +11,26 @@
</head>
<body>
<div id="root"><div class="md2html docs-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><span
class="mobile-menu-btn mobile-menu-btn-dark"></span><a
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">中</span><div
class="header-menu"><div><ul class="ant-menu whiteClass ant-menu-light ant-
[...]
-<p>Before explaining the architecture of the scheduling system, let's first
understand the commonly used terms of the scheduling system</p>
+<p>Before explaining the architecture of the scheduling system, let's first
get to know the terms commonly used in scheduling systems.</p>
<h2>Glossary</h2>
-<p><strong>DAG:</strong> The full name is Directed Acyclic Graph, referred to
as DAG. Task tasks in the workflow are assembled in the form of a directed
acyclic graph, and topological traversal is performed from nodes with zero
degrees of entry until there are no subsequent nodes. Examples are as
follows:</p>
+<p><strong>DAG:</strong> DAG stands for Directed Acyclic Graph. Tasks in the
workflow are assembled in the form of a directed acyclic graph, and topological
traversal proceeds from nodes with zero in-degree until there are no
subsequent nodes. An example is shown below:</p>
<p align="center">
<img src="/img/dag_examples_cn.jpg" alt="dag example" width="60%" />
<p align="center">
<em>dag example</em>
</p>
</p>
-<p><strong>Process definition</strong>: Visualization formed by dragging task
nodes and establishing task node associations<strong>DAG</strong></p>
-<p><strong>Process instance</strong>: The process instance is the
instantiation of the process definition, which can be generated by manual start
or scheduled scheduling. Each time the process definition runs, a process
instance is generated</p>
-<p><strong>Task instance</strong>: The task instance is the instantiation of
the task node in the process definition, which identifies the specific task
execution status</p>
-<p><strong>Task type</strong>: Currently supports SHELL, SQL, SUB_PROCESS
(sub-process), PROCEDURE, MR, SPARK, PYTHON, DEPENDENT (depends), and plans to
support dynamic plug-in expansion, note: <strong>SUB_PROCESS</strong> It is
also a separate process definition that can be started and executed
separately</p>
-<p><strong>Scheduling method</strong>: The system supports scheduled
scheduling and manual scheduling based on cron expressions. Command type
support: start workflow, start execution from current node, resume
fault-tolerant workflow, resume pause process, start execution from failed
node, complement, timing, rerun, pause, stop, resume waiting thread. Among them
<strong>Resume fault-tolerant workflow</strong> and <strong>Resume waiting
thread</strong> The two command types are used by the [...]
-<p><strong>Scheduled</strong>: System adopts <strong>quartz</strong>
distributed scheduler, and supports the visual generation of cron
expressions</p>
-<p><strong>Rely</strong>: The system not only supports <strong>DAG</strong>
simple dependencies between the predecessor and successor nodes, but also
provides <strong>task dependent</strong> nodes, supporting <strong>between
processes</strong></p>
-<p><strong>Priority</strong>: Support the priority of process instances and
task instances, if the priority of process instances and task instances is not
set, the default is first-in-first-out</p>
-<p><strong>Email alert</strong>: Support <strong>SQL task</strong> Query
result email sending, process instance running result email alert and fault
tolerance alert notification</p>
-<p><strong>Failure strategy</strong>: For tasks running in parallel, if a task
fails, two failure strategy processing methods are provided.
<strong>Continue</strong> refers to regardless of the status of the task
running in parallel until the end of the process failure. <strong>End</strong>
means that once a failed task is found, Kill will also run the parallel task at
the same time, and the process fails and ends</p>
-<p><strong>Complement</strong>: Supplement historical data,Supports
<strong>interval parallel and serial</strong> two complement methods</p>
+<p><strong>Process definition</strong>: A visualized <strong>DAG</strong>
formed by dragging and dropping task nodes and establishing associations
between them.</p>
+<p><strong>Process instance</strong>: The process instance is the
instantiation of a process definition, which can be generated by a manual start
or by scheduling. A process instance is generated every time the process
definition runs.</p>
+<p><strong>Task instance</strong>: The task instance is the instantiation of
the task node in the process definition, which identifies the execution status
of a specific task.</p>
+<p><strong>Task type</strong>: Currently supported types are shell, SQL,
SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, and DEPENDENT
(dependent), and there are plans to support dynamic plug-in extension. Note:
<strong>SUB_PROCESS</strong> is also an individual process definition that can
be started and executed separately.</p>
+<p><strong>Scheduling method</strong>: The system supports scheduling based on
cron expressions and manual scheduling. Supported command types: start workflow,
start execution from current node, resume fault-tolerant workflow, resume pause
process, start execution from failed node, complement, timing, rerun, pause,
stop, resume waiting thread. Among them, the <strong>Resume fault-tolerant
workflow</strong> and <strong>Resume waiting thread</strong> command types are
controlled by the internal [...]
+<p><strong>Scheduled</strong>: The system adopts the <strong>quartz</strong>
distributed scheduler and supports the visual generation of cron
expressions.</p>
+<p><strong>Rely</strong>: The system supports not only simple
<strong>DAG</strong> dependencies between predecessor and successor nodes, but
also provides <strong>task dependent</strong> nodes, which support
<strong>custom task dependencies between processes</strong>.</p>
+<p><strong>Priority</strong>: Supports priorities for process instances and
task instances. If no priority is set, the default is first-in-first-out.</p>
+<p><strong>Email alert</strong>: Supports sending <strong>SQL task</strong>
query results by email, process instance execution result email alerts, and
fault tolerance alert notifications.</p>
+<p><strong>Failure strategy</strong>: For tasks running in parallel, if a task
fails, two failure strategies are provided. <strong>Continue</strong> means
that the other parallel tasks keep running regardless of the failed task, until
the process ends in failure. <strong>End</strong> means that once a failed task
is found, the tasks still running in parallel are killed and the process ends
in failure.</p>
+<p><strong>Complement</strong>: Supplements historical data. Supports two
complement methods: <strong>interval parallel and serial</strong>.</p>
<h2>System Structure</h2>
<h3>System Architecture Diagram</h3>
<p align="center">
@@ -50,22 +50,22 @@
<ul>
<li>
<p><strong>MasterServer</strong></p>
-<p>MasterServer adopts a distributed and centerless design concept.
MasterServer is mainly responsible for DAG task segmentation, task submission
monitoring, and monitoring the health status of other MasterServer and
WorkerServer at the same time.
+<p>MasterServer adopts a distributed and decentralized design concept.
MasterServer is mainly responsible for DAG task segmentation, task submission
monitoring, and monitoring the health status of other MasterServer and
WorkerServer at the same time.
When the MasterServer service starts, register a temporary node with
ZooKeeper, and perform fault tolerance by monitoring changes in the temporary
node of ZooKeeper.
MasterServer provides monitoring services based on netty.</p>
<h4>The Service Mainly Includes:</h4>
<ul>
<li>
-<p><strong>Distributed Quartz</strong> distributed scheduling component, which
is mainly responsible for the start and stop operations of scheduled tasks.
When Quartz starts the task, there will be a thread pool inside the Master that
is specifically responsible for the follow-up operation of the processing
task</p>
+<p><strong>Distributed Quartz</strong> is the distributed scheduling component,
mainly responsible for starting and stopping scheduled tasks. When Quartz
starts a task, a thread pool inside the Master handles the follow-up processing
of the task.</p>
</li>
<li>
-<p><strong>MasterSchedulerThread</strong> is a scanning thread that regularly
scans the <strong>command</strong> table in the database and performs different
business operations according to different <strong>command types</strong></p>
+<p><strong>MasterSchedulerThread</strong> is a scanning thread that regularly
scans the <strong>command</strong> table in the database and runs different
business operations according to different <strong>command types</strong>.</p>
</li>
<li>
-<p><strong>MasterExecThread</strong> is mainly responsible for DAG task
segmentation, task submission monitoring, and logical processing of various
command types</p>
+<p><strong>MasterExecThread</strong> is mainly responsible for DAG task
segmentation, task submission monitoring, and the logical processing of the
various command types.</p>
</li>
<li>
-<p><strong>MasterTaskExecThread</strong> is mainly responsible for the
persistence of tasks</p>
+<p><strong>MasterTaskExecThread</strong> is mainly responsible for the
persistence of tasks.</p>
</li>
</ul>
</li>
@@ -73,20 +73,20 @@ MasterServer provides monitoring services based on
netty.</p>
<p><strong>WorkerServer</strong></p>
<p>WorkerServer also adopts a distributed and decentralized design concept.
WorkerServer is mainly responsible for task execution and providing log
services.</p>
<p>When the WorkerServer service starts, register a temporary node with
ZooKeeper and maintain a heartbeat.
-Server provides monitoring services based on netty. Worker</p>
+WorkerServer provides monitoring services based on netty.</p>
<h4>The Service Mainly Includes:</h4>
<ul>
-<li><strong>Fetch TaskThread</strong> is mainly responsible for continuously
getting tasks from <strong>Task Queue</strong>, and calling
<strong>TaskScheduleThread</strong> corresponding executor according to
different task types.</li>
+<li><strong>Fetch TaskThread</strong> is mainly responsible for continuously
getting tasks from the <strong>Task Queue</strong>, and calling the
<strong>TaskScheduleThread</strong> executor that corresponds to each task
type.</li>
</ul>
</li>
<li>
<p><strong>ZooKeeper</strong></p>
-<p>ZooKeeper service, MasterServer and WorkerServer nodes in the system all
use ZooKeeper for cluster management and fault tolerance. In addition, the
system is based on ZooKeeper for event monitoring and distributed locks.</p>
-<p>We have also implemented queues based on Redis, but we hope that
DolphinScheduler depends on as few components as possible, so we finally
removed the Redis implementation.</p>
+<p>The MasterServer and WorkerServer nodes in the system all use the ZooKeeper
service for cluster management and fault tolerance. In addition, the system
implements event monitoring and distributed locks based on ZooKeeper.</p>
+<p>We have also implemented queues based on Redis, but we hope
DolphinScheduler depends on as few components as possible, so we finally
removed the Redis implementation.</p>
</li>
<li>
<p><strong>Task Queue</strong></p>
-<p>Provide task queue operation, the current queue is also implemented based
on ZooKeeper. Because there is less information stored in the queue, there is
no need to worry about too much data in the queue. In fact, we have tested the
millions of data storage queues, which has no impact on system stability and
performance.</p>
+<p>Provides task queue operations. The current queue is also implemented based
on ZooKeeper. Because little information is stored in the queue, there is no
need to worry about excessive data in the queue. In fact, we have tested queues
holding millions of entries, with no impact on system stability or
performance.</p>
</li>
<li>
<p><strong>Alert</strong></p>
@@ -94,28 +94,28 @@ Server provides monitoring services based on netty.
Worker</p>
</li>
<li>
<p><strong>API</strong></p>
-<p>The API interface layer is mainly responsible for processing requests from
the front-end UI layer. The service uniformly provides RESTful APIs to provide
request services to the outside world. Interfaces include workflow creation,
definition, query, modification, release, logoff, manual start, stop, pause,
resume, start execution from the node and so on.</p>
+<p>The API interface layer is mainly responsible for processing requests from
the front-end UI layer. The service exposes unified RESTful APIs to external
callers.
+Interfaces include workflow creation, definition, query, modification,
release, logoff, manual start, stop, pause, resume, start execution from a
specific node, etc.</p>
</li>
<li>
<p><strong>UI</strong></p>
-<p>The front-end page of the system provides various visual operation
interfaces of the system,See more
-at <a href="../guide/homepage.md">Introduction to Functions</a> section。</p>
+<p>The front-end pages of the system provide its various visual operation
interfaces. For more details, see the <a
href="../guide/homepage.md">Introduction to Functions</a> section.</p>
</li>
</ul>
<h3>Architecture Design Ideas</h3>
<h4>Decentralization VS Centralization</h4>
<h5>Centralized Thinking</h5>
-<p>The centralized design concept is relatively simple. The nodes in the
distributed cluster are divided into roles according to roles, which are
roughly divided into two roles:</p>
+<p>The centralized design concept is relatively simple. The nodes in the
distributed cluster are roughly divided into two roles according to
responsibilities:</p>
<p align="center">
<img
src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png"
alt="master-slave character" width="50%" />
</p>
<ul>
-<li>The role of the master is mainly responsible for task distribution and
monitoring the health status of the slave, and can dynamically balance the task
to the slave, so that the slave node will not be in a "busy dead" or
"idle dead" state.</li>
-<li>The role of Worker is mainly responsible for task execution and
maintenance and Master's heartbeat, so that Master can assign tasks to
Slave.</li>
+<li>The Master role is mainly responsible for distributing tasks and
monitoring the health status of the Slaves, and can dynamically balance tasks
across the Slaves so that no Slave node ends up in a "busy dead" or
"idle dead" state.</li>
+<li>The Slave role is mainly responsible for executing tasks and maintaining a
heartbeat with the Master, so that the Master can assign tasks to it.</li>
</ul>
<p>Problems in centralized thought design:</p>
<ul>
-<li>Once there is a problem with the Master, the dragons are headless and the
entire cluster will collapse. In order to solve this problem, most of the
Master/Slave architecture models adopt the design scheme of active and standby
Master, which can be hot standby or cold standby, or automatic switching or
manual switching, and more and more new systems are beginning to have The
ability to automatically elect and switch Master to improve the availability of
the system.</li>
+<li>Once there is a problem with the Master, the cluster is left leaderless
and collapses entirely. To solve this problem, most Master/Slave architectures
adopt an active/standby Master design, which can use hot standby or cold
standby, with automatic or manual switching. More and more new systems are able
to elect and switch the Master automatically to improve the availability of the
system.</li>
<li>Another problem is that if the Scheduler is on the Master, although it can
support different tasks in a DAG running on different machines, it will cause
the Master to be overloaded. If the Scheduler is on the slave, all tasks in a
DAG can only submit jobs on a certain machine. When there are more parallel
tasks, the pressure on the slave may be greater.</li>
</ul>
<h5>Decentralized</h5>
@@ -123,21 +123,13 @@ at <a href="../guide/homepage.md">Introduction to
Functions</a> section。</p>
<img
src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png"
alt="Decentralization" width="50%" />
</p>
<ul>
-<li>
-<p>In the decentralized design, there is usually no concept of Master/Slave,
all roles are the same, the status is equal, the global Internet is a typical
decentralized distributed system, any node equipment connected to the network
is down, All will only affect a small range of functions.</p>
-</li>
-<li>
-<p>The core design of decentralized design is that there is no
"manager" different from other nodes in the entire distributed
system, so there is no single point of failure. However, because there is no
"manager" node, each node needs to communicate with other nodes to
obtain the necessary machine information, and the unreliability of distributed
system communication greatly increases the difficulty of implementing the above
functions.</p>
-</li>
-<li>
-<p>In fact, truly decentralized distributed systems are rare. Instead, dynamic
centralized distributed systems are constantly pouring out. Under this
architecture, the managers in the cluster are dynamically selected, rather than
preset, and when the cluster fails, the nodes of the cluster will automatically
hold "meetings" to elect new "managers" To preside over the
work. The most typical case is Etcd implemented by ZooKeeper and Go
language.</p>
-</li>
-<li>
-<p>The decentralization of DolphinScheduler is that the Master/Worker is
registered in ZooKeeper, and the Master cluster and Worker cluster are
centerless, and the ZooKeeper distributed lock is used to elect one of the
Master or Worker as the "manager" to perform the task.</p>
-</li>
+<li>In a decentralized design, there is usually no concept of Master or Slave.
All roles are the same and have equal status. The global Internet is a typical
decentralized distributed system: when any node connected to the network goes
down, only a small range of functions is affected.</li>
+<li>The core of decentralized design is that no node in the entire distributed
system acts as a "manager" distinct from the other nodes, so there is
no single point of failure. However, because there is no "manager"
node, each node needs to communicate with other nodes to obtain the necessary
machine information, and the unreliability of communication in distributed
systems greatly increases the difficulty of implementing the above
functions.</li>
+<li>In fact, truly decentralized distributed systems are rare. Instead,
dynamically centralized distributed systems keep emerging. Under this
architecture, the managers in the cluster are elected dynamically rather than
preset, and when the cluster fails, the nodes of the cluster automatically
hold "meetings" to elect new "managers" to preside over the
work. Typical examples are ZooKeeper and etcd, which is implemented in
Go.</li>
+<li>DolphinScheduler achieves decentralization by registering Masters and
Workers in ZooKeeper, so that neither the Master cluster nor the Worker cluster
has a center. A ZooKeeper distributed lock is used to elect one Master or
Worker as the "manager" to perform the task.</li>
</ul>
<h4>Distributed Lock Practice</h4>
-<p>DolphinScheduler uses ZooKeeper distributed lock to realize that only one
Master executes Scheduler at the same time, or only one Worker executes the
submission of tasks.</p>
+<p>DolphinScheduler uses a ZooKeeper distributed lock to ensure that only one
Master runs the Scheduler at a time, or that only one Worker submits tasks at a
time.</p>
<ol>
<li>The core process algorithm for acquiring distributed locks is as
follows:</li>
</ol>
@@ -145,45 +137,45 @@ at <a href="../guide/homepage.md">Introduction to
Functions</a> section。</p>
<img
src="https://analysys.github.io/easyscheduler_docs_cn/images/distributed_lock.png"
alt="Obtain distributed lock process" width="50%" />
</p>
<ol start="2">
-<li>Flow chart of implementation of Scheduler thread distributed lock in
DolphinScheduler:</li>
+<li>Flow diagram of the Scheduler thread distributed lock implementation in
DolphinScheduler:</li>
</ol>
<p align="center">
<img src="/img/distributed_lock_procss.png" alt="Obtain distributed lock
process" width="50%" />
</p>
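<p>The lock behaviour described above can be sketched in a few lines. This is
only an in-memory simulation, assuming the common ZooKeeper lock recipe in which
each contender creates an ephemeral sequential node and the lowest sequence
number holds the lock. The class and method names are hypothetical and are not
taken from the DolphinScheduler code base, and no ZooKeeper client is used.</p>

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a ZooKeeper lock directory (hypothetical, for illustration only). */
class LockDirectorySketch {
    // sequence number -> owner, kept sorted so the lowest sequence comes first
    private final TreeMap<Long, String> waiters = new TreeMap<>();
    private long nextSequence = 0;

    /** Mimics creating an ephemeral sequential node; returns the assigned sequence number. */
    synchronized long enqueue(String owner) {
        long seq = nextSequence++;
        waiters.put(seq, owner);
        return seq;
    }

    /** The owner of the lowest sequence number currently holds the lock. */
    synchronized String holder() {
        Map.Entry<Long, String> first = waiters.firstEntry();
        return first == null ? null : first.getValue();
    }

    /** Mimics node deletion on release or session loss; the next waiter becomes the holder. */
    synchronized void remove(long seq) {
        waiters.remove(seq);
    }

    public static void main(String[] args) {
        LockDirectorySketch dir = new LockDirectorySketch();
        long m1 = dir.enqueue("master-1");
        dir.enqueue("master-2");
        System.out.println(dir.holder()); // master-1 runs the Scheduler
        dir.remove(m1);                   // master-1 releases the lock or goes down
        System.out.println(dir.holder()); // master-2 takes over
    }
}
```

<p>Because only the current holder proceeds, at most one Master runs the
Scheduler at any moment, and removing the holder's node hands the lock to the
next contender in line.</p>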
<h4>Insufficient Thread Loop Waiting Problem</h4>
<ul>
-<li>If there is no sub-process in a DAG, if the number of data in the Command
is greater than the threshold set by the thread pool, the process directly
waits or fails.</li>
-<li>If many sub-processes are nested in a large DAG, the following figure will
produce a "dead" state:</li>
+<li>If there is no sub-process in a DAG, when the number of records in the
Command table exceeds the threshold set by the thread pool, the process either
waits or fails directly.</li>
+<li>If a large DAG nests many sub-processes, a "dead" state can
arise, as shown in the following figure:</li>
</ul>
<p align="center">
<img
src="https://analysys.github.io/easyscheduler_docs_cn/images/lack_thread.png"
alt="Insufficient threads waiting loop problem" width="50%" />
</p>
-In the above figure, MainFlowThread waits for the end of SubFlowThread1,
SubFlowThread1 waits for the end of SubFlowThread2, SubFlowThread2 waits for
the end of SubFlowThread3, and SubFlowThread3 waits for a new thread in the
thread pool, then the entire DAG process cannot end, so that the threads cannot
be released. In this way, the state of the child-parent process loop waiting is
formed. At this time, unless a new Master is started to add threads to break
such a "stalemate", the sched [...]
+In the above figure, MainFlowThread waits for the end of SubFlowThread1,
SubFlowThread1 waits for the end of SubFlowThread2, SubFlowThread2 waits for
the end of SubFlowThread3, and SubFlowThread3 waits for a new thread in the
thread pool. As a result, the entire DAG process cannot finish, and the threads
cannot be released. The parent and child processes end up waiting on each other
in a loop. At this point, unless a new Master is started to add threads and
break such a "stalemate", t [...]
<p>It seems a bit unsatisfactory to start a new Master to break the deadlock,
so we proposed the following three solutions to reduce this risk:</p>
<ol>
-<li>Calculate the sum of all Master threads, and then calculate the number of
threads required for each DAG, that is, pre-calculate before the DAG process is
executed. Because it is a multi-master thread pool, the total number of threads
is unlikely to be obtained in real time.</li>
-<li>Judge the single-master thread pool. If the thread pool is full, let the
thread fail directly.</li>
+<li>Calculate the sum of all Master threads, and then calculate the number of
threads required for each DAG, that is, pre-calculate before the DAG process
executes. Because the thread pools are spread across multiple Masters, the
total number of threads is unlikely to be obtainable in real time.</li>
+<li>Check the single-Master thread pool. If the thread pool is full, let the
thread fail directly.</li>
<li>Add a Command type with insufficient resources. If the thread pool is
insufficient, suspend the main process. In this way, there are new threads in
the thread pool, which can make the process suspended by insufficient resources
wake up to execute again.</li>
</ol>
-<p>note: The Master Scheduler thread is executed by FIFO when acquiring the
Command.</p>
-<p>So we chose the third way to solve the problem of insufficient threads.</p>
+<p>Note: The Master Scheduler thread acquires Commands in FIFO order.</p>
+<p>So we chose the third approach to solve the problem of insufficient
threads.</p>
<h4>Fault-Tolerant Design</h4>
-<p>Fault tolerance is divided into service downtime fault tolerance and task
retry, and service downtime fault tolerance is divided into master fault
tolerance and worker fault tolerance.</p>
+<p>Fault tolerance is divided into service downtime fault tolerance and task
retry, and service downtime fault tolerance is further divided into Master
fault tolerance and Worker fault tolerance.</p>
<h5>Downtime Fault Tolerance</h5>
-<p>The service fault-tolerance design relies on ZooKeeper's Watcher mechanism,
and the implementation principle is shown in the figure:</p>
+<p>The service fault-tolerance design relies on ZooKeeper's Watcher mechanism,
and the implementation principle is shown in the figure:</p>
<p align="center">
<img
src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant.png"
alt="DolphinScheduler fault-tolerant design" width="40%" />
</p>
-Among them, the Master monitors the directories of other Masters and Workers.
If the remove event is heard, fault tolerance of the process instance or task
instance will be performed according to the specific business logic.
+Here, each Master watches the directories of the other Masters and of the
Workers. When a remove event is received, the Master performs fault tolerance
for the process instance or task instance according to the specific business
logic.
<ul>
<li>Master fault tolerance:</li>
</ul>
<p align="center">
<img src="/img/failover-master.jpg" alt="failover-master" width="50%" />
</p>
-<p>Fault tolerance range: From the perspective of host, the fault tolerance
range of Master includes: own host + node host that does not exist in the
registry, and the entire process of fault tolerance will be locked;</p>
-<p>Fault-tolerant content: Master's fault-tolerant content includes:
fault-tolerant process instances and task instances. Before fault-tolerant, it
compares the start time of the instance with the server start-up time, and
skips fault-tolerance if after the server start time;</p>
-<p>Fault-tolerant post-processing: After the fault tolerance of ZooKeeper
Master is completed, it is re-scheduled by the Scheduler thread in
DolphinScheduler, traverses the DAG to find the "running" and
"submit successful" tasks, monitors the status of its task instances
for the "running" tasks, and "commits successful" tasks It
is necessary to determine whether the task queue already exists. If it exists,
the status of the task instance is also mo [...]
+<p>Fault tolerance range: From the perspective of hosts, the fault tolerance
range of a Master includes its own host plus any node hosts that do not exist
in the registry, and the entire fault tolerance process is locked;</p>
+<p>Fault-tolerant content: Master fault tolerance covers process instances and
task instances. Before performing fault tolerance, the Master compares the
start time of the instance with the server start-up time, and skips fault
tolerance if the instance started after the server start time;</p>
+<p>Fault-tolerant post-processing: After ZooKeeper Master fault tolerance
completes, the Scheduler thread in DolphinScheduler re-schedules the process
and traverses the DAG to find the "running" and "submit
successful" tasks. For "running" tasks, it monitors the status of
their task instances. For "submit successful" tasks, it checks
whether the task already exists in the task queue. If it does, it monitors the
status of the task instance. Ot [...]
<ul>
<li>Worker fault tolerance:</li>
</ul>
@@ -191,44 +183,41 @@ Among them, the Master monitors the directories of other
Masters and Workers. If
<img src="/img/failover-worker.jpg" alt="failover-worker" width="50%" />
</p>
<p>Fault tolerance range: From the perspective of process instance, each
Master is only responsible for fault tolerance of its own process instance; it
will lock only when <code>handleDeadServer</code>;</p>
-<p>Fault-tolerant content: When sending the remove event of the Worker node,
the Master only fault-tolerant task instances. Before fault-tolerant, it
compares the start time of the instance with the server start-up time, and
skips fault-tolerance if after the server start time;</p>
+<p>Fault-tolerant content: When the remove event of a Worker node is received,
the Master performs fault tolerance only for task instances. Before doing so,
it compares the start time of the instance with the server start-up time, and
skips fault tolerance if the instance started after the server start time;</p>
<p>Fault-tolerant post-processing: Once the Master Scheduler thread finds that
the task instance is in the "fault-tolerant" state, it takes over the
task and resubmits it.</p>
-<p>Note: Due to "network jitter", the node may lose its heartbeat
with ZooKeeper in a short period of time, and the node's remove event may
occur. For this situation, we use the simplest way, that is, once the node and
ZooKeeper timeout connection occurs, then directly stop the Master or Worker
service.</p>
+<p>Note: Due to "network jitter", a node may lose its heartbeat with
ZooKeeper for a short period of time, and the node's remove event may fire. For
this situation, we take the simplest approach: once the connection between the
node and ZooKeeper times out, the Master or Worker service is stopped
directly.</p>
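<p>The start-time comparison mentioned for both Master and Worker fault
tolerance can be written as a single predicate. The following is a minimal
sketch under the assumption that both times are available as
<code>java.time.Instant</code> values. The class and method names are
hypothetical and do not come from the DolphinScheduler source.</p>

```java
import java.time.Instant;

/** Hypothetical helper illustrating the "skip if started after the server" rule. */
class FailoverCheckSketch {
    /** Returns true when the instance should go through fault tolerance. */
    static boolean needsFailover(Instant instanceStartTime, Instant serverStartTime) {
        // Per the text above, fault tolerance is skipped for an instance
        // whose start time is after the server start-up time.
        return !instanceStartTime.isAfter(serverStartTime);
    }

    public static void main(String[] args) {
        Instant serverStart = Instant.parse("2022-03-08T02:00:00Z");
        // Instance started before the server start-up time: handled by fault tolerance.
        System.out.println(needsFailover(Instant.parse("2022-03-08T01:30:00Z"), serverStart)); // true
        // Instance started after the server start-up time: skipped.
        System.out.println(needsFailover(Instant.parse("2022-03-08T02:30:00Z"), serverStart)); // false
    }
}
```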
<h5>Task Failed and Try Again</h5>
-<p>Here we must first distinguish the concepts of task failure retry, process
failure recovery, and process failure rerun:</p>
+<p>Here we must first distinguish the concepts of task failure retry, process
failure recovery, and process failure re-run:</p>
<ul>
-<li>Task failure retry is at the task level and is automatically performed by
the scheduling system. For example, if a Shell task is set to retry for 3
times, it will try to run it again up to 3 times after the Shell task
fails.</li>
-<li>Process failure recovery is at the process level and is performed
manually. Recovery can only be performed <strong>from the failed node</strong>
or <strong>from the current node</strong></li>
-<li>Process failure rerun is also at the process level and is performed
manually, rerun is performed from the start node</li>
+<li>Task failure retry is at the task level and is performed automatically by
the scheduling system. For example, if a Shell task is set to retry 3 times,
the system runs it again up to 3 times after the Shell task fails.</li>
+<li>Process failure recovery is at the process level and is performed
manually. Recovery can only be performed <strong>from the failed node</strong>
or <strong>from the current node</strong>.</li>
+<li>Process failure re-run is also at the process level and is performed
manually. A re-run starts from the start node.</li>
</ul>
-<p>Next to the topic, we divide the task nodes in the workflow into two
types.</p>
+<p>Back to the main point: we divide the task nodes in the workflow into two
types.</p>
<ul>
<li>
-<p>One is a business node, which corresponds to an actual script or processing
statement, such as Shell node, MR node, Spark node, and dependent node.</p>
+<p>One is a business node, which corresponds to an actual script or processing
statement, such as a shell node, MR node, Spark node, or dependent node.</p>
</li>
<li>
-<p>There is also a logical node, which does not do actual script or statement
processing, but only logical processing of the entire process flow, such as
sub-process sections.</p>
+<p>The other is a logical node, which does not run an actual script or
statement but only handles the logic of the overall process flow, such as a
sub-process node.</p>
</li>
</ul>
-<p>Each <strong>business node</strong> can be configured with the number of
failed retries. When the task node fails, it will automatically retry until it
succeeds or exceeds the configured number of retries. <strong>Logical
node</strong> Failure retry is not supported. But the tasks in the logical node
support retry.</p>
-<p>If there is a task failure in the workflow that reaches the maximum number
of retries, the workflow will fail to stop, and the failed workflow can be
manually rerun or process recovery operation</p>
+<p>Each <strong>business node</strong> can be configured with a number of
failure retries. When the task node fails, it retries automatically until it
succeeds or exceeds the configured number of retries. Failure retry is not
supported for a <strong>logical node</strong> itself, but the tasks inside a
logical node support retry.</p>
+<p>If a task in the workflow fails and reaches the maximum number of retries,
the workflow fails and stops. The failed workflow can then be manually re-run
or recovered.</p>
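<p>The retry rule for business nodes can be sketched as a small loop: one
initial run, followed by at most the configured number of retries. This is an
illustrative sketch only. <code>runWithRetry</code> and
<code>maxRetryTimes</code> are hypothetical names, and the task is modelled as
a <code>BooleanSupplier</code> that returns <code>true</code> on success.</p>

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of task-level failure retry. */
class TaskRetrySketch {
    /**
     * Runs the task once and, on failure, retries it up to maxRetryTimes more times.
     * Returns true as soon as one attempt succeeds, false if every attempt fails.
     */
    static boolean runWithRetry(BooleanSupplier task, int maxRetryTimes) {
        // attempt 0 is the initial run; attempts 1..maxRetryTimes are the retries
        for (int attempt = 0; attempt <= maxRetryTimes; attempt++) {
            if (task.getAsBoolean()) {
                return true;
            }
        }
        return false; // the workflow would now fail and stop
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // A task that fails twice and succeeds on its third run.
        boolean ok = runWithRetry(() -> ++calls[0] >= 3, 3);
        System.out.println(ok + " after " + calls[0] + " runs"); // true after 3 runs
    }
}
```

<p>With <code>maxRetryTimes</code> set to 3, a task that never succeeds is run
4 times in total, which matches the Shell task example above.</p>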
<h4>Task Priority Design</h4>
-<p>In the early scheduling design, if there is no priority design and the fair
scheduling design is used, the task submitted first may be completed at the
same time as the task submitted later, and the process or task priority cannot
be set, so We have redesigned this, and our current design is as follows:</p>
+<p>In the early scheduling design, there was no priority design and fair
scheduling was used, so a task submitted first might complete at the same time
as a task submitted later, and the priority of a process or task could not be
set. So we redesigned this, and our current design is as follows:</p>
<ul>
-<li>According to <strong>priority of different process instances</strong>
priority over <strong>priority of the same process instance</strong> priority
over <strong>priority of tasks within the same process</strong>priority over
<strong>tasks within the same process</strong>submission order from high to Low
task processing.
+<li>Tasks are processed from highest to lowest priority in this order:
<strong>the priority of different process instances</strong> takes precedence
over <strong>priority of the same process instance</strong>, which takes
precedence over <strong>priority of tasks within the same process</strong>,
which takes precedence over the submission order of <strong>tasks within the
same process</strong>.
<ul>
<li>
-<p>The specific implementation is to parse the priority according to the JSON
of the task instance, and then save the <strong>process instance
priority_process instance id_task priority_task id</strong> information in the
ZooKeeper task queue, when obtained from the task queue, pass String comparison
can get the tasks that need to be executed first</p>
+<p>The specific implementation is to parse the priority from the JSON of the
task instance, and then save the <strong>process instance
priority_process instance id_task priority_task id</strong> information to the
ZooKeeper task queue. When fetching from the task queue, the highest-priority
task can be found by string comparison.</p>
+<pre><code>- The priority of the process definition reflects that some
processes need to be processed before others. Configure the priority when the
process is started or scheduled. There are 5 levels in total: HIGHEST, HIGH,
MEDIUM, LOW, and LOWEST, as shown below:
+ <p align="center">
+ <img
src="https://user-images.githubusercontent.com/10797147/146744784-eb351b14-c94a-4ed6-8ba4-5132c2a3d116.png"
alt="Process priority configuration" width="40%" />
+ </p>
+</code></pre>
<ul>
-<li>
-<p>The priority of the process definition is to consider that some processes
need to be processed before other processes. This can be configured when the
process is started or scheduled to start. There are 5 levels in total, which
are HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below</p>
- <p align="center">
- <img
src="https://user-images.githubusercontent.com/10797147/146744784-eb351b14-c94a-4ed6-8ba4-5132c2a3d116.png"
alt="Process priority configuration" width="40%" />
- </p>
-</li>
-<li>
-<p>The priority of the task is also divided into 5 levels, followed by
HIGHEST, HIGH, MEDIUM, LOW, LOWEST. As shown below</p>
- <p align="center">
+<li>The priority of the task is also divided into 5 levels, in order:
HIGHEST, HIGH, MEDIUM, LOW, LOWEST, as shown below: <p align="center">
<img
src="https://user-images.githubusercontent.com/10797147/146744830-5eac611f-5933-4f53-a0c6-31613c283708.png"
alt="Task priority configuration" width="35%" />
</p>
</li>
@@ -240,23 +229,23 @@ Among them, the Master monitors the directories of other
Masters and Workers. If
<h4>Logback and Netty Implement Log Access</h4>
<ul>
<li>
-<p>Since Web (UI) and Worker are not necessarily on the same machine, viewing
the log cannot be like querying a local file. There are two options:</p>
+<p>Since Web (UI) and Worker are not always on the same machine, the log
cannot be viewed the way a local file is queried. There are two options:</p>
</li>
<li>
-<p>Put logs on the ES search engine</p>
+<p>Put logs on the ES search engine.</p>
</li>
<li>
-<p>Obtain remote log information through netty communication</p>
+<p>Obtain remote log information through netty communication.</p>
</li>
<li>
-<p>In consideration of the lightness of DolphinScheduler as much as possible,
so I chose gRPC to achieve remote access to log information.</p>
+<p>To keep DolphinScheduler as lightweight as possible, gRPC was chosen to
achieve remote access to log information.</p>
</li>
</ul>
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png"
alt="grpc remote access" width="50%" />
</p>
<ul>
-<li>We use the FileAppender and Filter functions of the custom Logback to
realize that each task instance generates a log file.</li>
+<li>We customize the FileAppender and Filter functions of Logback so that
each task instance generates its own log file.</li>
<li>FileAppender is mainly implemented as follows:</li>
</ul>
<pre><code class="language-java"> <span class="hljs-comment">/**
@@ -283,7 +272,7 @@ Among them, the Master monitors the directories of other
Masters and Workers. If
}
}
</code></pre>
-<p>Generate logs in the form of /process definition id/process instance
id/task instance id.log</p>
+<p>Generate logs in the form of /process definition id /process instance id
/task instance id.log</p>
<ul>
<li>
<p>Filter to match the thread name starting with TaskLogInfo:</p>
@@ -309,32 +298,32 @@ Among them, the Master monitors the directories of other
Masters and Workers. If
<h2>Module Introduction</h2>
<ul>
<li>
-<p>dolphinscheduler-alert alarm module, providing AlertServer service.</p>
+<p>dolphinscheduler-alert: alarm module, providing AlertServer service.</p>
</li>
<li>
-<p>dolphinscheduler-api web application module, providing ApiServer
service.</p>
+<p>dolphinscheduler-api: web application module, providing ApiServer
service.</p>
</li>
<li>
-<p>dolphinscheduler-common General constant enumeration, utility class, data
structure or base class</p>
+<p>dolphinscheduler-common: contains general constants, enumerations, utility
classes, data structures and base classes.</p>
</li>
<li>
-<p>dolphinscheduler-dao provides operations such as database access.</p>
+<p>dolphinscheduler-dao: provides operations such as database access.</p>
</li>
<li>
-<p>dolphinscheduler-remote client and server based on netty</p>
+<p>dolphinscheduler-remote: client and server based on netty.</p>
</li>
<li>
-<p>dolphinscheduler-server MasterServer and WorkerServer services</p>
+<p>dolphinscheduler-server: MasterServer and WorkerServer services.</p>
</li>
<li>
-<p>dolphinscheduler-service service module, including Quartz, ZooKeeper, log
client access service, easy to call server module and api module</p>
+<p>dolphinscheduler-service: service module, including Quartz, ZooKeeper, log
client access service, convenient for calling from server module and API
module.</p>
</li>
<li>
-<p>dolphinscheduler-ui front-end module</p>
+<p>dolphinscheduler-ui: front-end module.</p>
</li>
</ul>
<h2>Sum Up</h2>
-<p>From the perspective of scheduling, this article preliminarily introduces
the architecture principles and implementation ideas of the big data
distributed workflow scheduling system-DolphinScheduler. To be continued</p>
+<p>From the perspective of scheduling, this article preliminarily introduces
the architecture principles and implementation ideas of the big data
distributed workflow scheduling system: DolphinScheduler. To be continued.</p>
</div></section><footer class="footer-container"><div
class="footer-body"><div><h3>About us</h3><h4>Do you need feedback? Please
contact us through the following ways.</h4></div><div
class="contact-container"><ul><li><a
href="/en-us/community/development/subscribe.html"><img class="img-base"
src="/img/emailgray.png"/><img class="img-change"
src="/img/emailblue.png"/><p>Email List</p></a></li><li><a
href="https://twitter.com/dolphinschedule"><img class="img-base"
src="/img/twittergray.png [...]
<script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
<script
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
diff --git a/en-us/docs/dev/user_doc/architecture/design.json
b/en-us/docs/dev/user_doc/architecture/design.json
index bb3c884..f611e6f 100644
--- a/en-us/docs/dev/user_doc/architecture/design.json
+++ b/en-us/docs/dev/user_doc/architecture/design.json
@@ -1,6 +1,6 @@
{
"filename": "design.md",
- "__html": "<h1>System Architecture Design</h1>\n<p>Before explaining the
architecture of the scheduling system, let's first understand the commonly used
terms of the scheduling system</p>\n<h2>Glossary</h2>\n<p><strong>DAG:</strong>
The full name is Directed Acyclic Graph, referred to as DAG. Task tasks in the
workflow are assembled in the form of a directed acyclic graph, and topological
traversal is performed from nodes with zero degrees of entry until there are no
subsequent nodes. [...]
+ "__html": "<h1>System Architecture Design</h1>\n<p>Before explaining the
architecture of the scheduling system, let's first get to know the terms
commonly used in scheduling
systems.</p>\n<h2>Glossary</h2>\n<p><strong>DAG:</strong> The full name is
Directed Acyclic Graph, abbreviated as DAG. Tasks in the workflow are
assembled in the form of a directed acyclic graph, and topological traversal
is performed from nodes with zero in-degree until there are no subsequent nodes.
Examples are as fo [...]
"link": "/dist/en-us/docs/dev/user_doc/architecture/design.html",
"meta": {}
}
\ No newline at end of file
diff --git a/en-us/docs/dev/user_doc/architecture/load-balance.html
b/en-us/docs/dev/user_doc/architecture/load-balance.html
index 3d4d20e..4f7f6a3 100644
--- a/en-us/docs/dev/user_doc/architecture/load-balance.html
+++ b/en-us/docs/dev/user_doc/architecture/load-balance.html
@@ -14,32 +14,40 @@
<p>Load balancing refers to the reasonable allocation of server pressure
through routing algorithms (usually in cluster environments) to achieve the
maximum optimization of server performance.</p>
<h2>DolphinScheduler-Worker Load Balancing Algorithms</h2>
<p>DolphinScheduler-Master allocates tasks to workers, and by default provides
three algorithms:</p>
+<ul>
+<li>
<p>Weighted random (random)</p>
-<p>Smoothing polling (roundrobin)</p>
-<p>Linear load (lowerweight)</p>
+</li>
+<li>
+<p>Smoothing polling (round-robin)</p>
+</li>
+<li>
+<p>Linear load (lower weight)</p>
+</li>
+</ul>
<p>The default configuration is the linear load.</p>
-<p>As the routing is done on the client side, the master service, you can
change master.host.selector in master.properties to configure the algorithm
what you want.</p>
-<p>e.g. master.host.selector = random (case-insensitive)</p>
+<p>As the routing is done on the client side, that is, the master service,
you can change master.host.selector in master.properties to configure the
algorithm.</p>
+<p>e.g. master.host.selector=random (case-insensitive)</p>
<h2>Worker Load Balancing Configuration</h2>
<p>The configuration file is worker.properties</p>
<h3>Weight</h3>
-<p>All of the above load algorithms are weighted based on weights, which
affect the outcome of the triage. You can set different weights for different
machines by modifying the worker.weight value.</p>
+<p>All of the load algorithms above take weights into account, and the
weights affect the routing outcome. You can set different weights for different
machines by modifying the <code>worker.weight</code> value.</p>
<h3>Preheating</h3>
-<p>With JIT optimisation in mind, we will let the worker run at low power for
a period of time after startup so that it can gradually reach its optimal
state, a process we call preheating. If you are interested, you can read some
articles about JIT.</p>
-<p>So the worker will gradually reach its maximum weight over time after it
starts (by default ten minutes, we don't provide a configuration item, you can
change it and submit a PR if needed).</p>
-<h2>Load Balancing Algorithm Breakdown</h2>
+<p>With JIT optimization in mind, the worker runs at reduced capacity for a
period of time after startup so that it can gradually reach its optimal state,
a process we call preheating. If you are interested, you can read some articles
about JIT.</p>
+<p>So the worker gradually reaches its maximum weight over time after it
starts up (ten minutes by default; there is no configuration item for the
preheating duration, so please submit a PR if you need to change
it).</p>
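A minimal sketch of how preheating could scale a worker's weight, assuming a linear ramp over the ten-minute window. The source only says the weight grows gradually, so the linear shape, the floor of 1, and all names here are illustrative assumptions rather than DolphinScheduler's actual code.

```java
/** Hypothetical warm-up helper; not taken from the DolphinScheduler code base. */
final class WarmUpWeightSketch {
    /** Ten-minute preheating window mentioned in the text, in milliseconds. */
    static final long WARM_UP_MILLIS = 10 * 60 * 1000L;

    /**
     * Returns the weight to use for routing, given the configured weight
     * (assumed to be at least 1) and how long the worker has been running.
     * Assumption: the ramp is linear and never drops below 1.
     */
    static int effectiveWeight(int configuredWeight, long uptimeMillis) {
        if (uptimeMillis >= WARM_UP_MILLIS) {
            return configuredWeight; // warm-up finished, full weight
        }
        if (uptimeMillis <= 0) {
            return 1; // just started, minimal share of traffic
        }
        // int * long is computed as long, so this does not overflow for realistic weights.
        int scaled = (int) (configuredWeight * uptimeMillis / WARM_UP_MILLIS);
        return Math.max(1, Math.min(scaled, configuredWeight));
    }
}
```

Under these assumptions a worker configured with weight 100 would be routed with weight 50 five minutes after startup and with the full 100 from ten minutes onward.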
+<h2>Load Balancing Algorithm in Detail</h2>
<h3>Random (Weighted)</h3>
-<p>This algorithm is relatively simple, one of the matched workers is selected
at random (the weighting affects his weighting).</p>
+<p>This algorithm is relatively simple: one of the matched workers is
selected at random, and the weight affects how likely each worker is to be
selected.</p>
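A minimal sketch of weighted random selection, assuming integer weights with at least one positive value. The class and method names are hypothetical and are not DolphinScheduler's selector API.

```java
import java.util.Random;

/** Hypothetical weighted random selector operating on plain weight arrays. */
final class WeightedRandomSketch {
    /**
     * Returns index i with probability weights[i] / sum(weights).
     * Workers with weight 0 are never selected.
     */
    static int select(int[] weights, Random random) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        // Draw a point in [0, total) and find the weight interval that contains it.
        int point = random.nextInt(total);
        for (int i = 0; i < weights.length; i++) {
            point -= weights[i];
            if (point < 0) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable when total is positive");
    }
}
```

With weights 1, 0 and 3, the first worker is picked about a quarter of the time, the second never, and the third about three quarters of the time.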
<h3>Smoothed Polling (Weighted)</h3>
-<p>An obvious drawback of the weighted polling algorithm. Namely, under
certain specific weights, weighted polling scheduling generates an uneven
sequence of instances, and this unsmoothed load may cause some instances to
experience transient high loads, leading to a risk of system downtime. To
address this scheduling flaw, we provide a smooth weighted polling
algorithm.</p>
-<p>Each worker is given two weights, weight (which remains constant after
warm-up is complete) and current_weight (which changes dynamically), for each
route. The current_weight + weight is iterated over all the workers, and the
weight of all the workers is added up and counted as total_weight, then the
worker with the largest current_weight is selected as the worker for this task.
current_weight-total_weight.</p>
+<p>The weighted polling algorithm has an obvious drawback: under certain
weight combinations, weighted polling scheduling generates an imbalanced
sequence of instances, and this unsmooth load may cause some instances to
experience transient high loads, leading to a risk of system crash. To address
this scheduling flaw, we provide a smooth weighted polling algorithm.</p>
+<p>Each worker has two weight parameters: weight (which remains constant
after warm-up is complete) and current_weight (which changes dynamically). For
every route, all the workers are iterated over and each worker's current_weight
is increased by its weight; the weights of all the workers are summed up as
total_weight; then the worker with the largest current_weight is selected as
the worker for this task, and its current_weight is reduced by
total_weight.</p>
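The routing step of smooth weighted polling can be sketched as follows. This is an illustrative sketch that assumes fixed integer weights and a non-empty worker list; the class name and array-based layout are hypothetical and do not mirror DolphinScheduler's implementation.

```java
/** Hypothetical smooth weighted polling selector; names are illustrative. */
final class SmoothWeightedPollingSketch {
    private final int[] weight;        // fixed per-worker weight (after warm-up)
    private final int[] currentWeight; // changes on every route

    SmoothWeightedPollingSketch(int[] weight) {
        if (weight.length == 0) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        this.weight = weight.clone();
        this.currentWeight = new int[weight.length];
    }

    /** Returns the index of the worker chosen for the next task. */
    int select() {
        int totalWeight = 0;
        int best = -1;
        for (int i = 0; i < weight.length; i++) {
            currentWeight[i] += weight[i]; // current_weight + weight
            totalWeight += weight[i];      // sum of all weights
            if (best < 0 || currentWeight[i] > currentWeight[best]) {
                best = i;                  // largest current_weight so far
            }
        }
        currentWeight[best] -= totalWeight; // current_weight - total_weight
        return best;
    }
}
```

With weights 5, 1 and 1, seven consecutive calls to `select()` return 0, 0, 1, 0, 2, 0, 0, so the heavier worker is interleaved with the lighter ones instead of receiving five tasks in a row.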
<h3>Linear Weighting (Default Algorithm)</h3>
-<p>The algorithm reports its own load information to the registry at regular
intervals. We base our judgement on two main pieces of information</p>
+<p>With this algorithm, each worker reports its own load information to the
registry at regular intervals. The decision is based on two main pieces of
information:</p>
<ul>
<li>load average (default is the number of CPU cores * 2)</li>
<li>available physical memory (default is 0.3, in G)</li>
</ul>
-<p>If either of the two is lower than the configured item, then this worker
will not participate in the load. (no traffic will be allocated)</p>
+<p>If the load average is higher than the configured maximum, or the
available physical memory is lower than the configured value, then this worker
will not participate in the load. (no traffic will be allocated)</p>
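A minimal sketch of the eligibility check for the linear load algorithm. It assumes a worker is skipped unless its load average is below the configured maximum and its available memory is above the configured minimum; the strict comparisons, the parameter names and the class itself are illustrative assumptions, with -1 standing for the "number of CPU cores * 2" default.

```java
/** Hypothetical eligibility check; parameter names are illustrative only. */
final class WorkerLoadCheckSketch {
    /**
     * @param loadAverage       current system load average of the worker
     * @param availableMemoryGb currently available physical memory, in G
     * @param maxCpuLoadAvg     configured maximum load average, or -1 for the default
     * @param minMemoryGb       configured minimum available memory, in G (0.3 by default)
     * @param cpuCores          number of CPU cores, used for the default maximum
     * @return true if the worker may be allocated tasks
     */
    static boolean canAcceptTasks(double loadAverage, double availableMemoryGb,
                                  double maxCpuLoadAvg, double minMemoryGb, int cpuCores) {
        // A negative value means "use the default": number of CPU cores * 2.
        double effectiveMax = maxCpuLoadAvg < 0 ? cpuCores * 2.0 : maxCpuLoadAvg;
        return loadAverage < effectiveMax && availableMemoryGb > minMemoryGb;
    }
}
```

For example, on a 4-core worker with default settings, a load average of 3.0 and 1 G of free memory keeps the worker eligible, while a load average of 9.0 or only 0.2 G of free memory removes it from routing.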
<p>You can customise the configuration by changing the following properties in
worker.properties</p>
<ul>
<li>worker.max.cpuload.avg=-1 (worker max cpuload avg, only higher than the
system cpu load average, worker server can be dispatched tasks. default value
-1: the number of cpu cores * 2)</li>
diff --git a/en-us/docs/dev/user_doc/architecture/load-balance.json
b/en-us/docs/dev/user_doc/architecture/load-balance.json
index 3a6e983..ee6bc07 100644
--- a/en-us/docs/dev/user_doc/architecture/load-balance.json
+++ b/en-us/docs/dev/user_doc/architecture/load-balance.json
@@ -1,6 +1,6 @@
{
"filename": "load-balance.md",
- "__html": "<h1>Load Balance</h1>\n<p>Load balancing refers to the reasonable
allocation of server pressure through routing algorithms (usually in cluster
environments) to achieve the maximum optimization of server
performance.</p>\n<h2>DolphinScheduler-Worker Load Balancing
Algorithms</h2>\n<p>DolphinScheduler-Master allocates tasks to workers, and by
default provides three algorithms:</p>\n<p>Weighted random
(random)</p>\n<p>Smoothing polling (roundrobin)</p>\n<p>Linear load (lowerwei
[...]
+ "__html": "<h1>Load Balance</h1>\n<p>Load balancing refers to the reasonable
allocation of server pressure through routing algorithms (usually in cluster
environments) to achieve the maximum optimization of server
performance.</p>\n<h2>DolphinScheduler-Worker Load Balancing
Algorithms</h2>\n<p>DolphinScheduler-Master allocates tasks to workers, and by
default provides three algorithms:</p>\n<ul>\n<li>\n<p>Weighted random
(random)</p>\n</li>\n<li>\n<p>Smoothing polling (round-robin)</p> [...]
"link": "/dist/en-us/docs/dev/user_doc/architecture/load-balance.html",
"meta": {}
}
\ No newline at end of file
diff --git a/en-us/docs/dev/user_doc/architecture/metadata.html
b/en-us/docs/dev/user_doc/architecture/metadata.html
index 5480a61..0d7954f 100644
--- a/en-us/docs/dev/user_doc/architecture/metadata.html
+++ b/en-us/docs/dev/user_doc/architecture/metadata.html
@@ -11,7 +11,6 @@
</head>
<body>
<div id="root"><div class="md2html docs-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><span
class="mobile-menu-btn mobile-menu-btn-dark"></span><a
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">中</span><div
class="header-menu"><div><ul class="ant-menu whiteClass ant-menu-light ant-
[...]
-<p><a name="V5KOl"></a></p>
<h2>DolphinScheduler DB Table Overview</h2>
<table>
<thead>
@@ -23,7 +22,7 @@
<tbody>
<tr>
<td style="text-align:center">t_ds_access_token</td>
-<td style="text-align:center">token for access ds backend</td>
+<td style="text-align:center">token for accessing the DolphinScheduler backend</td>
</tr>
<tr>
<td style="text-align:center">t_ds_alert</td>
@@ -47,7 +46,7 @@
</tr>
<tr>
<td style="text-align:center">t_ds_process_definition</td>
-<td style="text-align:center">process difinition</td>
+<td style="text-align:center">process definition</td>
</tr>
<tr>
<td style="text-align:center">t_ds_process_instance</td>
@@ -79,7 +78,7 @@
</tr>
<tr>
<td style="text-align:center">t_ds_relation_udfs_user</td>
-<td style="text-align:center">UDF related to user</td>
+<td style="text-align:center">UDF functions related to user</td>
</tr>
<tr>
<td style="text-align:center">t_ds_relation_user_alertgroup</td>
@@ -91,7 +90,7 @@
</tr>
<tr>
<td style="text-align:center">t_ds_schedules</td>
-<td style="text-align:center">process difinition schedule</td>
+<td style="text-align:center">process definition schedule</td>
</tr>
<tr>
<td style="text-align:center">t_ds_session</td>
@@ -115,43 +114,37 @@
</tr>
<tr>
<td style="text-align:center">t_ds_version</td>
-<td style="text-align:center">ds version</td>
+<td style="text-align:center">DolphinScheduler version</td>
</tr>
</tbody>
</table>
<hr>
-<p><a name="XCLy1"></a></p>
<h2>E-R Diagram</h2>
-<p><a name="5hWWZ"></a></p>
<h3>User Queue DataSource</h3>
<p><img src="/img/metadata-erd/user-queue-datasource.png" alt="image.png"></p>
<ul>
-<li>Multiple users can belong to one tenant</li>
-<li>The queue field in the t_ds_user table stores the queue_name information
in the t_ds_queue table, but t_ds_tenant stores queue information using
queue_id. During the execution of the process definition, the user queue has
the highest priority. If the user queue is empty, the tenant queue is used.</li>
-<li>The user_id field in the t_ds_datasource table indicates the user who
created the data source. The user_id in t_ds_relation_datasource_user indicates
the user who has permission to the data source.
-<a name="7euSN"></a></li>
+<li>One tenant can have multiple users.</li>
+<li>The queue field in the t_ds_user table stores the queue_name information
in the t_ds_queue table, while t_ds_tenant stores queue information using the
queue_id column. During the execution of the process definition, the user queue
has the highest priority. If the user queue is null, the tenant queue is
used.</li>
+<li>The user_id field in the t_ds_datasource table indicates the user who
created the data source. The user_id in t_ds_relation_datasource_user indicates
the user who has permission to the data source.</li>
</ul>
<h3>Project Resource Alert</h3>
<p><img src="/img/metadata-erd/project-resource-alert.png" alt="image.png"></p>
<ul>
-<li>User can have multiple projects, User project authorization completes the
relationship binding using project_id and user_id in t_ds_relation_project_user
table</li>
-<li>The user_id in the t_ds_projcet table represents the user who created the
project, and the user_id in the t_ds_relation_project_user table represents
users who have permission to the project</li>
-<li>The user_id in the t_ds_resources table represents the user who created
the resource, and the user_id in t_ds_relation_resources_user represents the
user who has permissions to the resource</li>
-<li>The user_id in the t_ds_udfs table represents the user who created the
UDF, and the user_id in the t_ds_relation_udfs_user table represents a user who
has permission to the UDF
-<a name="JEw4v"></a></li>
+<li>A user can have multiple projects. User project authorization binds the
relationship using project_id and user_id in the t_ds_relation_project_user
table.</li>
+<li>The user_id in the t_ds_project table represents the user who created the
project, and the user_id in the t_ds_relation_project_user table represents
users who have permission to the project.</li>
+<li>The user_id in the t_ds_resources table represents the user who created
the resource, and the user_id in t_ds_relation_resources_user represents the
user who has permission to the resource.</li>
+<li>The user_id in the t_ds_udfs table represents the user who created the
UDF, and the user_id in the t_ds_relation_udfs_user table represents a user who
has permission to the UDF.</li>
</ul>
<h3>Command Process Task</h3>
<p><img src="/img/metadata-erd/command.png" alt="image.png"><br /><img
src="/img/metadata-erd/process-task.png" alt="image.png"></p>
<ul>
-<li>A project has multiple process definitions, a process definition can
generate multiple process instances, and a process instance can generate
multiple task instances</li>
-<li>The t_ds_schedulers table stores the timing schedule information for
process difinition</li>
-<li>The data stored in the t_ds_relation_process_instance table is used to
deal with that the process definition contains sub-processes,
parent_process_instance_id field represents the id of the main process instance
containing the child process, process_instance_id field represents the id of
the sub-process instance, parent_task_instance_id field represents the task
instance id of the sub-process node</li>
+<li>A project has multiple process definitions, a process definition can
generate multiple process instances, and a process instance can generate
multiple task instances.</li>
+<li>The t_ds_schedules table stores the timing schedule information for
process definitions.</li>
+<li>The data stored in the t_ds_relation_process_instance table handles the
case where a process definition contains sub-processes: the
parent_process_instance_id field represents the id of the main process instance
that contains the sub-process, the process_instance_id field represents the id
of the sub-process instance, and the parent_task_instance_id field represents
the task instance id of the sub-process node.</li>
<li>The process instance table and the task instance table correspond to the
t_ds_process_instance table and the t_ds_task_instance table, respectively.</li>
</ul>
<hr>
-<p><a name="yd79T"></a></p>
<h2>Core Table Schema</h2>
-<p><a name="6bVhH"></a></p>
<h3>t_ds_process_definition</h3>
<table>
<thead>
@@ -195,12 +188,12 @@
<tr>
<td>process_definition_json</td>
<td>longtext</td>
-<td>process definition json content</td>
+<td>process definition JSON content</td>
</tr>
<tr>
<td>description</td>
<td>text</td>
-<td>process difinition desc</td>
+<td>process definition description</td>
</tr>
<tr>
<td>global_params</td>
@@ -210,7 +203,7 @@
<tr>
<td>flag</td>
<td>tinyint</td>
-<td>process is available: 0 not available, 1 available</td>
+<td>whether the process is available: 0 not available, 1 available</td>
</tr>
<tr>
<td>locations</td>
@@ -252,9 +245,18 @@
<td>datetime</td>
<td>update time</td>
</tr>
+<tr>
+<td>modify_by</td>
+<td>varchar</td>
+<td>user who modified the process</td>
+</tr>
+<tr>
+<td>resource_ids</td>
+<td>varchar</td>
+<td>resource id set</td>
+</tr>
</tbody>
</table>
-<p><a name="t5uxM"></a></p>
<h3>t_ds_process_instance</h3>
<table>
<thead>
@@ -283,12 +285,12 @@
<tr>
<td>state</td>
<td>tinyint</td>
-<td>process instance Status: 0 commit succeeded, 1 running, 2 prepare to
pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault
tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete</td>
+<td>process instance status: 0 commit succeeded, 1 running, 2 prepare to
pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault
tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete</td>
</tr>
<tr>
<td>recovery</td>
<td>tinyint</td>
-<td>process instance failover flag:0:normal,1:failover instance</td>
+<td>process instance failover flag: 0 normal, 1 failover instance needs
restart</td>
</tr>
<tr>
<td>start_time</td>
@@ -313,17 +315,17 @@
<tr>
<td>command_type</td>
<td>tinyint</td>
-<td>command type:0 start ,1 Start from the current node,2 Resume a
fault-tolerant process,3 Resume Pause Process, 4 Execute from the failed node,5
Complement, 6 dispatch, 7 re-run, 8 pause, 9 stop ,10 Resume waiting thread</td>
+<td>command type: 0 start, 1 start from the current node, 2 resume a
fault-tolerant process, 3 resume a paused process, 4 execute from the failed
node, 5 complement, 6 dispatch, 7 re-run, 8 pause, 9 stop, 10 resume waiting
thread</td>
</tr>
<tr>
<td>command_param</td>
<td>text</td>
-<td>json command parameters</td>
+<td>JSON command parameters</td>
</tr>
<tr>
<td>task_depend_type</td>
<td>tinyint</td>
-<td>task depend type. 0: only current node,1:before the node,2:later nodes</td>
+<td>node dependency type: 0 current node, 1 forward, 2 backward</td>
</tr>
<tr>
<td>max_try_times</td>
@@ -333,12 +335,12 @@
<tr>
<td>failure_strategy</td>
<td>tinyint</td>
-<td>failure strategy. 0:end the process when node failed,1:continue running
the other nodes when node failed</td>
+<td>failure strategy: 0 end the process when a node fails, 1 continue running
the other nodes when a node fails</td>
</tr>
<tr>
<td>warning_type</td>
<td>tinyint</td>
-<td>warning type. 0:no warning,1:warning if process success,2:warning if
process failed,3:warning if success</td>
+<td>warning type: 0 no warning, 1 warning if process succeeds, 2 warning if
process fails, 3 warning regardless of result</td>
</tr>
<tr>
<td>warning_group_id</td>
@@ -363,12 +365,12 @@
<tr>
<td>process_instance_json</td>
<td>longtext</td>
-<td>process instance json</td>
+<td>process instance JSON</td>
</tr>
<tr>
<td>flag</td>
<td>tinyint</td>
-<td>process instance is available: 0 not available, 1 available</td>
+<td>whether process instance is available: 0 not available, 1 available</td>
</tr>
<tr>
<td>update_time</td>
@@ -388,37 +390,37 @@
<tr>
<td>locations</td>
<td>text</td>
-<td>Node location information</td>
+<td>node location information</td>
</tr>
<tr>
<td>connects</td>
<td>text</td>
-<td>Node connection information</td>
+<td>node connection information</td>
</tr>
<tr>
<td>history_cmd</td>
<td>text</td>
-<td>history commands of process instance operation</td>
+<td>history commands, recording all the commands applied to an instance</td>
</tr>
<tr>
<td>dependence_schedule_times</td>
<td>text</td>
-<td>depend schedule fire time</td>
+<td>depend schedule estimate time</td>
</tr>
<tr>
<td>process_instance_priority</td>
<td>int</td>
-<td>process instance priority. 0 Highest,1 High,2 Medium,3 Low,4 Lowest</td>
+<td>process instance priority: 0 highest, 1 high, 2 medium, 3 low, 4 lowest</td>
</tr>
<tr>
-<td>worker_group_id</td>
-<td>int</td>
-<td>worker group id</td>
+<td>worker_group</td>
+<td>varchar</td>
+<td>worker group the tasks are assigned to</td>
</tr>
<tr>
<td>timeout</td>
<td>int</td>
-<td>time out</td>
+<td>timeout</td>
</tr>
<tr>
<td>tenant_id</td>
@@ -427,7 +429,6 @@
</tr>
</tbody>
</table>
-<p><a name="tHZsY"></a></p>
<h3>t_ds_task_instance</h3>
<table>
<thead>
@@ -466,7 +467,7 @@
<tr>
<td>task_json</td>
<td>longtext</td>
-<td>task content json</td>
+<td>task content JSON</td>
</tr>
<tr>
<td>state</td>
@@ -521,12 +522,12 @@
<tr>
<td>app_link</td>
<td>varchar</td>
-<td>yarn app id</td>
+<td>Yarn app id</td>
</tr>
<tr>
<td>flag</td>
<td>tinyint</td>
-<td>taskinstance is available: 0 not available, 1 available</td>
+<td>whether the task instance is available: 0 not available, 1 available</td>
</tr>
<tr>
<td>retry_interval</td>
@@ -541,16 +542,97 @@
<tr>
<td>task_instance_priority</td>
<td>int</td>
-<td>task instance priority:0 Highest,1 High,2 Medium,3 Low,4 Lowest</td>
+<td>task instance priority: 0 highest, 1 high, 2 medium, 3 low, 4 lowest</td>
</tr>
<tr>
-<td>worker_group_id</td>
+<td>worker_group</td>
+<td>varchar</td>
+<td>worker group the task is assigned to</td>
+</tr>
+</tbody>
+</table>
+<h3>t_ds_schedules</h3>
+<table>
+<thead>
+<tr>
+<th>Field</th>
+<th>Type</th>
+<th>Comment</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>id</td>
<td>int</td>
-<td>worker group id</td>
+<td>primary key</td>
+</tr>
+<tr>
+<td>process_definition_id</td>
+<td>int</td>
+<td>process definition id</td>
+</tr>
+<tr>
+<td>start_time</td>
+<td>datetime</td>
+<td>schedule start time</td>
+</tr>
+<tr>
+<td>end_time</td>
+<td>datetime</td>
+<td>schedule end time</td>
+</tr>
+<tr>
+<td>crontab</td>
+<td>varchar</td>
+<td>crontab expression</td>
+</tr>
+<tr>
+<td>failure_strategy</td>
+<td>tinyint</td>
+<td>failure strategy: 0 end, 1 continue</td>
+</tr>
+<tr>
+<td>user_id</td>
+<td>int</td>
+<td>user id</td>
+</tr>
+<tr>
+<td>release_state</td>
+<td>tinyint</td>
+<td>release status: 0 not yet released, 1 released</td>
+</tr>
+<tr>
+<td>warning_type</td>
+<td>tinyint</td>
+<td>warning type: 0 no warning, 1 warning if process succeeds, 2 warning if
process fails, 3 warning regardless of result</td>
+</tr>
+<tr>
+<td>warning_group_id</td>
+<td>int</td>
+<td>warning group id</td>
+</tr>
+<tr>
+<td>process_instance_priority</td>
+<td>int</td>
+<td>process instance priority: 0 highest, 1 high, 2 medium, 3 low, 4 lowest</td>
+</tr>
+<tr>
+<td>worker_group</td>
+<td>varchar</td>
+<td>worker group the tasks are assigned to</td>
+</tr>
+<tr>
+<td>create_time</td>
+<td>datetime</td>
+<td>create time</td>
+</tr>
+<tr>
+<td>update_time</td>
+<td>datetime</td>
+<td>update time</td>
</tr>
</tbody>
</table>
-<p><a name="gLGtm"></a></p>
<h3>t_ds_command</h3>
<table>
<thead>
@@ -569,7 +651,7 @@
<tr>
<td>command_type</td>
<td>tinyint</td>
-<td>Command type: 0 start workflow, 1 start execution from current node, 2
resume fault-tolerant workflow, 3 resume pause process, 4 start execution from
failed node, 5 complement, 6 schedule, 7 rerun, 8 pause, 9 stop, 10 resume
waiting thread</td>
+<td>command type: 0 start workflow, 1 start execution from current node, 2
resume fault-tolerant workflow, 3 resume pause process, 4 start execution from
failed node, 5 complement, 6 schedule, 7 re-run, 8 pause, 9 stop, 10 resume
waiting thread</td>
</tr>
<tr>
<td>process_definition_id</td>
@@ -579,27 +661,27 @@
<tr>
<td>command_param</td>
<td>text</td>
-<td>json command parameters</td>
+<td>JSON command parameters</td>
</tr>
<tr>
<td>task_depend_type</td>
<td>tinyint</td>
-<td>Node dependency type: 0 current node, 1 forward, 2 backward</td>
+<td>node dependency type: 0 current node, 1 forward, 2 backward</td>
</tr>
<tr>
<td>failure_strategy</td>
<td>tinyint</td>
-<td>Failed policy: 0 end, 1 continue</td>
+<td>failed policy: 0 end, 1 continue</td>
</tr>
<tr>
<td>warning_type</td>
<td>tinyint</td>
-<td>Alarm type: 0 is not sent, 1 process is sent successfully, 2 process is
sent failed, 3 process is sent successfully and all failures are sent</td>
+<td>alarm type: 0 no alarm, 1 alarm if process succeeds, 2 alarm if process
fails, 3 alarm regardless of result</td>
</tr>
<tr>
<td>warning_group_id</td>
<td>int</td>
-<td>warning group</td>
+<td>warning group id</td>
</tr>
<tr>
<td>schedule_time</td>
@@ -619,7 +701,7 @@
<tr>
<td>dependence</td>
<td>varchar</td>
-<td>dependence</td>
+<td>dependence column</td>
</tr>
<tr>
<td>update_time</td>
@@ -629,12 +711,12 @@
<tr>
<td>process_instance_priority</td>
<td>int</td>
-<td>process instance priority: 0 Highest,1 High,2 Medium,3 Low,4 Lowest</td>
+<td>process instance priority: 0 highest,1 high,2 medium,3 low,4 lowest</td>
</tr>
<tr>
<td>worker_group_id</td>
<td>int</td>
-<td>worker group id</td>
+<td>worker group the tasks are assigned to</td>
</tr>
</tbody>
</table>
diff --git a/en-us/docs/dev/user_doc/architecture/metadata.json
b/en-us/docs/dev/user_doc/architecture/metadata.json
index 403b4d4..c61ffa7 100644
--- a/en-us/docs/dev/user_doc/architecture/metadata.json
+++ b/en-us/docs/dev/user_doc/architecture/metadata.json
@@ -1,6 +1,6 @@
{
"filename": "metadata.md",
- "__html": "<h1>MetaData</h1>\n<p><a
name=\"V5KOl\"></a></p>\n<h2>DolphinScheduler DB Table
Overview</h2>\n<table>\n<thead>\n<tr>\n<th style=\"text-align:center\">Table
Name</th>\n<th
style=\"text-align:center\">Comment</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td
style=\"text-align:center\">t_ds_access_token</td>\n<td
style=\"text-align:center\">token for access ds backend</td>\n</tr>\n<tr>\n<td
style=\"text-align:center\">t_ds_alert</td>\n<td
style=\"text-align:center\">alert detail</td> [...]
+ "__html": "<h1>MetaData</h1>\n<h2>DolphinScheduler DB Table
Overview</h2>\n<table>\n<thead>\n<tr>\n<th style=\"text-align:center\">Table
Name</th>\n<th
style=\"text-align:center\">Comment</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td
style=\"text-align:center\">t_ds_access_token</td>\n<td
style=\"text-align:center\">token for access DolphinScheduler
backend</td>\n</tr>\n<tr>\n<td style=\"text-align:center\">t_ds_alert</td>\n<td
style=\"text-align:center\">alert detail</td>\n</tr>\n<tr>\n<t [...]
"link": "/dist/en-us/docs/dev/user_doc/architecture/metadata.html",
"meta": {}
}
\ No newline at end of file
diff --git a/en-us/docs/dev/user_doc/architecture/task-structure.html
b/en-us/docs/dev/user_doc/architecture/task-structure.html
index 09d1f06..b923593 100644
--- a/en-us/docs/dev/user_doc/architecture/task-structure.html
+++ b/en-us/docs/dev/user_doc/architecture/task-structure.html
@@ -12,8 +12,8 @@
<body>
<div id="root"><div class="md2html docs-page" data-reactroot=""><header
class="header-container header-container-dark"><div class="header-body"><span
class="mobile-menu-btn mobile-menu-btn-dark"></span><a
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div
class="search search-dark"><span class="icon-search"></span></div><span
class="language-switch language-switch-dark">中</span><div
class="header-menu"><div><ul class="ant-menu whiteClass ant-menu-light ant-
[...]
<h2>Overall Tasks Storage Structure</h2>
-<p>All tasks created in DolphinScheduler are saved in the
t_ds_process_definition table.</p>
-<p>The following shows the 't_ds_process_definition' table structure:</p>
+<p>All tasks in DolphinScheduler are saved in the
<code>t_ds_process_definition</code> table.</p>
+<p>The following shows the <code>t_ds_process_definition</code> table
structure:</p>
<table>
<thead>
<tr>
@@ -46,7 +46,7 @@
<td>4</td>
<td>release_state</td>
<td>tinyint(4)</td>
-<td>release status of process definition: 0 not online, 1 online</td>
+<td>release status of process definition: 0 not released, 1 released</td>
</tr>
<tr>
<td>5</td>
@@ -136,7 +136,7 @@
<td>19</td>
<td>modify_by</td>
<td>varchar(36)</td>
-<td>specifics of the user that made the modification</td>
+<td>the user that made the modification</td>
</tr>
<tr>
<td>20</td>
@@ -146,7 +146,7 @@
</tr>
</tbody>
</table>
-<p>The 'process_definition_json' field is the core field, which defines the
task information in the DAG diagram, and it is stored in JSON format.</p>
+<p>The <code>process_definition_json</code> field is the core field, which
defines the task information in the DAG diagram, and it is stored in JSON
format.</p>
<p>The following table describes the common data structure.</p>
<table>
<thead>
@@ -244,7 +244,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -458,7 +458,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -530,7 +530,7 @@
<td>showType</td>
<td>String</td>
<td>display type of mail</td>
-<td>optionals: TABLE or ATTACHMENT</td>
+<td>options: TABLE or ATTACHMENT</td>
</tr>
<tr>
<td>14</td>
@@ -553,7 +553,7 @@
<td></td>
<td>postStatements</td>
<td>Array</td>
-<td>postposition SQL statements</td>
+<td>post-position SQL statements</td>
<td></td>
</tr>
<tr>
@@ -767,7 +767,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -1081,7 +1081,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -1332,7 +1332,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -1545,7 +1545,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -1842,7 +1842,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -2087,7 +2087,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -2174,7 +2174,7 @@
<td></td>
<td>postStatements</td>
<td>Array</td>
-<td>postposition SQL</td>
+<td>post-position SQL</td>
<td></td>
</tr>
<tr>
@@ -2384,7 +2384,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -2810,7 +2810,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
@@ -2995,7 +2995,7 @@
<td></td>
<td>Object</td>
<td>customized parameters</td>
-<td>Json format</td>
+<td>JSON format</td>
</tr>
<tr>
<td>5</td>
diff --git a/en-us/docs/dev/user_doc/architecture/task-structure.json
b/en-us/docs/dev/user_doc/architecture/task-structure.json
index dcd9b87..a037a00 100644
--- a/en-us/docs/dev/user_doc/architecture/task-structure.json
+++ b/en-us/docs/dev/user_doc/architecture/task-structure.json
@@ -1,6 +1,6 @@
{
"filename": "task-structure.md",
- "__html": "<h1>Task Structure</h1>\n<h2>Overall Tasks Storage
Structure</h2>\n<p>All tasks created in DolphinScheduler are saved in the
t_ds_process_definition table.</p>\n<p>The following shows the
't_ds_process_definition' table
structure:</p>\n<table>\n<thead>\n<tr>\n<th>No.</th>\n<th>field</th>\n<th>type</th>\n<th>description</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>1</td>\n<td>id</td>\n<td>int(11)</td>\n<td>primary
key</td>\n</tr>\n<tr>\n<td>2</td>\n<td>name</td>\n<td>varchar(255 [...]
+ "__html": "<h1>Task Structure</h1>\n<h2>Overall Tasks Storage
Structure</h2>\n<p>All tasks in DolphinScheduler are saved in the
<code>t_ds_process_definition</code> table.</p>\n<p>The following shows the
<code>t_ds_process_definition</code> table
structure:</p>\n<table>\n<thead>\n<tr>\n<th>No.</th>\n<th>field</th>\n<th>type</th>\n<th>description</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>1</td>\n<td>id</td>\n<td>int(11)</td>\n<td>primary
key</td>\n</tr>\n<tr>\n<td>2</td>\n<td>name</td>\ [...]
"link": "/dist/en-us/docs/dev/user_doc/architecture/task-structure.html",
"meta": {}
}
\ No newline at end of file