This is an automated email from the ASF dual-hosted git repository.
lgcareer pushed a commit to branch master
in repository
https://gitbox.apache.org/repos/asf/incubator-dolphinscheduler-website.git
The following commit(s) were added to refs/heads/master by this push:
new 929f198 add en doc of 1.2.1
new e54c58e Merge pull request #92 from lgcareer/master
929f198 is described below
commit 929f1988813318f3798aa173d7a195cb7f10537d
Author: lgcareer <[email protected]>
AuthorDate: Wed Feb 26 16:07:43 2020 +0800
add en doc of 1.2.1
---
docs/en-us/1.2.1/user_doc/architecture-design.md | 316 ++++++++++
docs/en-us/1.2.1/user_doc/metadata-1.2.md | 174 ++++++
docs/en-us/1.2.1/user_doc/plugin-development.md | 54 ++
docs/en-us/1.2.1/user_doc/quick-start.md | 65 ++
docs/en-us/1.2.1/user_doc/system-manual.md | 738 +++++++++++++++++++++++
docs/en-us/1.2.1/user_doc/upgrade.md | 39 ++
site_config/site.js | 6 +-
7 files changed, 1389 insertions(+), 3 deletions(-)
diff --git a/docs/en-us/1.2.1/user_doc/architecture-design.md
b/docs/en-us/1.2.1/user_doc/architecture-design.md
new file mode 100644
index 0000000..cdc1c89
--- /dev/null
+++ b/docs/en-us/1.2.1/user_doc/architecture-design.md
@@ -0,0 +1,316 @@
+## Architecture Design
+Before explaining the architecture of the scheduling system, let us first understand the commonly used terms of the scheduling system.
+
+### 1. Glossary
+
+**DAG:** Full name Directed Acyclic Graph, abbreviated as DAG. Tasks in a workflow are assembled as a directed acyclic graph, which is traversed topologically starting from the nodes with zero in-degree until there are no successor nodes. For example, the following picture:
+
+<p align="center">
+ <img src="/img/dag_examples_cn.jpg" alt="dag example" width="60%" />
+ <p align="center">
+ <em>dag example</em>
+ </p>
+</p>
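The zero in-degree traversal described above is essentially Kahn's topological sort. A minimal sketch of that rule on a toy DAG follows (this is an illustration only, not DolphinScheduler's actual implementation, which operates on its own DAG classes):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DagTraversalDemo {
    // Kahn's algorithm: repeatedly take nodes whose in-degree has dropped to
    // zero, which is how ready-to-run tasks are discovered in a workflow DAG.
    static List<String> topoOrder(Map<String, List<String>> edges) {
        Map<String, Integer> inDegree = new HashMap<>();
        edges.forEach((from, tos) -> {
            inDegree.putIfAbsent(from, 0);
            for (String to : tos) inDegree.merge(to, 1, Integer::sum);
        });
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((node, d) -> { if (d == 0) ready.add(node); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String to : edges.getOrDefault(node, List.of())) {
                if (inDegree.merge(to, -1, Integer::sum) == 0) ready.add(to);
            }
        }
        return order; // shorter than the node count => the graph had a cycle
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new HashMap<>();
        dag.put("A", List.of("B", "C"));
        dag.put("B", List.of("D"));
        dag.put("C", List.of("D"));
        System.out.println(topoOrder(dag)); // [A, B, C, D]
    }
}
```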
+
+**Process definition**: A visual **DAG** formed by dragging task nodes and establishing the associations between them
+
+**Process instance**: A process instance is an instantiation of a process definition, generated by manual start or by scheduling. Each run of a process definition generates a new process instance
+
+**Task instance**: A task instance is the instantiation of a specific task
node when a process instance runs, which indicates the specific task execution
status
+
+**Task type**: Currently supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, and DEPENDENT (dependency) tasks, with plans to support dynamic plug-in extension. Note: a **SUB_PROCESS** is itself a separate process definition that can be launched on its own
+
+**Schedule mode**: The system supports timed scheduling based on cron expressions as well as manual scheduling. Supported command types: start workflow, start execution from the current node, resume fault-tolerant workflow, resume paused process, start execution from the failed node, complement (backfill), timer, rerun, pause, stop, and resume waiting thread. Of these, **resume fault-tolerant workflow** and **resume waiting thread** are used internally by the scheduler and cannot be called externally
+
+**Timed schedule**: The system uses the **quartz** distributed scheduler and supports visual generation of cron expressions
+
+**Dependency**: The system not only supports simple **DAG** dependencies between predecessor and successor nodes, but also provides **task dependency** nodes, supporting **custom task dependencies between processes**
+
+**Priority**: Supports priorities for process instances and task instances. If no priority is set for a process instance or task instance, the default is first in, first out.
+
+**Mail alert**: Supports emailing **SQL task** query results, as well as email alerts for process instance run results and fault-tolerance notifications
+
+**Failure policy**: For tasks running in parallel, if some tasks fail, two failure-policy options are provided. **Continue** means the remaining parallel tasks keep running until the end of the process regardless of the failed task's status. **End** means that once a failed task is found, the running parallel tasks are killed and the process ends.
+
+**Complement**: Backfills historical data, supporting two complement modes over an interval: **parallel and serial**
+
+
+
+### 2. System Architecture
+
+#### 2.1 System Architecture Diagram
+<p align="center">
+ <img src="/img/architecture.jpg" alt="System Architecture Diagram" />
+ <p align="center">
+ <em>System Architecture Diagram</em>
+ </p>
+</p>
+
+
+
+#### 2.2 Architectural description
+
+* **MasterServer**
+
+  MasterServer adopts a distributed, non-central design. It is mainly responsible for splitting DAG tasks, monitoring task submission, and monitoring the health of other MasterServer and WorkerServer nodes.
+  When the MasterServer service starts, it registers a temporary node with ZooKeeper and performs fault tolerance by listening for state changes of ZooKeeper temporary nodes.
+
+
+
+ ##### The service mainly contains:
+
+  - **Distributed Quartz**: a distributed scheduling component, mainly responsible for starting and stopping scheduled tasks. When quartz picks up a task, the Master uses an internal thread pool to handle the task's subsequent operations.
+
+ - **MasterSchedulerThread** is a scan thread that periodically scans the
**command** table in the database for different business operations based on
different **command types**
+
+ - **MasterExecThread** is mainly responsible for DAG task segmentation,
task submission monitoring, logic processing of various command types
+
+ - **MasterTaskExecThread** is mainly responsible for task persistence
+
+
+
+* **WorkerServer**
+
+  - WorkerServer also adopts a distributed, non-central design. It is mainly responsible for executing tasks and providing log services. When the WorkerServer service starts, it registers a temporary node with ZooKeeper and maintains a heartbeat.
+
+ ##### This service contains:
+
+  - **FetchTaskThread** is mainly responsible for continuously fetching tasks from the **Task Queue** and, according to the task type, calling **TaskScheduleThread** to invoke the corresponding executor.
+ - **LoggerServer** is an RPC service that provides functions such as
log fragment viewing, refresh and download.
+
+ - **ZooKeeper**
+
+    The MasterServer and WorkerServer nodes both use the ZooKeeper service for cluster management and fault tolerance. The system also performs event monitoring and distributed locking based on ZooKeeper.
+    We once implemented queues based on Redis as well, but since we want DolphinScheduler to depend on as few components as possible, the Redis implementation was eventually removed.
+
+ - **Task Queue**
+
+    Provides task queue operations. The queue is currently also implemented on ZooKeeper. Since each queue entry stores little information, there is no need to worry about the queue holding too much data; in fact, we have stress-tested the queue with millions of entries, with no effect on system stability or performance.
+
+ - **Alert**
+
+    Provides alarm-related interfaces, mainly the storage, query, and notification functions for the two types of alarm data. The notification function has two types: **mail notification** and **SNMP (not yet implemented)**.
+
+ - **API**
+
+    The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service exposes a RESTful API externally.
+    Interfaces include workflow creation, definition, query, modification, release, taking offline, manual start, stop, pause, resume, starting execution from a given node, and more.
+
+ - **UI**
+
+ The front-end page of the system provides various visual operation
interfaces of the system. For details, see the <a
href="/en-us/docs/user_doc/system-manual.html" target="_self">System User
Manual</a> section.
+
+
+
+#### 2.3 Architectural Design Ideas
+
+##### I. Decentralization vs. Centralization
+
+###### Centralized design
+
+The centralized design concept is relatively simple: the nodes in a distributed cluster are divided into two roles:
+
+<p align="center">
+ <img
src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png"
alt="master-slave role" width="50%" />
+ </p>
+
+- The Master is mainly responsible for distributing tasks and supervising the health of the Slaves, and can dynamically balance tasks across Slaves so that no Slave node is overly "busy" or "idle".
+- The Worker is mainly responsible for executing tasks and maintaining a heartbeat with the Master so that the Master can assign tasks to it.
+
+Problems with the centralized design:
+
+- Once the Master has a problem, the cluster is leaderless and will collapse. To solve this, most Master/Slave architectures adopt an active/standby Master design, which can be hot or cold standby, with automatic or manual switching; more and more new systems can automatically elect and switch the Master to improve availability.
+- Another problem is that if the Scheduler runs on the Master, although different tasks of one DAG can run on different machines, the Master can become overloaded. If the Scheduler runs on the Slave, all tasks of one DAG can only be submitted on one machine, so with many parallel tasks the pressure on that Slave may be large.
+
+###### Decentralization
+
+ <p align="center">
+ <img
src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png"
alt="decentralized" width="50%" />
+ </p>
+
+- In a decentralized design there is usually no Master/Slave concept: all roles are the same and have equal status. The global Internet is a typical decentralized distributed system; any networked node going down only affects a small range of functionality.
+- The core of decentralized design is that there is no "manager" distinct from the other nodes in the distributed system, so there is no single-point-of-failure problem. However, because there is no "manager" node, each node needs to communicate with other nodes to obtain the necessary machine information, and the unreliability of distributed communication greatly increases the difficulty of implementing the functions above.
+- In fact, truly decentralized distributed systems are rare. Instead, dynamically centralized distributed systems keep emerging. Under this architecture the managers in the cluster are dynamically elected rather than preset, and when the cluster fails, the nodes spontaneously hold "meetings" to elect a new "manager" to preside over the work. The most typical cases are ZooKeeper and Etcd (implemented in Go).
+
+- DolphinScheduler's decentralization consists of registering Masters/Workers with ZooKeeper. The Master cluster and Worker cluster have no center, and a ZooKeeper distributed lock is used to elect one Master or Worker as the "manager" to perform tasks.
+
+##### II. Distributed Lock Practice
+
+DolphinScheduler uses ZooKeeper distributed locks to ensure that only one Master executes the Scheduler at a time, and that only one Master or Worker performs task submission at a time.
+
+1. The core process algorithm for obtaining distributed locks is as follows
+
+ <p align="center">
+ <img
src="https://analysys.github.io/easyscheduler_docs_cn/images/distributed_lock.png"
alt="Get Distributed Lock Process" width="50%" />
+ </p>
+
+2. Scheduler thread distributed lock implementation flow chart in
DolphinScheduler:
+
+ <p align="center">
+ <img src="/img/distributed_lock_procss.png" alt="Get Distributed Lock
Process" width="50%" />
+ </p>
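The usual ZooKeeper locking recipe behind this flow: each contender creates an ephemeral sequential znode under the lock path, and the contender holding the znode with the smallest sequence number owns the lock. A minimal sketch of just that ordering rule (the znode names below are hypothetical; real code would use a ZooKeeper client such as Curator):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LockElectionDemo {
    // Given the children of the lock path, the holder is the znode with the
    // smallest sequence suffix; every other contender watches its predecessor.
    static String lockHolder(List<String> children) {
        return Collections.min(children);
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
                "lock-0000000003", "lock-0000000001", "lock-0000000002");
        System.out.println(lockHolder(children)); // lock-0000000001
    }
}
```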
+
+##### III. Loop Waiting Caused by Insufficient Threads
+
+- If a DAG has no sub-processes, then when the number of Commands exceeds the threshold set for the thread pool, the process waits or fails directly.
+- If many sub-processes are nested in a large DAG, the situation in the following figure produces a "deadlocked" state:
+
+ <p align="center">
+ <img
src="https://analysys.github.io/easyscheduler_docs_cn/images/lack_thread.png"
alt="Thread is not enough to wait for loop" width="50%" />
+ </p>
+
+In the figure above, MainFlowThread waits for SubFlowThread1 to end, SubFlowThread1 waits for SubFlowThread2 to end, SubFlowThread2 waits for SubFlowThread3 to end, and SubFlowThread3 waits for a new thread from the thread pool, so the whole DAG process can never end and no thread is released. The child and parent processes end up waiting on each other in a loop, and the scheduling cluster becomes unusable unless a new Master is started to add threads and break the deadlock.
+
+Starting a new Master just to break the deadlock seems unsatisfactory, so we proposed the following three options to reduce this risk:
+
+1. Calculate the total number of threads across all Masters, then calculate the number of threads each DAG needs, i.e. pre-compute before the DAG process executes. Because the thread pools span multiple Masters, the total thread count is unlikely to be obtainable in real time.
+2. Check the single Master's thread pool; if the pool is full, let the thread fail directly.
+3. Add a Command type for insufficient resources: if the thread pool is insufficient, suspend the main process; once the thread pool has a free thread, the suspended process is woken up again.
+
+Note: The Master Scheduler thread fetches Commands in FIFO order.
+
+So we chose the third way to solve the problem of insufficient threads.
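The chosen option can be sketched as a simple decision point. The `handleCommand` helper below is hypothetical, for illustration only; the real Master logic involves the Command table and its exec thread pool:

```java
public class ResourceCheckDemo {
    // If the Master exec thread pool is exhausted, mark the process as
    // "waiting for thread" instead of submitting it; it is woken up again
    // once the pool has a free thread.
    static String handleCommand(int activeThreads, int poolSize) {
        if (activeThreads >= poolSize) {
            return "WAITING_THREAD"; // suspend; the command is re-fetched later (FIFO)
        }
        return "SUBMITTED"; // a thread is free, run the DAG
    }

    public static void main(String[] args) {
        System.out.println(handleCommand(10, 10)); // WAITING_THREAD
        System.out.println(handleCommand(3, 10));  // SUBMITTED
    }
}
```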
+
+##### IV. Fault Tolerant Design
+
+Fault tolerance is divided into service fault tolerance and task retry.
Service fault tolerance is divided into two types: Master Fault Tolerance and
Worker Fault Tolerance.
+
+###### 1. Downtime fault tolerance
+
+Service fault tolerance design relies on ZooKeeper's Watcher mechanism. The
implementation principle is as follows:
+
+ <p align="center">
+ <img
src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant.png"
alt="DolphinScheduler Fault Tolerant Design" width="40%" />
+ </p>
+
+The Master monitors the directories of other Masters and Workers. If a remove event is detected, fault tolerance is performed for the process instance or the task instance according to the specific business logic.
+
+
+
+- Master fault tolerance flow chart:
+
+ <p align="center">
+ <img
src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_master.png"
alt="Master Fault Tolerance Flowchart" width="40%" />
+ </p>
+
+After Master fault tolerance completes via ZooKeeper, the Scheduler thread in DolphinScheduler reschedules the process: it traverses the DAG to find the "running" and "submitted successfully" tasks. For "running" tasks it monitors their task instance status; for "submitted successfully" tasks it needs to check whether the task already exists in the Task Queue: if it exists, it likewise monitors the task instance status, and if not, it resubmits the task instance.
+
+
+
+- Worker fault tolerance flow chart:
+
+ <p align="center">
+ <img
src="https://analysys.github.io/easyscheduler_docs_cn/images/fault-tolerant_worker.png"
alt="Worker Fault Tolerance Flowchart" width="40%" />
+ </p>
+
+Once the Master Scheduler thread finds a task instance in the "needs fault tolerance" state, it takes over the task and resubmits it.
+
+ Note: Because "network jitter" may cause a node to briefly lose its ZooKeeper heartbeat and trigger a remove event for the node, we take the simplest approach here: once a node's connection with ZooKeeper times out, the Master or Worker service on it is stopped directly.
+
+###### 2. Task failure retry
+
+Here we must first distinguish between three concepts: task failure retry, process failure recovery, and process failure rerun:
+
+- Task failure retry is task-level and performed automatically by the scheduling system. For example, if a shell task is configured with 3 retries, the shell task will be retried up to 3 times after a failed run.
+- Process failure recovery is process-level and performed manually; recovery can only start **from the failed node** or **from the current node**
+- Process failure rerun is also process-level and performed manually; a rerun starts from the start node
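Task-level retry as described in the first bullet can be sketched as a simple loop (a simplified stand-in for illustration; the real scheduler also honors a retry interval between attempts):

```java
import java.util.function.BooleanSupplier;

public class RetryDemo {
    // Run the task once, then retry up to maxRetryTimes more times on failure.
    static boolean runWithRetry(BooleanSupplier task, int maxRetryTimes) {
        for (int attempt = 0; attempt <= maxRetryTimes; attempt++) {
            if (task.getAsBoolean()) {
                return true; // success: stop retrying
            }
        }
        return false; // retries exhausted: the task is marked failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // fails twice, succeeds on the third attempt; 3 retries configured
        boolean ok = runWithRetry(() -> ++calls[0] >= 3, 3);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```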
+
+
+
+Back to the topic: we divide the task nodes in a workflow into two types.
+
+- One is a business node, which corresponds to an actual script or processing
statement, such as a Shell node, an MR node, a Spark node, a dependent node,
and so on.
+- The other is a logical node, which does not run an actual script or statement but handles the logical processing of the overall flow, such as a sub-process node.
+
+Each **business node** can be configured with a number of failure retries. When a task node fails, it is automatically retried until it succeeds or the configured retry count is exceeded. A **logical node** does not support failure retries, but the tasks inside a logical node do.
+
+If a task in the workflow fails and reaches its maximum retry count, the workflow fails and stops; the failed workflow can then be manually rerun or recovered.
+
+
+
+##### V. Task priority design
+
+In the early scheduling design, without a priority design and with only fair scheduling, a task submitted first might complete at the same time as a task submitted later, and the priority of a process or task could not be set. We have since redesigned this, and the current design is as follows:
+
+- Tasks are processed from high priority to low: **different process instance priorities** take precedence first, then **task priorities within the same process instance**, and finally the **submission order within the same process**.
+
+  - The specific implementation is to parse the priority from the task instance's json and then save the **process instance priority_process instance id_task priority_task id** string in the ZooKeeper task queue. When entries are fetched from the task queue, plain string comparison yields the task that should execute first.
+
+  - The priority of a process definition covers the case where some processes need to be handled before others; it can be configured when the process is started or when a scheduled start is set up. There are 5 levels: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below
+
+ <p align="center">
+ <img
src="https://analysys.github.io/easyscheduler_docs_cn/images/process_priority.png"
alt="Process Priority Configuration" width="40%" />
+ </p>
+
+  - The priority of a task is likewise divided into 5 levels: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST, as shown below
+
+ <p align="center">
+ <img
src="https://analysys.github.io/easyscheduler_docs_cn/images/task_priority.png"
alt="task priority configuration" width="35%" />
+ </p>
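The string-comparison ordering described above can be sketched as follows. The `queueKey` helper is hypothetical, mirroring the documented `process instance priority_process instance id_task priority_task id` format; note that plain string comparison is safe for the single-digit priority levels 0–4:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PriorityKeyDemo {
    // Build the ZooKeeper task-queue entry; the lowest string sorts first
    // and therefore runs first.
    static String queueKey(int processPriority, int processInstanceId,
                           int taskPriority, int taskId) {
        return processPriority + "_" + processInstanceId + "_"
                + taskPriority + "_" + taskId;
    }

    public static void main(String[] args) {
        List<String> queue = new ArrayList<>();
        queue.add(queueKey(2, 10, 1, 7)); // MEDIUM process, HIGH task
        queue.add(queueKey(0, 11, 3, 8)); // HIGHEST process, LOW task
        queue.add(queueKey(2, 10, 0, 9)); // MEDIUM process, HIGHEST task
        Collections.sort(queue); // plain string comparison, as in the queue
        System.out.println(queue); // [0_11_3_8, 2_10_0_9, 2_10_1_7]
    }
}
```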
+
+##### VI. Logback and gRPC implement log access
+
+- Since the Web (UI) and Worker are not necessarily on the same machine, viewing logs cannot be treated like querying local files. There are two options:
+  - Put the logs on an ES search engine
+  - Obtain remote log information through gRPC communication
+- To keep DolphinScheduler as lightweight as possible, gRPC was chosen to implement remote log access.
+
+ <p align="center">
+ <img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png"
alt="grpc remote access" width="50%" />
+ </p>
+
+- We use a custom Logback FileAppender and Filter function to generate a log
file for each task instance.
+- The main implementation of FileAppender is as follows:
+
+```java
+/**
+ * task log appender
+ */
+public class TaskLogAppender extends FileAppender<ILoggingEvent> {
+
+    ...
+
+    @Override
+    protected void append(ILoggingEvent event) {
+
+        if (currentlyActiveFile == null) {
+            currentlyActiveFile = getFile();
+        }
+        String activeFile = currentlyActiveFile;
+        // thread name: taskThreadName-processDefineId_processInstanceId_taskInstanceId
+        String threadName = event.getThreadName();
+        String[] threadNameArr = threadName.split("-");
+        // logId = processDefineId_processInstanceId_taskInstanceId
+        String logId = threadNameArr[1];
+        ...
+        super.subAppend(event);
+    }
+}
+```
+
+Logs are generated under paths of the form /process definition id/process instance id/task instance id.log
+
+- The Filter matches thread names starting with TaskLogInfo-:
+- TaskLogFilter is implemented as follows:
+
+```java
+/**
+ * task log filter
+ */
+public class TaskLogFilter extends Filter<ILoggingEvent> {
+
+    @Override
+    public FilterReply decide(ILoggingEvent event) {
+        if (event.getThreadName().startsWith("TaskLogInfo-")) {
+            return FilterReply.ACCEPT;
+        }
+        return FilterReply.DENY;
+    }
+}
+```
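Putting the appender and filter together, the per-task log path follows from the thread-name convention above. A minimal sketch (the `logPath` helper is hypothetical, mirroring the documented naming scheme):

```java
public class LogPathDemo {
    // Thread name: "TaskLogInfo-processDefineId_processInstanceId_taskInstanceId"
    static String logPath(String threadName, String baseDir) {
        String logId = threadName.split("-")[1];        // e.g. "1_2_3"
        return baseDir + "/" + logId.replace('_', '/') + ".log";
    }

    public static void main(String[] args) {
        System.out.println(logPath("TaskLogInfo-1_2_3", "/logs")); // /logs/1/2/3.log
    }
}
```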
+
+
+
+### Summary
+
+Starting from scheduling, this article has introduced the architecture and implementation ideas of DolphinScheduler, a distributed workflow scheduling system for big data. To be continued
diff --git a/docs/en-us/1.2.1/user_doc/metadata-1.2.md
b/docs/en-us/1.2.1/user_doc/metadata-1.2.md
new file mode 100644
index 0000000..2d706f9
--- /dev/null
+++ b/docs/en-us/1.2.1/user_doc/metadata-1.2.md
@@ -0,0 +1,174 @@
+# Dolphin Scheduler 1.2 MetaData
+
+<a name="V5KOl"></a>
+### Dolphin Scheduler 1.2 DB Table Overview
+| Table Name | Comment |
+| :---: | :---: |
+| t_ds_access_token | token for access ds backend |
+| t_ds_alert | alert detail |
+| t_ds_alertgroup | alert group |
+| t_ds_command | command detail |
+| t_ds_datasource | data source |
+| t_ds_error_command | error command detail |
+| t_ds_process_definition | process definition |
+| t_ds_process_instance | process instance |
+| t_ds_project | project |
+| t_ds_queue | queue |
+| t_ds_relation_datasource_user | datasource related to user |
+| t_ds_relation_process_instance | sub process |
+| t_ds_relation_project_user | project related to user |
+| t_ds_relation_resources_user | resource related to user |
+| t_ds_relation_udfs_user | UDF related to user |
+| t_ds_relation_user_alertgroup | alert group related to user |
+| t_ds_resources | resource center file |
+| t_ds_schedules | process definition schedule |
+| t_ds_session | user login session |
+| t_ds_task_instance | task instance |
+| t_ds_tenant | tenant |
+| t_ds_udfs | UDF resource |
+| t_ds_user | user detail |
+| t_ds_version | ds version |
+| t_ds_worker_group | worker group |
+
+
+---
+
+<a name="XCLy1"></a>
+### E-R Diagram
+<a name="5hWWZ"></a>
+#### User Queue DataSource
+
+
+- Multiple users can belong to one tenant
+- The queue field in the t_ds_user table stores the queue_name from the t_ds_queue table, while t_ds_tenant stores queue information using queue_id. During the execution of a process definition, the user's queue has the highest priority; if the user's queue is empty, the tenant's queue is used.
+- The user_id field in the t_ds_datasource table indicates the user who
created the data source. The user_id in t_ds_relation_datasource_user indicates
the user who has permission to the data source.
+<a name="7euSN"></a>
+#### Project Resource Alert
+
+
+- A user can have multiple projects; user-project authorization binds the relationship via project_id and user_id in the t_ds_relation_project_user table
+- The user_id in the t_ds_project table represents the user who created the project, and the user_id in the t_ds_relation_project_user table represents users who have permission on the project
+- The user_id in the t_ds_resources table represents the user who created the
resource, and the user_id in t_ds_relation_resources_user represents the user
who has permissions to the resource
+- The user_id in the t_ds_udfs table represents the user who created the UDF,
and the user_id in the t_ds_relation_udfs_user table represents a user who has
permission to the UDF
+<a name="JEw4v"></a>
+#### Command Process Task
+<br/>
+
+- A project has multiple process definitions, a process definition can
generate multiple process instances, and a process instance can generate
multiple task instances
+- The t_ds_schedules table stores the timed scheduling information for process definitions
+- The data stored in the t_ds_relation_process_instance table handles process definitions that contain sub-processes: the parent_process_instance_id field is the id of the main process instance containing the sub-process, the process_instance_id field is the id of the sub-process instance, and the parent_task_instance_id field is the task instance id of the sub-process node
+- The process instance table and the task instance table correspond to the
t_ds_process_instance table and the t_ds_task_instance table, respectively.
+
+---
+
+<a name="yd79T"></a>
+### Core Table Schema
+<a name="6bVhH"></a>
+#### t_ds_process_definition
+| Field | Type | Comment |
+| --- | --- | --- |
+| id | int | primary key |
+| name | varchar | process definition name |
+| version | int | process definition version |
+| release_state | tinyint | process definition release
state:0:offline,1:online |
+| project_id | int | project id |
+| user_id | int | process definition creator id |
+| process_definition_json | longtext | process definition json content |
+| description | text | process definition description |
+| global_params | text | global parameters |
+| flag | tinyint | process is available: 0 not available, 1 available |
+| locations | text | Node location information |
+| connects | text | Node connection information |
+| receivers | text | receivers |
+| receivers_cc | text | carbon copy list |
+| create_time | datetime | create time |
+| timeout | int | timeout |
+| tenant_id | int | tenant id |
+| update_time | datetime | update time |
+
+<a name="t5uxM"></a>
+#### t_ds_process_instance
+| Field | Type | Comment |
+| --- | --- | --- |
+| id | int | primary key |
+| name | varchar | process instance name |
+| process_definition_id | int | process definition id |
+| state | tinyint | process instance Status: 0 commit succeeded, 1 running, 2
prepare to pause, 3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need
fault tolerance, 9 kill, 10 wait for thread, 11 wait for dependency to complete
|
+| recovery | tinyint | process instance failover flag:0:normal,1:failover
instance |
+| start_time | datetime | process instance start time |
+| end_time | datetime | process instance end time |
+| run_times | int | process instance run times |
+| host | varchar | process instance host |
+| command_type | tinyint | command type: 0 start, 1 start from the current node, 2 resume a fault-tolerant process, 3 resume a paused process, 4 execute from the failed node, 5 complement, 6 schedule, 7 rerun, 8 pause, 9 stop, 10 resume waiting thread |
+| command_param | text | json command parameters |
+| task_depend_type | tinyint | task depend type. 0: only current node,1:before
the node,2:later nodes |
+| max_try_times | tinyint | max try times |
+| failure_strategy | tinyint | failure strategy. 0:end the process when node
failed,1:continue running the other nodes when node failed |
+| warning_type | tinyint | warning type: 0 no warning, 1 warning if process succeeds, 2 warning if process fails, 3 warning on both success and failure |
+| warning_group_id | int | warning group id |
+| schedule_time | datetime | schedule time |
+| command_start_time | datetime | command start time |
+| global_params | text | global parameters |
+| process_instance_json | longtext | process instance json (a copy of the process definition json) |
+| flag | tinyint | process instance is available: 0 not available, 1 available
|
+| update_time | timestamp | update time |
+| is_sub_process | int | whether the process is sub process: 1 sub-process,0
not sub-process |
+| executor_id | int | executor id |
+| locations | text | Node location information |
+| connects | text | Node connection information |
+| history_cmd | text | history commands of process instance operation |
+| dependence_schedule_times | text | depend schedule fire time |
+| process_instance_priority | int | process instance priority. 0 Highest,1
High,2 Medium,3 Low,4 Lowest |
+| worker_group_id | int | worker group id |
+| timeout | int | time out |
+| tenant_id | int | tenant id |
+
+<a name="tHZsY"></a>
+#### t_ds_task_instance
+| Field | Type | Comment |
+| --- | --- | --- |
+| id | int | primary key |
+| name | varchar | task name |
+| task_type | varchar | task type |
+| process_definition_id | int | process definition id |
+| process_instance_id | int | process instance id |
+| task_json | longtext | task content json |
+| state | tinyint | Status: 0 commit succeeded, 1 running, 2 prepare to pause,
3 pause, 4 prepare to stop, 5 stop, 6 fail, 7 succeed, 8 need fault tolerance,
9 kill, 10 wait for thread, 11 wait for dependency to complete |
+| submit_time | datetime | task submit time |
+| start_time | datetime | task start time |
+| end_time | datetime | task end time |
+| host | varchar | host of task running on |
+| execute_path | varchar | task execute path in the host |
+| log_path | varchar | task log path |
+| alert_flag | tinyint | whether alert |
+| retry_times | int | task retry times |
+| pid | int | pid of task |
+| app_link | varchar | yarn app id |
+| flag | tinyint | taskinstance is available: 0 not available, 1 available |
+| retry_interval | int | retry interval when task failed |
+| max_retry_times | int | max retry times |
+| task_instance_priority | int | task instance priority:0 Highest,1 High,2
Medium,3 Low,4 Lowest |
+| worker_group_id | int | worker group id |
+
+<a name="gLGtm"></a>
+#### t_ds_command
+| Field | Type | Comment |
+| --- | --- | --- |
+| id | int | primary key |
+| command_type | tinyint | Command type: 0 start workflow, 1 start execution
from current node, 2 resume fault-tolerant workflow, 3 resume pause process, 4
start execution from failed node, 5 complement, 6 schedule, 7 rerun, 8 pause, 9
stop, 10 resume waiting thread |
+| process_definition_id | int | process definition id |
+| command_param | text | json command parameters |
+| task_depend_type | tinyint | Node dependency type: 0 current node, 1
forward, 2 backward |
+| failure_strategy | tinyint | Failed policy: 0 end, 1 continue |
+| warning_type | tinyint | Alarm type: 0 not sent, 1 sent on process success, 2 sent on process failure, 3 sent on success and on all failures |
+| warning_group_id | int | warning group |
+| schedule_time | datetime | schedule time |
+| start_time | datetime | start time |
+| executor_id | int | executor id |
+| dependence | varchar | dependence |
+| update_time | datetime | update time |
+| process_instance_priority | int | process instance priority: 0 Highest,1
High,2 Medium,3 Low,4 Lowest |
+| worker_group_id | int | worker group id |
+
+
+
diff --git a/docs/en-us/1.2.1/user_doc/plugin-development.md
b/docs/en-us/1.2.1/user_doc/plugin-development.md
new file mode 100644
index 0000000..eda2d82
--- /dev/null
+++ b/docs/en-us/1.2.1/user_doc/plugin-development.md
@@ -0,0 +1,54 @@
+## Task Plugin Development
+
+Note: Currently, task plugin development does not support hot deployment.
+
+### Shell-based tasks
+
+#### YARN-based calculations (see MapReduceTask)
+
+- Create a custom task in the **TaskManager** class under **cn.dolphinscheduler.server.worker.task** (and register the corresponding task type in TaskType)
+- Inherit from **AbstractYarnTask** under **cn.dolphinscheduler.server.worker.task**
+- In the constructor, call the **AbstractYarnTask** constructor
+- Inherit from **AbstractParameters** to define the custom task parameter entity
+- Override the **init** method of **AbstractTask** to parse the **custom task parameters**
+- Override **buildCommand** to encapsulate the command
+
+
+
+#### Non-YARN-based calculations (see ShellTask)
+- Create a custom task in the **TaskManager** class under **cn.dolphinscheduler.server.worker.task**
+
+- Inherit **AbstractTask** under **cn.dolphinscheduler.server.worker.task**
+
+- Instantiate a **ShellCommandExecutor** in the constructor:
+
+ ```
+ public ShellTask(TaskProps props, Logger logger) {
+ super(props, logger);
+
+ this.taskDir = props.getTaskDir();
+
+ this.processTask = new ShellCommandExecutor(this::logHandle,
+ props.getTaskDir(), props.getTaskAppId(),
+ props.getTenantCode(), props.getEnvFile(), props.getTaskStartTime(),
+ props.getTaskTimeout(), logger);
+ this.processDao = DaoFactory.getDaoInstance(ProcessDao.class);
+ }
+ ```
+
+  Pass in the custom task's **TaskProps** and a custom **Logger**: TaskProps encapsulates the task information, and the Logger carries the task's log output
+
+- Inherit **AbstractParameters** to define a custom task parameter entity
+
+- Override the **init** method of **AbstractTask** to parse the **custom task parameter entity**
+
+- Override the **handle** method: call the **run** method of **ShellCommandExecutor**, passing the **command** as the first parameter and the ProcessDao as the second, and set the corresponding **exitStatusCode**
+
+### Non-SHELL-based tasks (see SqlTask)
+
+- Create a custom task in the **TaskManager** class under **cn.dolphinscheduler.server.worker.task**
+- Inherit **AbstractTask** under **cn.dolphinscheduler.server.worker.task**
+- Inherit **AbstractParameters** to define a custom task parameter entity
+- Parse the custom task parameter entity in the constructor, or by overriding the **init** method of **AbstractTask**
+- Override the **handle** method to implement the business logic and set the corresponding **exitStatusCode**
+
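All three cases above share one inheritance pattern: parse parameters in `init`, run the work and record an exit code in `handle`. A self-contained sketch with simplified stand-in classes (AbstractTaskSketch and MyTaskSketch are illustrative only; a real plugin must use the classes under cn.dolphinscheduler.server.worker.task):

```java
// Simplified, self-contained illustration of the task-plugin pattern described
// above. AbstractTaskSketch is an illustrative stand-in for AbstractTask.
abstract class AbstractTaskSketch {
    protected int exitStatusCode = -1;
    public abstract void init();   // parse custom task parameters
    public abstract void handle(); // run the task, set exitStatusCode
    public int getExitStatusCode() { return exitStatusCode; }
}

class MyTaskSketch extends AbstractTaskSketch {
    private String command;
    @Override
    public void init() {
        // a real plugin would parse the JSON task parameters into its
        // AbstractParameters entity here
        command = "echo hello";
    }
    @Override
    public void handle() {
        // a real plugin would run the command (e.g. via ShellCommandExecutor)
        // and record its exit code
        exitStatusCode = 0;
    }
    public String getCommand() { return command; }
}
```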
diff --git a/docs/en-us/1.2.1/user_doc/quick-start.md
b/docs/en-us/1.2.1/user_doc/quick-start.md
new file mode 100644
index 0000000..7e4ac7d
--- /dev/null
+++ b/docs/en-us/1.2.1/user_doc/quick-start.md
@@ -0,0 +1,65 @@
+# Quick Start
+
+* Administrator user login
+
+ > Address: 192.168.xx.xx:8888  Username/password: admin/dolphinscheduler123
+
+<p align="center">
+ <img src="/img/login_en.png" width="60%" />
+ </p>
+
+* Create queue
+
+<p align="center">
+ <img src="/img/create-queue-en.png" width="60%" />
+ </p>
+
+ * Create tenant
+ <p align="center">
+ <img src="/img/create-tenant-en.png" width="60%" />
+ </p>
+
+ * Create an ordinary user
+<p align="center">
+ <img src="/img/create-user-en.png" width="60%" />
+ </p>
+
+ * Create an alarm group
+
+ <p align="center">
+ <img src="/img/alarm-group-en.png" width="60%" />
+ </p>
+
+
+ * Create a worker group
+
+ <p align="center">
+ <img src="/img/worker-group-en.png" width="60%" />
+ </p>
+
+ * Create a token
+
+ <p align="center">
+ <img src="/img/token-en.png" width="60%" />
+ </p>
+
+
+ * Log in as the ordinary user
+ > Click the user name in the upper right corner to "Exit", then log in again as the ordinary user.
+
+ * Project Management - > Create Project - > Click on Project Name
+<p align="center">
+ <img src="/img/create_project_en.png" width="60%" />
+ </p>
+
+ * Click Workflow Definition - > Create Workflow Definition - > Online
Process Definition
+
+<p align="center">
+ <img src="/img/process_definition_en.png" width="60%" />
+ </p>
+
+ * Running Process Definition - > Click Workflow Instance - > Click Process
Instance Name - > Double-click Task Node - > View Task Execution Log
+
+ <p align="center">
+ <img src="/img/log_en.png" width="60%" />
+</p>
diff --git a/docs/en-us/1.2.1/user_doc/system-manual.md
b/docs/en-us/1.2.1/user_doc/system-manual.md
new file mode 100644
index 0000000..29492b9
--- /dev/null
+++ b/docs/en-us/1.2.1/user_doc/system-manual.md
@@ -0,0 +1,738 @@
+# System Use Manual
+
+## Operational Guidelines
+
+### Home page
+The homepage contains task status statistics, process status statistics, and
workflow definition statistics for all user projects.
+
+<p align="center">
+ <img src="/img/home_en.png" width="80%" />
+ </p>
+
+### Create a project
+
+ - Click "Project - > Create Project", enter project name, description, and
click "Submit" to create a new project.
+ - Click on the project name to enter the project home page.
+<p align="center">
+ <img src="/img/project_home_en.png" width="80%" />
+ </p>
+
+> The project home page contains task status statistics, process status
statistics, and workflow definition statistics for the project.
+
+ - Task State Statistics: It refers to the statistics of the number of tasks
to be run, failed, running, completed and succeeded in a given time frame.
+ - Process State Statistics: It refers to the statistics of the number of
waiting, failing, running, completing and succeeding process instances in a
specified time range.
+ - Process Definition Statistics: The process definition created by the user
and the process definition granted by the administrator to the user are counted.
+
+
+### Creating Process definitions
+ - Go to the project home page, click "Process definitions" and enter the
list page of process definition.
+ - Click "Create process" to create a new process definition.
+ - Drag the "SHELL" node to the canvas and add a shell task.
+ - Fill in the Node Name, Description, and Script fields.
+ - Selecting "task priority" will give priority to high-level tasks in the
execution queue. Tasks with the same priority will be executed in the
first-in-first-out order.
+  - Timeout alarm: fill in the "timeout" value; when the task's execution time exceeds it, the task can raise an alarm and fail due to timeout.
+ - Fill in "Custom Parameters" and refer to [Custom
Parameters](#CustomParameters)
+ <p align="center">
+ <img src="/img/process_definitions_en.png" width="80%" />
+ </p>
+  - Set the execution order between nodes: click "line connection". As shown, task 2 and task 3 run in parallel: after task 1 finishes, task 2 and task 3 are executed simultaneously.
+
+<p align="center">
+ <img src="/img/task_en.png" width="80%" />
+ </p>
+
+ - Delete dependencies: Click on the arrow icon to "drag nodes and select
items", select the connection line, click on the delete icon to delete
dependencies between nodes.
+<p align="center">
+ <img src="/img/delete_dependencies_en.png" width="80%" />
+ </p>
+
+ - Click "Save", enter the name of the process definition, the description of
the process definition, and set the global parameters.
+
+<p align="center">
+ <img src="/img/global_parameters_en.png" width="80%" />
+ </p>
+
+ - For other types of nodes, refer to [task node types and parameter
settings](#TaskNodeType)
+
+### Execution process definition
+  - **An offline process definition can be edited but not run**, so bringing the workflow online is the first step.
+ > Click on the Process definition, return to the list of process
definitions, click on the icon "online", online process definition.
+
+  > Before taking a workflow offline, the timed tasks in timing management should be taken offline first; only then can the workflow definition be set offline successfully.
+
+ - Click "Run" to execute the process. Description of operation parameters:
+    * Failure strategy: **the strategy applied to other parallel task nodes when one task node fails**. "Continue" means the other task nodes run normally; "End" means all running tasks are terminated and the whole process ends.
+    * Notification strategy: when the process ends, send a process execution notification mail according to the process status.
+    * Process priority: the priority of process execution, in five levels: highest, high, medium, low, and lowest. Higher-priority processes are executed first in the execution queue, and processes with the same priority are executed in first-in-first-out order.
+    * Worker group: the process can only be executed on the specified worker group; with the Default setting it can be executed on any worker.
+    * Notification group: when the process ends or fault tolerance occurs, process information is mailed to all members of the notification group.
+    * Recipient: enter a mailbox and press Enter to save it. When the process ends or fault tolerance occurs, an alert mail is sent to the recipient list.
+    * Cc: enter a mailbox and press Enter to save it. When the process ends or fault tolerance occurs, the alert mail is copied to the Cc list.
+
+<p align="center">
+ <img src="/img/start-process-en.png" width="80%" />
+ </p>
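The priority rules above (higher priority first, first-in-first-out among equal priorities) can be sketched as follows; ProcessQueueSketch and its fields are a hypothetical illustration, not DolphinScheduler's scheduler code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch: a lower priority value means higher priority
// (0 Highest .. 4 Lowest); equal priorities leave the queue in FIFO order.
public class ProcessQueueSketch {
    private static class Item {
        final String name; final int priority; final long seq;
        Item(String name, int priority, long seq) {
            this.name = name; this.priority = priority; this.seq = seq;
        }
    }
    private final PriorityQueue<Item> queue = new PriorityQueue<>(
            Comparator.<Item>comparingInt(i -> i.priority)
                      .thenComparingLong(i -> i.seq)); // seq breaks ties FIFO
    private long counter = 0;

    public void submit(String name, int priority) {
        queue.add(new Item(name, priority, counter++));
    }
    public List<String> drain() {
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) order.add(queue.poll().name);
        return order;
    }
}
```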
+
+  * Complement: execute the workflow definition for a specified date range. You can select the time range of the complement (currently only continuous days are supported), for example the data from May 1 to May 10, as shown in the figure:
+
+<p align="center">
+ <img src="/img/complement-en.png" width="80%" />
+ </p>
+
+> Complement execution mode includes serial execution and parallel execution.
In serial mode, the complement will be executed sequentially from May 1 to May
10. In parallel mode, the tasks from May 1 to May 10 will be executed
simultaneously.
+
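The set of complement runs for a date range can be sketched as below (ComplementDates is an illustrative helper, not part of DolphinScheduler): each date in the range yields one workflow run, executed one after another in serial mode or all at once in parallel mode.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Illustrative: the complement dates generated for a continuous range,
// e.g. May 1 to May 10 yields ten runs.
public class ComplementDates {
    public static List<LocalDate> between(LocalDate start, LocalDate end) {
        List<LocalDate> dates = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            dates.add(d); // one workflow run per date
        }
        return dates;
    }
}
```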
+### Timing Process Definition
+ - Create Timing: "Process Definition - > Timing"
+  - Choose the start and stop time. Within this range, the timer fires normally; outside it, no further timed workflow instances are produced.
+
+<p align="center">
+ <img src="/img/timing-en.png" width="80%" />
+ </p>
+
+ - Add a timer to be executed once a day at 5:00 a.m. as shown below:
+<p align="center">
+ <img src="/img/timer-en.png" width="80%" />
+ </p>
+
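Assuming a Quartz-style crontab expression (an assumption, not stated in this page), a daily 5:00 a.m. timer would correspond to something like `0 0 5 * * ? *`. A minimal sketch of when such a timer fires next (DailyFiveAm is illustrative only):

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

// Illustrative: the next fire time of a "daily at 05:00" schedule.
public class DailyFiveAm {
    public static LocalDateTime nextRun(LocalDateTime now) {
        LocalDateTime todayFive = now.toLocalDate().atTime(LocalTime.of(5, 0));
        // before 05:00 today -> fire today; otherwise fire tomorrow
        return now.isBefore(todayFive) ? todayFive : todayFive.plusDays(1);
    }
}
```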
+  - Bring the timer online: **a newly created timer is offline; you need to click "Timing Management - > Online" before it takes effect.**
+
+### View process instances
+ > Click on "Process Instances" to view the list of process instances.
+
+ > Click on the process name to see the status of task execution.
+
+ <p align="center">
+ <img src="/img/process-instances-en.png" width="80%" />
+ </p>
+
+ > Click on the task node, click "View Log" to view the task execution log.
+
+ <p align="center">
+ <img src="/img/view-log-en.png" width="80%" />
+ </p>
+
+ > Click on the task instance node, click **View History** to view the list of
task instances that the process instance runs.
+
+ <p align="center">
+ <img src="/img/instance-runs-en.png" width="80%" />
+ </p>
+
+
+ > Operations on workflow instances:
+
+<p align="center">
+ <img src="/img/workflow-instances-en.png" width="80%" />
+</p>
+
+ * Editor: You can edit the terminated process. When you save it after
editing, you can choose whether to update the process definition or not.
+ * Rerun: A process that has been terminated can be re-executed.
+ * Recovery failure: For a failed process, a recovery failure operation can
be performed, starting at the failed node.
+    * Stop: stop the running process; the backend first performs a `kill` on the worker process, then a `kill -9`.
+    * Pause: the running process can be **suspended**; the system state becomes **waiting to execute**, waiting for the currently running task to end and suspending the task that would execute next.
+    * Restore pause: **the suspended process** can be restored, running directly from the suspended node.
+ * Delete: Delete process instances and task instances under process instances
+ * Gantt diagram: The vertical axis of Gantt diagram is the topological
ordering of task instances under a process instance, and the horizontal axis is
the running time of task instances, as shown in the figure:
+<p align="center">
+ <img src="/img/gantt-en.png" width="80%" />
+</p>
+
+### View task instances
+  > Click on "Task Instance" to enter the task list page and query the execution status of tasks.
+ >
+ >
+
+<p align="center">
+ <img src="/img/task-instances-en.png" width="80%" />
+</p>
+
+ > Click "View Log" in the action column to view the log of task execution.
+
+<p align="center">
+ <img src="/img/task-execution-en.png" width="80%" />
+</p>
+
+### Create data source
+ > Data Source Center supports MySQL, POSTGRESQL, HIVE and Spark data sources.
+
+#### Create and edit MySQL data source
+
+ - Click on "Datasource - > Create Datasources" to create different types of
datasources according to requirements.
+- Datasource: select MYSQL
+- Datasource Name: the name of the datasource
+- Description: a description of the datasource
+- IP: the IP used to connect to MySQL
+- Port: the port used to connect to MySQL
+- Username: the username used to connect to MySQL
+- Password: the password used to connect to MySQL
+- Database name: the name of the MySQL database to connect to
+- Jdbc connection parameters: parameter settings for the MySQL connection, filled in as JSON
+
+<p align="center">
+ <img src="/img/mysql-en.png" width="80%" />
+ </p>
+
+ > Click "Test Connect" to test whether the data source can be successfully
connected.
+ >
+ >
+
+#### Create and edit POSTGRESQL data source
+
+- Datasource: select POSTGRESQL
+- Datasource Name: the name of the datasource
+- Description: a description of the datasource
+- IP: the IP used to connect to POSTGRESQL
+- Port: the port used to connect to POSTGRESQL
+- Username: the username used to connect to POSTGRESQL
+- Password: the password used to connect to POSTGRESQL
+- Database name: the name of the POSTGRESQL database to connect to
+- Jdbc connection parameters: parameter settings for the POSTGRESQL connection, filled in as JSON
+
+<p align="center">
+ <img src="/img/create-datasource-en.png" width="80%" />
+ </p>
+
+#### Create and edit HIVE data source
+
+1. Connect with HiveServer2
+
+ <p align="center">
+ <img src="/img/hive-en.png" width="80%" />
+ </p>
+
+- Datasource: select HIVE
+- Datasource Name: the name of the datasource
+- Description: a description of the datasource
+- IP: the IP used to connect to HIVE
+- Port: the port used to connect to HIVE
+- Username: the username used to connect to HIVE
+- Password: the password used to connect to HIVE
+- Database Name: the name of the HIVE database to connect to
+- Jdbc connection parameters: parameter settings for the HIVE connection, filled in as JSON
+
+2. Connect using HiveServer2 HA Zookeeper mode
+
+ <p align="center">
+ <img src="/img/zookeeper-en.png" width="80%" />
+ </p>
+
+
+Note: If **Kerberos** is turned on, you need to fill in the **Principal**
+<p align="center">
+ <img src="/img/principal-en.png" width="80%" />
+ </p>
+
+
+
+
+#### Create and Edit Spark Datasource
+
+<p align="center">
+ <img src="/img/edit-datasource-en.png" width="80%" />
+ </p>
+
+- Datasource: select Spark
+- Datasource Name: the name of the datasource
+- Description: a description of the datasource
+- IP: the IP used to connect to Spark
+- Port: the port used to connect to Spark
+- Username: the username used to connect to Spark
+- Password: the password used to connect to Spark
+- Database name: the name of the Spark database to connect to
+- Jdbc Connection Parameters: parameter settings for the Spark connection, filled in as JSON
+
+
+
+Note: If **Kerberos** is turned on, you need to fill in the **Principal**
+
+<p align="center">
+ <img src="/img/kerberos-en.png" width="80%" />
+ </p>
+
+### Upload Resources
+ - Upload resource files and udf functions, all uploaded files and resources
will be stored on hdfs, so the following configuration items are required:
+
+```
+conf/common/common.properties
+ # Users who have permission to create directories under the HDFS root path
+ hdfs.root.user=hdfs
+    # data base dir, resource files will be stored under this hadoop hdfs path; configure it yourself and make sure the directory exists on hdfs with read/write permissions. "/escheduler" is recommended
+ data.store2hdfs.basepath=/dolphinscheduler
+ # resource upload startup type : HDFS,S3,NONE
+ res.upload.startup.type=HDFS
+ # whether kerberos starts
+ hadoop.security.authentication.startup.state=false
+ # java.security.krb5.conf path
+ java.security.krb5.conf.path=/opt/krb5.conf
+ # loginUserFromKeytab user
+ [email protected]
+ # loginUserFromKeytab path
+ login.user.keytab.path=/opt/hdfs.headless.keytab
+
+conf/common/hadoop.properties
+    # ha or single namenode; for namenode ha you need to copy core-site.xml and hdfs-site.xml
+    # to the conf directory; s3 is supported, for example: s3a://dolphinscheduler
+ fs.defaultFS=hdfs://mycluster:8020
+    # resourcemanager ha: this needs the IPs; leave it empty for a single resourcemanager
+ yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
+ # If it is a single resourcemanager, you only need to configure one host
name. If it is resourcemanager HA, the default configuration is fine
+ yarn.application.status.address=http://xxxx:8088/ws/v1/cluster/apps/%s
+
+```
+- Only one of yarn.resourcemanager.ha.rm.ids and yarn.application.status.address needs to be configured; leave the other one empty.
+- You need to copy core-site.xml and hdfs-site.xml from the conf directory of
the Hadoop cluster to the conf directory of the dolphinscheduler project and
restart the api-server service.
+
+#### File Manage
+
+ > It is the management of various resource files, including creating basic
txt/log/sh/conf files, uploading jar packages and other types of files,
editing, downloading, deleting and other operations.
+ >
+ >
+ > <p align="center">
+ > <img src="/img/file-manage-en.png" width="80%" />
+ > </p>
+
+ * Create file
+ > File formats support the following types: txt, log, sh, conf, cfg, py, java, sql, xml, hql
+
+<p align="center">
+ <img src="/img/create-file.png" width="80%" />
+ </p>
+
+ * Upload Files
+
+> Upload files: click the upload button or drag the file to the upload area; the file name field is completed automatically with the uploaded file's name.
+
+<p align="center">
+ <img src="/img/file-upload-en.png" width="80%" />
+ </p>
+
+
+ * File View
+
+> For viewable file types, click on the file name to view file details
+
+<p align="center">
+ <img src="/img/file-view-en.png" width="80%" />
+ </p>
+
+ * Download files
+
+> You can download a file by clicking the download button in the top right corner of the file details, or the download button in the operation column of the file list.
+
+ * File rename
+
+<p align="center">
+ <img src="/img/rename-en.png" width="80%" />
+ </p>
+
+#### Delete
+> File List - > Click the Delete button to delete the specified file
+
+#### Resource management
+ > Resource management is similar to file management; the difference is that resource management is for uploading UDF functions, while file management is for uploading user programs, scripts and configuration files.
+
+ * Upload UDF resources
+ > The same as uploading files.
+
+#### Function management
+
+ * Create UDF Functions
+ > Click "Create UDF Function", enter parameters of udf function, select UDF
resources, and click "Submit" to create udf function.
+ >
+ >
+ >
+ > Currently only temporary udf functions for HIVE are supported
+ >
+ >
+ >
+ > - UDF function name: the name of the UDF function
+ > - Package Name: the full class path of the UDF function
+ > - Parameter: parameters used to annotate the function
+ > - Database Name: reserved field for creating permanent UDF functions
+ > - UDF Resources: the resource file corresponding to the created UDF function
+ >
+ >
+
+<p align="center">
+ <img src="/img/udf-function.png" width="80%" />
+ </p>
+
+## Security
+
+  - The security module provides queue management, tenant management, user management, warning group management, worker group management, token management and other functions; it can also authorize resources, data sources, projects, etc.
+- Administrator login, default username password: admin/dolphinscheduler123
+
+
+
+### Create queues
+
+
+
+  - Queues are used when executing spark, mapreduce and other programs that require a "queue" parameter.
+- "Security" - > "Queue Manage" - > "Create Queue"
+ <p align="center">
+ <img src="/img/create-queue-en.png" width="80%" />
+ </p>
+
+
+### Create Tenants
+  - The tenant corresponds to a Linux account, which the worker server uses to submit jobs. If the user does not exist on Linux, the worker creates the account when executing the task.
+  - Tenant Code: **the tenant code is the Linux account, and it must be unique.**
+
+ <p align="center">
+ <img src="/img/create-tenant-en.png" width="80%" />
+ </p>
+
+### Create Ordinary Users
+  - User types are **ordinary users** and **administrator users**.
+ * Administrators have **authorization and user management** privileges,
and no privileges to **create project and process-defined operations**.
+ * Ordinary users can **create projects and create, edit, and execute
process definitions**.
+ * Note: **If the user switches the tenant, all resources under the tenant
will be copied to the switched new tenant.**
+<p align="center">
+ <img src="/img/create-user-en.png" width="80%" />
+ </p>
+
+### Create alarm group
+ * The alarm group is a parameter set at start-up. After the process is
finished, the status of the process and other information will be sent to the
alarm group by mail.
+  * Create and edit alarm groups:
+ <p align="center">
+ <img src="/img/alarm-group-en.png" width="80%" />
+ </p>
+
+### Create Worker Group
+ - Worker group provides a mechanism for tasks to run on a specified worker.
Administrators create worker groups, which can be specified in task nodes and
operation parameters. If the specified grouping is deleted or no grouping is
specified, the task will run on any worker.
+- A worker group can contain multiple IP addresses (**aliases cannot be used**), separated by **commas**
+
+ <p align="center">
+ <img src="/img/worker-group-en.png" width="80%" />
+ </p>
+
+### Token manage
+  - Because the back-end interfaces require a login check, token management provides a way to operate the system by calling interfaces directly.
+ <p align="center">
+ <img src="/img/token-en.png" width="80%" />
+ </p>
+- Call examples:
+
+```java
+ /**
+ * test token
+ */
+ public void doPOSTParam()throws Exception{
+ // create HttpClient
+ CloseableHttpClient httpclient = HttpClients.createDefault();
+
+ // create http post request
+ HttpPost httpPost = new
HttpPost("http://127.0.0.1:12345/dolphinscheduler/projects/create");
+ httpPost.setHeader("token", "123");
+ // set parameters
+ List<NameValuePair> parameters = new ArrayList<NameValuePair>();
+ parameters.add(new BasicNameValuePair("projectName", "qzw"));
+ parameters.add(new BasicNameValuePair("desc", "qzw"));
+ UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(parameters);
+ httpPost.setEntity(formEntity);
+ CloseableHttpResponse response = null;
+ try {
+ // execute
+ response = httpclient.execute(httpPost);
+ // response status code 200
+ if (response.getStatusLine().getStatusCode() == 200) {
+ String content = EntityUtils.toString(response.getEntity(),
"UTF-8");
+ System.out.println(content);
+ }
+ } finally {
+ if (response != null) {
+ response.close();
+ }
+ httpclient.close();
+ }
+ }
+```
+
+### Grant authority
+ - Granting permissions includes project permissions, resource permissions,
datasource permissions, UDF Function permissions.
+> Administrators can authorize projects, resources, data sources and UDF
Functions that are not created by ordinary users. Because project, resource,
data source and UDF Function are all authorized in the same way, the project
authorization is introduced as an example.
+
+> Note: for projects created by the user himself, the user has all permissions, so these projects are not shown in the project list or the selected-project list.
+
+ - 1. Click the authorization button of the designated user, as follows:
+ <p align="center">
+ <img src="/img/operation-en.png" width="80%" />
+ </p>
+
+- 2. Click the project button to authorize projects
+
+<p align="center">
+ <img src="/img/auth-project-en.png" width="80%" />
+ </p>
+
+### Monitor center
+ - Service management is mainly to monitor and display the health status and
basic information of each service in the system.
+
+#### Master monitor
+ - Mainly related information about master.
+<p align="center">
+ <img src="/img/master-monitor-en.png" width="80%" />
+ </p>
+
+#### Worker monitor
+ - Mainly related information of worker.
+
+<p align="center">
+ <img src="/img/worker-monitor-en.png" width="80%" />
+ </p>
+
+#### Zookeeper monitor
+  - Mainly the configuration information of each worker and master in zookeeper.
+
+<p align="center">
+ <img src="/img/zookeeper-monitor-en.png" width="80%" />
+ </p>
+
+#### DB monitor
+ - Mainly the health status of DB
+
+<p align="center">
+ <img src="/img/db-monitor-en.png" width="80%" />
+ </p>
+
+#### Statistics Manage
+ <p align="center">
+ <img src="/img/statistics-en.png" width="80%" />
+ </p>
+
+ - Commands to be executed: statistics on t_ds_command table
+ - Number of commands that failed to execute: statistics on the
t_ds_error_command table
+ - Number of tasks to run: statistics of task_queue data in zookeeper
+ - Number of tasks to be killed: statistics of task_kill in zookeeper
+
+## <span id=TaskNodeType>Task Node Type and Parameter Setting</span>
+
+### Shell
+
+  - The shell node: when the worker executes it, a temporary shell script is generated and executed by a Linux user with the same name as the tenant.
+> Drag the SHELL task node from the toolbar onto the palette and double-click the task node, as follows:
+
+<p align="center">
+ <img src="/img/shell-en.png" width="80%" />
+  </p>
+
+- Node name: The node name in a process definition is unique
+- Run flag: Identify whether the node can be scheduled properly, and if it
does not need to be executed, you can turn on the forbidden execution switch.
+- Description : Describes the function of the node
+- Number of failed retries: the number of times a failed task is resubmitted; supports drop-down selection and manual entry
+- Failure Retry Interval: the interval between resubmissions of a failed task; supports drop-down selection and manual entry
+- Script: the SHELL program developed by the user
+- Resources: a list of resource files that the script needs to invoke
+- Custom parameters: user-defined local parameters of the SHELL task; they replace `${variable}` placeholders in the script
+
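A minimal sketch of the `${variable}` replacement described above (ParamSubstitution is an illustrative helper, not DolphinScheduler's implementation); placeholders without a matching parameter are left untouched:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative: replace ${name} placeholders in a script with parameter values.
public class ParamSubstitution {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)}");

    public static String replace(String script, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(script);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // keep the original ${...} text when no parameter is defined
            String value = params.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```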
+### SUB_PROCESS
+  - The sub-process node executes an external workflow definition as a task node.
+> Drag the SUB_PROCESS task node from the toolbar onto the palette and double-click the task node, as follows:
+
+<p align="center">
+ <img src="/img/sub-process-en.png" width="80%" />
+ </p>
+
+- Node name: The node name in a process definition is unique
+- Run flag: Identify whether the node is scheduled properly
+- Description: Describes the function of the node
+- Sub-node: select the process definition of the sub-process; you can jump to it via "Enter sub-node" in the upper right corner.
+
+### DEPENDENT
+
+ - Dependent nodes are **dependent checking nodes**. For example, process A
depends on the successful execution of process B yesterday, and the dependent
node checks whether process B has a successful execution instance yesterday.
+
+> Drag the DEPENDENT task node from the toolbar onto the palette and double-click the task node, as follows:
+
+<p align="center">
+ <img src="/img/current-node-en.png" width="80%" />
+ </p>
+
+ > Dependent nodes provide logical judgment functions, such as checking
whether yesterday's B process was successful or whether the C process was
successfully executed.
+
+ <p align="center">
+ <img src="/img/weekly-A-en.png" width="80%" />
+ </p>
+
+ > For example, process A is a weekly task and process B and C are daily
tasks. Task A requires that task B and C be successfully executed every day of
the last week, as shown in the figure:
+
+ <p align="center">
+ <img src="/img/weekly-A1-en.png" width="80%" />
+ </p>
+
+ > If weekly task A also needs to have executed successfully on Tuesday:
+
+ <p align="center">
+ <img src="/img/weekly-A2-en.png" width="80%" />
+ </p>
+
+### PROCEDURE
+  - The procedure is executed according to the selected data source.
+> Drag the PROCEDURE task node from the toolbar onto the palette and double-click the task node, as follows:
+
+<p align="center">
+ <img src="/img/node-setting-en.png" width="80%" />
+ </p>
+
+- Datasource: The data source type of stored procedure supports MySQL and
POSTGRESQL, and chooses the corresponding data source.
+- Method: The method name of the stored procedure
+- Custom parameters: Custom parameter types of stored procedures support IN
and OUT, and data types support nine data types: VARCHAR, INTEGER, LONG, FLOAT,
DOUBLE, DATE, TIME, TIMESTAMP and BOOLEAN.
+
+### SQL
+  - Drag the SQL task node from the toolbar onto the palette.
+ - Execute non-query SQL functionality
+ <p align="center">
+ <img src="/img/dependent-nodes-en.png" width="80%" />
+ </p>
+
+  - When executing a query SQL, you can choose to send the results by mail, as tables or attachments, to the designated recipients.
+
+<p align="center">
+ <img src="/img/double-click-en.png" width="80%" />
+ </p>
+
+- Datasource: Select the corresponding datasource
+- sql type: supports query and non-query. A query is a select-type statement that returns a result set; its mail notification can use one of three templates: table, attachment, or table plus attachment. A non-query returns no result set and covers update, delete and insert operations
+- sql parameter: input parameter format is key1 = value1; key2 = value2...
+- sql statement: SQL statement
+- UDF function: For HIVE type data sources, you can refer to UDF functions
created in the resource center, other types of data sources do not support UDF
functions for the time being.
+- Custom parameters: for the stored procedure task type, custom parameters are ordered to set values for the method; for the SQL task type, custom parameters instead replace `${variable}` placeholders in the SQL statement. The custom parameter types and data types are the same as for the stored procedure task type
+- Pre Statement: Pre-sql is executed before the sql statement
+- Post Statement: Post-sql is executed after the sql statement
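The `key1=value1;key2=value2` sql parameter format above can be parsed as sketched below (SqlParamParser is an illustrative helper, not DolphinScheduler's code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative: parse "key1=value1;key2=value2" into an ordered map.
public class SqlParamParser {
    public static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        if (input == null || input.trim().isEmpty()) {
            return result;
        }
        for (String pair : input.split(";")) {
            String[] kv = pair.split("=", 2); // split on the first '=' only
            if (kv.length == 2) {
                result.put(kv[0].trim(), kv[1].trim());
            }
        }
        return result;
    }
}
```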
+
+
+
+### SPARK
+
+  - The SPARK node executes Spark programs directly; for the spark node, the worker submits the task using `spark-submit`.
+
+> Drag the SPARK task node from the toolbar onto the palette and double-click the task node, as follows:
+>
+>
+
+<p align="center">
+ <img src="/img/spark-submit-en.png" width="80%" />
+ </p>
+
+- Program Type: Support JAVA, Scala and Python
+- Class of the main function: The full path of Main Class, the entry to the
Spark program
+- Master jar package: It's Spark's jar package
+- Deployment: support three modes: yarn-cluster, yarn-client, and local
+- Driver cores: the number of driver cores and the driver memory can be set
+- Executor number: the number of executors, executor memory and executor cores can be set
+- Command Line Parameters: Setting the input parameters of Spark program to
support the replacement of custom parameter variables.
+- Other parameters: support the --jars, --files, --archives and --conf options
+- Resource: If a resource file is referenced in other parameters, you need to
select the specified resource.
+- Custom parameters: user-defined local parameters of the SPARK task; they replace `${variable}` placeholders in the script
+
+Note: JAVA and Scala are only used for identification; there is no difference between them. For a Spark program developed in Python there is no main-function class, and everything else is the same.
+
+### MapReduce(MR)
+  - Using the MR node, MR programs can be executed directly; for the MR node, the worker submits the task using `hadoop jar`
+
+
+> Drag the MR task node from the toolbar onto the palette and double-click the task node, as follows:
+
+ 1. JAVA program
+
+ <p align="center">
+ <img src="/img/java-program-en.png" width="80%" />
+ </p>
+
+- Class of the main function: The full path of the MR program's entry Main
Class
+- Program Type: Select JAVA Language
+- Master jar package: MR jar package
+- Command Line Parameters: Setting the input parameters of MR program to
support the replacement of custom parameter variables
+- Other parameters: support the -D, -files, -libjars and -archives options
+- Resource: If a resource file is referenced in other parameters, you need to
select the specified resource.
+- Custom parameters: User-defined parameters in MR locality that replace the
contents in scripts with ${variables}
+
+2. Python program
+
+<p align="center">
+ <img src="/img/python-program-en.png" width="80%" />
+ </p>
+
+- Program Type: select the Python language
+- Main jar package: the jar package used to run the Python MR program (the Hadoop streaming jar)
+- Other parameters: supports `-D`, `-mapper`, `-reducer`, `-input` and `-output`; user-defined parameters can also be set here, for example:
+
+  `-mapper "mapper.py 1" -file mapper.py -reducer reducer.py -file reducer.py -input /journey/words.txt -output /journey/out/mr/${currentTimeMillis}`
+
+  The `"mapper.py 1"` after `-mapper` contains two parts: the first is the script `mapper.py`, and the second is its argument `1`.
+- Resource: if a resource file is referenced in other parameters, the corresponding resource must be selected
+- Custom parameters: local user-defined parameters of the task that replace `${variable}` placeholders in the script
+
+### Python
+ - With the Python node, Python scripts can be executed directly. For Python nodes, the worker submits the task with `python <script>`.
+
+
+
+
+> Drag the task node from the toolbar onto the palette, then double-click it to open the configuration dialog:
+
+<p align="center">
+ <img src="/img/python-en.png" width="80%" />
+ </p>
+
+- Script: the Python program developed by the user
+- Resource: a list of resource files that the script needs to reference
+- Custom parameters: local user-defined parameters of the task that replace `${variable}` placeholders in the script
+
+### System parameter
+
+<table>
+ <tr><th>variable</th><th>meaning</th></tr>
+ <tr>
+ <td>${system.biz.date}</td>
+ <td>The day before the scheduled time of the daily scheduling instance, in yyyyMMdd format; when data is supplemented, this date + 1</td>
+ </tr>
+ <tr>
+ <td>${system.biz.curdate}</td>
+ <td>The scheduled time of the daily scheduling instance, in yyyyMMdd format; when data is supplemented, this date + 1</td>
+ </tr>
+ <tr>
+ <td>${system.datetime}</td>
+ <td>The scheduled time of the daily scheduling instance, in yyyyMMddHHmmss format; when data is supplemented, this date + 1</td>
+ </tr>
+</table>
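For a concrete illustration (assuming GNU `date` and a hypothetical scheduled trigger time of 2020-02-26 16:07:43), the three variables would resolve as:

```shell
# Resolve the three system parameters for an example trigger time.
TRIGGER="2020-02-26 16:07:43"
BIZ_DATE=$(date -d "$TRIGGER 1 day ago" +%Y%m%d)   # ${system.biz.date}
BIZ_CURDATE=$(date -d "$TRIGGER" +%Y%m%d)          # ${system.biz.curdate}
DATETIME=$(date -d "$TRIGGER" +%Y%m%d%H%M%S)       # ${system.datetime}
echo "$BIZ_DATE $BIZ_CURDATE $DATETIME"            # 20200225 20200226 20200226160743
```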
+
+
+### Time Customization Parameters
+
+ - Supports custom variable names in code; the declaration form is ${variable name}. A variable can refer to a "system parameter" or specify a "constant".
+
+ - When we define a benchmark variable as $[...], [yyyyMMddHHmmss] can be decomposed and combined arbitrarily, such as $[yyyyMMdd], $[HHmmss], $[yyyy-MM-dd], etc.
+
+ - Can also do this:
+
+
+
+ * N years later: $[add_months(yyyyMMdd, 12*N)]
+ * N years earlier: $[add_months(yyyyMMdd, -12*N)]
+ * N months later: $[add_months(yyyyMMdd, N)]
+ * N months earlier: $[add_months(yyyyMMdd, -N)]
+ * N weeks later: $[yyyyMMdd+7*N]
+ * N weeks earlier: $[yyyyMMdd-7*N]
+ * N days later: $[yyyyMMdd+N]
+ * N days earlier: $[yyyyMMdd-N]
+ * N hours later: $[HHmmss+N/24]
+ * N hours earlier: $[HHmmss-N/24]
+ * N minutes later: $[HHmmss+N/24/60]
+ * N minutes earlier: $[HHmmss-N/24/60]
+
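As a sanity check of the day and week offsets above (assuming GNU `date`, a hypothetical base date of 20200226, and N=2):

```shell
# Verify the arithmetic behind $[yyyyMMdd+N] and $[yyyyMMdd+7*N] for N=2.
BASE="20200226"
PLUS_N=$(date -d "$BASE +2 days" +%Y%m%d)      # $[yyyyMMdd+N]   -> 20200228
PLUS_7N=$(date -d "$BASE +14 days" +%Y%m%d)    # $[yyyyMMdd+7*N] -> 20200311
MINUS_N=$(date -d "$BASE -2 days" +%Y%m%d)     # $[yyyyMMdd-N]   -> 20200224
echo "$PLUS_N $PLUS_7N $MINUS_N"
```

Note that 2020 is a leap year, so two weeks after 2020-02-26 crosses the 29-day February into 2020-03-11.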
+
+### <span id=CustomParameters>User-defined parameters</span>
+
+ - User-defined parameters are divided into global parameters and local parameters. Global parameters are passed in when a process definition or process instance is saved, and can be referenced by the local parameters of any task node in the whole process.
+
+ For example:
+<p align="center">
+ <img src="/img/user-defined-en.png" width="80%" />
+ </p>
+
+ - global_bizdate is a global parameter that refers to a system parameter.
+
+<p align="center">
+ <img src="/img/user-defined1-en.png" width="80%" />
+ </p>
+
+ - In a task, local_param_bizdate references the global parameter via \${global_bizdate}; in a script, the value of local_param_bizdate can be referenced via \${local_param_bizdate}, or the value of local_param_bizdate can be set directly through JDBC.
diff --git a/docs/en-us/1.2.1/user_doc/upgrade.md
b/docs/en-us/1.2.1/user_doc/upgrade.md
new file mode 100644
index 0000000..7118acc
--- /dev/null
+++ b/docs/en-us/1.2.1/user_doc/upgrade.md
@@ -0,0 +1,39 @@
+
+# DolphinScheduler upgrade documentation
+
+## 1. Back up the previous version of the files and database
+
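No specific commands are prescribed for this step; as a hedged sketch (the installation path, database name and user below are hypothetical, not defaults), a backup might look like:

```shell
# Hypothetical backup sketch; adjust paths and credentials to your deployment.
DS_HOME="/opt/dolphinscheduler"            # previous installation directory
BACKUP="${DS_HOME}_bak_$(date +%Y%m%d)"
echo "Would copy ${DS_HOME} to ${BACKUP}"
# cp -r "$DS_HOME" "$BACKUP"
# mysqldump -u ds_user -p dolphinscheduler > "${BACKUP}.sql"
```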
+## 2. Stop all services of dolphinscheduler
+
+ `sh ./script/stop-all.sh`
+
+## 3. Download the new version of the installation package
+
+-
[download](https://dolphinscheduler.apache.org/en-us/docs/user_doc/download.html),
download the latest version of the front and back installation packages
(backend referred to as dolphinscheduler-backend, front end referred to as
dolphinscheduler-front)
+- The following upgrade operations need to be performed in the new version of
the directory
+
+## 4. Database upgrade
+- Modify the following properties in conf/application-dao.properties
+
+```
+ spring.datasource.url
+ spring.datasource.username
+ spring.datasource.password
+```
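For example, pointing the upgrade at a MySQL instance might look like this (host, schema name and credentials below are placeholders, not shipped defaults):

```
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
spring.datasource.username=ds_user
spring.datasource.password=your_password
```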
+
+- Execute database upgrade script
+
+`sh ./script/upgrade-dolphinscheduler.sh`
+
+## 5. Backend service upgrade
+
+- Modify the content of the install.sh configuration and execute the upgrade
script
+
+ `sh install.sh`
+
+## 6. Frontend service upgrade
+
+- Overwrite the previous version of the dist directory
+- Restart the nginx service
+
+ `systemctl restart nginx`
diff --git a/site_config/site.js b/site_config/site.js
index eba850b..ff17e6d 100755
--- a/site_config/site.js
+++ b/site_config/site.js
@@ -18,7 +18,7 @@ export default {
children: [
{
key: 'docs1',
- text: '1.2.0',
+ text: '1.2.1',
link: '/en-us/docs/1.2.1/user_doc/quick-start.html',
},
{
@@ -123,12 +123,12 @@ export default {
link: '/zh-cn/docs/1.2.1/user_doc/quick-start.html',
},
{
- key: 'docs1',
+ key: 'docs2',
text: '1.2.0',
link: '/zh-cn/docs/1.2.0/user_doc/quick-start.html',
},
{
- key: 'docs2',
+ key: 'docs3',
text: '1.1.0(Not Apache Release)',
link: 'https://analysys.github.io/easyscheduler_docs_cn/',
}