This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 3363474  Automated deployment: 3a185130f9880217a5a96c3e389c09c1ff563e7a
3363474 is described below

commit 3363474d32bb9f29bd3acb048bd2de088a51cec3
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Sat Jun 5 13:37:04 2021 +0000

    Automated deployment: 3a185130f9880217a5a96c3e389c09c1ff563e7a
---
 build/{blog.0d6b9bb.js => blog.4bb08c5.js} |   2 +-
 en-us/blog/Json_Split.html                 | 117 +++++++++++++++++++++++++++++
 en-us/blog/Json_Split.json                 |   6 ++
 en-us/blog/index.html                      |   4 +-
 zh-cn/blog/index.html                      |   2 +-
 zh-cn/blog/json_split.html                 |   4 +-
 zh-cn/blog/json_split.json                 |   2 +-
 7 files changed, 130 insertions(+), 7 deletions(-)

diff --git a/build/blog.0d6b9bb.js b/build/blog.4bb08c5.js
similarity index 73%
rename from build/blog.0d6b9bb.js
rename to build/blog.4bb08c5.js
index 28a819b..54b231d 100644
--- a/build/blog.0d6b9bb.js
+++ b/build/blog.4bb08c5.js
@@ -1 +1 @@
-webpackJsonp([1],{1:function(e,t){e.exports=React},2:function(e,t){e.exports=ReactDOM},401:function(e,t,n){e.exports=n(402)},402:function(e,t,n){"use
 strict";function r(e){return e&&e.__esModule?e:{default:e}}function 
o(e,t){if(!(e instanceof t))throw new TypeError("Cannot call a class as a 
function")}function a(e,t){if(!e)throw new ReferenceError("this hasn't been 
initialised - super() hasn't been called");return!t||"object"!=typeof 
t&&"function"!=typeof t?e:t}function l(e,t){if("functi [...]
\ No newline at end of file
+webpackJsonp([1],{1:function(e,t){e.exports=React},2:function(e,t){e.exports=ReactDOM},401:function(e,t,n){e.exports=n(402)},402:function(e,t,n){"use
 strict";function r(e){return e&&e.__esModule?e:{default:e}}function 
o(e,t){if(!(e instanceof t))throw new TypeError("Cannot call a class as a 
function")}function a(e,t){if(!e)throw new ReferenceError("this hasn't been 
initialised - super() hasn't been called");return!t||"object"!=typeof 
t&&"function"!=typeof t?e:t}function l(e,t){if("functi [...]
\ No newline at end of file
diff --git a/en-us/blog/Json_Split.html b/en-us/blog/Json_Split.html
new file mode 100644
index 0000000..72184ed
--- /dev/null
+++ b/en-us/blog/Json_Split.html
@@ -0,0 +1,117 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0, 
maximum-scale=1.0, user-scalable=no">
+  <meta name="keywords" content="Json_Split">
+  <meta name="description" content="Json_Split">
+  <title>Json_Split</title>
+  <link rel="shortcut icon" href="/img/favicon.ico">
+  <link rel="stylesheet" href="/build/vendor.c5ba65d.css">
+  <link rel="stylesheet" href="/build/blog.md.fd8b187.css">
+</head>
+<body>
+  <div id="root"><div class="blog-detail-page" data-reactroot=""><header 
class="header-container header-container-dark"><div class="header-body"><a 
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div 
class="search search-dark"><span class="icon-search"></span></div><span 
class="language-switch language-switch-dark">中</span><div 
class="header-menu"><img class="header-menu-toggle" 
src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass 
ant-menu-lig [...]
+<h3>The Background</h3>
+<p>Currently DolphinScheduler saves the tasks and relationships of a process as one big json in the process_definition_json field of the process_definition table in the database. If a process is large, for example with 1000 tasks, this json field becomes very large and has to be parsed whenever it is used, which is very performance intensive; in addition, tasks cannot be reused. The community therefore started a json splitting project. Encouragingly, we have now completed most of this work, so a summar [...]
+<h3>Summary</h3>
+<p>The json split project was started on 2021-01-12 and the main development 
was initially completed by 2021-04-25. The code has been merged into the dev 
branch. Thanks to lenboo, JinyLeeChina, simon824 and wen-hemin for coding.</p>
+<p>The main changes, as well as the contributions, are as follows:</p>
+<ul>
+<li>12,793 lines of code changed</li>
+<li>168 files modified or added</li>
+<li>145 commits in total</li>
+<li>85 PRs</li>
+</ul>
+<h3>Review of the Splitting Solution</h3>
+<p><img src="https://user-images.githubusercontent.com/42576980/117598604-b1ad8e80-b17a-11eb-9d99-d593fce7bab6.png" alt="splitting solution"></p>
+<ul>
+<li>[ ] When the api module performs a save operation</li>
+</ul>
+<ol>
+<li>The process definition is saved to process_definition (main table) and process_definition_log (log table); both tables hold the same data, and the process definition version is 1</li>
+<li>The task definition is saved to task_definition (main table) and task_definition_log (log table), likewise holding the same data, with task definition version 1</li>
+<li>Process-task relationships are saved to process_task_relation (main table) and process_task_relation_log (log table). A relation row holds the code and version of the process, since tasks are organised through the process and the dag is drawn in terms of the process. The current node of the dag is identified by its post_task_code and post_task_version; the predecessor dependency of this node is identified by pre_task_code and pre_task_version; if there is no dependency, the pre_task_code an [...]
+</ol>
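The relation storage in step 3 above can be sketched as follows. The dict keys mirror the column names named in the text; the helper function itself and the convention of using 0 for "no dependency" are illustrative assumptions, not the exact schema.

```python
# Sketch: encode a dag's edges as process_task_relation rows (see step 3 above).
# Each row pins the process code/version plus (pre, post) task code/version;
# a root task is assumed to use pre_task_code = 0, meaning "no dependency".
def relation_rows(process_code, process_version, edges):
    """edges: list of (pre, post) where each side is a (code, version) pair,
    and pre is None for a root task with no predecessor."""
    rows = []
    for pre, post in edges:
        pre_code, pre_version = pre if pre is not None else (0, 0)
        rows.append({
            "process_definition_code": process_code,
            "process_definition_version": process_version,
            "pre_task_code": pre_code,
            "pre_task_version": pre_version,
            "post_task_code": post[0],
            "post_task_version": post[1],
        })
    return rows

# task 2001 depends on task 2000; task 2000 is a root node
rows = relation_rows(1000, 1, [(None, (2000, 1)), ((2000, 1), (2001, 1))])
```

Because each row carries versions, switching a process to an older version only needs the matching rows from the log table, which is exactly the switch operation described below.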
+<ul>
+<li>[ ] When the api module performs an update operation, the process definition and task definition update the main table directly and insert the updated data into the log table. For relationships, the old rows are deleted from the main table and the new rows inserted, while the log table simply inserts the new rows.</li>
+<li>[ ] When the api module performs a delete operation, the process definition, task definition and relationship rows are deleted directly from the main table, leaving the log table unchanged.</li>
+<li>[ ] When the api module performs a switch operation, the corresponding version in the log table directly overwrites the main table.</li>
+</ul>
+<h3>Json Access Solutions</h3>
+<p><img src="https://user-images.githubusercontent.com/42576980/117598643-c9851280-b17a-11eb-9a6e-c81ee083b09c.png" alt="json"></p>
+<ul>
+<li>
+<p>[ ] In the current phase of the splitting scheme, the api module controller layer remains unchanged and the incoming big json is still mapped to a ProcessData object in the service layer. Insert and update operations are done through the ProcessService.saveProcessDefiniton() entry in the public Service module, which performs the database operations in the order task_definition, process_task_relation, process_definition. When saving, the task is changed if i [...]
+</li>
+<li>
+<p>[ ] The data is assembled through the ProcessService.genTaskNodeList() entry in the public Service module, or assembled into a ProcessData object, which in turn generates the json to return</p>
+</li>
+<li>
+<p>[ ] The Server module (Master) also gets the TaskNodeList through ProcessService.genTaskNodeList() in the public Service module to generate the dispatch dag, and puts all the information about the current task into the MasterExecThread.readyToSubmitTaskQueue queue in order to generate the taskInstance and dispatch it to a worker</p>
+</li>
+</ul>
+<h2>Phase 2 Planning</h2>
+<h3>API / UI module transformation</h3>
+<ul>
+<li>[ ] For processDefinition interface requests, the back end replaces processDefinitionId with processDefinitionCode</li>
+<li>[ ] Support defining a task on its own; currently a task is inserted and modified through its workflow, and Phase 2 needs to support standalone definition</li>
+<li>[ ] Json splitting in the front end and the controller layer; Phase 1 completed the json splitting from the api module service layer down to the dao, and Phase 2 needs to complete it for the front end and the controller layer</li>
+</ul>
+<h3>server module retrofit</h3>
+<ul>
+<li>[ ] Replace process_definition_id with process_definition_code in t_ds_command, t_ds_error_command and t_ds_schedules</li>
+<li>[ ] Transform the process of generating a taskInstance</li>
+</ul>
+<p>The current process_instance is generated from the process_definition, schedules and command tables, while the taskInstance is generated from the MasterExecThread.readyToSubmitTaskQueue queue, whose data comes from the dag object. At this point the queue and the dag hold all the information about the taskInstance, which is very memory intensive. This can be changed to the following data flow, where the readyToSubmitTaskQueue queue and the dag hold only the task code and version  [...]
+<p><img src="https://user-images.githubusercontent.com/42576980/117598659-d3a71100-b17a-11eb-8fe1-8725299510e6.png" alt="server"></p>
+<hr>
+<p><strong>Appendix: The snowflake algorithm</strong></p>
+<p><strong>snowflake:</strong> an algorithm for generating distributed, globally unique IDs called <strong>snowflake</strong>, created by Twitter and used for tweet IDs.</p>
+<p>A Snowflake ID has 64 bits. The first 41 bits are a timestamp, representing the number of milliseconds since a chosen epoch. The next 10 bits represent the machine ID, to prevent conflicts. The remaining 12 bits are a sequence number for the IDs generated on each machine, which allows multiple Snowflake IDs to be created within the same millisecond. Snowflake IDs are generated based on time and can therefore be ordered by time. In addition, the generation time of an ID can be infer [...]
+<ol>
+<li>
+<p><strong>Structure of the snowflake algorithm:</strong></p>
+<p><img src="https://github.com/apache/dolphinscheduler-website/blob/master/img/JsonSplit/snowflake.png?raw=true" alt="snowflake"></p>
+<p>It is divided into 4 main parts:</p>
+<ol>
+<li>1 bit: the sign bit, always 0; this bit carries no meaning.</li>
+<li>41 bits: the timestamp.</li>
+<li>10 bits: the machine id, 0000000000, as 0 is passed in at this point.</li>
+<li>12 bits: the sequence number, i.e. the serial number of the IDs generated within the same millisecond on one machine, 0000 0000 0000.</li>
+</ol>
+<p>Next we will explain the four parts:</p>
+</li>
+</ol>
+<p><strong>1 bit, which is meaningless:</strong></p>
+<p>Because a binary number whose first bit is 1 is negative, and the IDs we generate are all positive, the first bit is always 0.</p>
+<p><strong>41 bits: the timestamp, in milliseconds.</strong></p>
+<p>41 bits can represent up to 2^41 - 1 values, i.e. they can identify 2^41 - 1 milliseconds, which translates into about 69 years of time.</p>
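The 69-year figure follows directly from the bit width; a quick check in plain arithmetic:

```python
# 41 bits of milliseconds: how long until the timestamp field wraps around?
max_ms = 2 ** 41 - 1                           # largest value 41 bits can hold
years = max_ms / (1000 * 60 * 60 * 24 * 365)   # ms -> seconds -> years
print(round(years, 1))  # ≈ 69.7 years
```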
+<p><strong>10 bits: the worker machine ID, which allows this service to run on up to 2^10 = 1024 machines.</strong></p>
+<p>Within these 10 bits, 5 bits can represent the machine-room id and 5 bits the machine id, i.e. up to 2^5 = 32 machine rooms, each of which can hold 2^5 = 32 machines. The split is entirely up to you: for example, 4 bits could identify the service number and the other 6 bits the machine number; any combination works.</p>
+<p><strong>12 bits: used to distinguish the different IDs generated within the same millisecond.</strong></p>
+<p>12 bits can represent 2^12 = 4096 distinct values (0 up to 2^12 - 1 = 4095), so 4096 different IDs can be distinguished within the same millisecond. In other words, one machine can generate at most 4096 IDs in a single millisecond.</p>
+<p>In simple terms, if a service wants a globally unique id, it can send a request to a system that has deployed the SnowFlake algorithm. On receiving the request, that system generates a 64-bit long id using binary bit manipulation: the first of the 64 bits is meaningless, followed by 41 bits for the current timestamp (in milliseconds), then 10 bits for the machine id, and finally the last 12 bi [...]
+<p>The characteristics of SnowFlake are:</p>
+<ol>
+<li>The millisecond timestamp occupies the high bits and the self-incrementing sequence the low bits, so the entire ID trends upward over time.</li>
+<li>It does not rely on third-party systems such as databases; deployed as a service, it offers higher stability and very high ID-generation performance.</li>
+<li>The bits can be allocated according to your own business characteristics, which is very flexible.</li>
+</ol>
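The 1 + 41 + 10 + 12 bit layout described above can be sketched as a minimal generator. This is an illustrative implementation of the general algorithm, not DolphinScheduler's actual code; the epoch constant and the spin-wait on sequence overflow are assumed design choices.

```python
import threading
import time

# Twitter's original epoch in ms; any fixed epoch works (assumption, see lead-in)
EPOCH = 1288834974657

class Snowflake:
    """Minimal sketch of the 1 + 41 + 10 + 12 bit Snowflake layout."""

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # must fit in the 10-bit machine field
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:            # 4096 ids used up this ms:
                    while now <= self.last_ms:    # spin to the next millisecond
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            # sign bit stays 0, so the id is always positive;
            # timestamp in the high bits makes later ids compare greater
            return ((now - EPOCH) << 22) | (self.machine_id << 12) | self.sequence
```

Because the timestamp sits in the high bits and the sequence in the low bits, ids generated later always compare greater, which is exactly the "trend increasing" property listed above.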
+</section><footer class="footer-container"><div 
class="footer-body"><div><h3>About us</h3><h4>Do you need feedback? Please 
contact us through the following ways.</h4></div><div 
class="contact-container"><ul><li><img class="img-base" 
src="/img/emailgray.png"/><img class="img-change" src="/img/emailblue.png"/><a 
href="/en-us/community/development/subscribe.html"><p>Email 
List</p></a></li><li><img class="img-base" src="/img/twittergray.png"/><img 
class="img-change" src="/img/twitterblue.png [...]
+  <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
+  <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
+  <script>window.rootPath = '';</script>
+  <script src="/build/vendor.d44685f.js"></script>
+  <script src="/build/blog.md.57874be.js"></script>
+  <script>
+    var _hmt = _hmt || [];
+    (function() {
+      var hm = document.createElement("script");
+      hm.src = "https://hm.baidu.com/hm.js?4e7b4b400dd31fa015018a435c64d06f";
+      var s = document.getElementsByTagName("script")[0];
+      s.parentNode.insertBefore(hm, s);
+    })();
+  </script>
+</body>
+</html>
\ No newline at end of file
diff --git a/en-us/blog/Json_Split.json b/en-us/blog/Json_Split.json
new file mode 100644
index 0000000..8c67050
--- /dev/null
+++ b/en-us/blog/Json_Split.json
@@ -0,0 +1,6 @@
+{
+  "filename": "Json_Split.md",
+  "__html": "<h2>Why did we split the big json that holds the tasks and 
relationships in the DolphinScheduler workflow definition?</h2>\n<h3>The 
Background</h3>\n<p>Currently DolphinScheduler saves tasks and relationships in 
process as big json to the process_definition_json field in the 
process_definiton table in the database. If a process is large, for example, 
with 1000 tasks, the json field becomes very large and needs to be parsed when 
using the json, which is very performance inten [...]
+  "link": "/dist/en-us/blog/Json_Split.html",
+  "meta": {}
+}
\ No newline at end of file
diff --git a/en-us/blog/index.html b/en-us/blog/index.html
index 18b62c5..bcd2ac3 100644
--- a/en-us/blog/index.html
+++ b/en-us/blog/index.html
@@ -11,12 +11,12 @@
   <link rel="stylesheet" href="/build/blog.acc2955.css">
 </head>
 <body>
-  <div id="root"><div class="blog-list-page" data-reactroot=""><header 
class="header-container header-container-dark"><div class="header-body"><a 
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div 
class="search search-dark"><span class="icon-search"></span></div><span 
class="language-switch language-switch-dark">中</span><div 
class="header-menu"><img class="header-menu-toggle" 
src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass 
ant-menu-light [...]
+  <div id="root"><div class="blog-list-page" data-reactroot=""><header 
class="header-container header-container-dark"><div class="header-body"><a 
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div 
class="search search-dark"><span class="icon-search"></span></div><span 
class="language-switch language-switch-dark">中</span><div 
class="header-menu"><img class="header-menu-toggle" 
src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass 
ant-menu-light [...]
   <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
   <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
   <script>window.rootPath = '';</script>
   <script src="/build/vendor.d44685f.js"></script>
-  <script src="/build/blog.0d6b9bb.js"></script>
+  <script src="/build/blog.4bb08c5.js"></script>
   <script>
     var _hmt = _hmt || [];
     (function() {
diff --git a/zh-cn/blog/index.html b/zh-cn/blog/index.html
index b2d21ed..d9cd477 100644
--- a/zh-cn/blog/index.html
+++ b/zh-cn/blog/index.html
@@ -16,7 +16,7 @@
   <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
   <script>window.rootPath = '';</script>
   <script src="/build/vendor.d44685f.js"></script>
-  <script src="/build/blog.0d6b9bb.js"></script>
+  <script src="/build/blog.4bb08c5.js"></script>
   <script>
     var _hmt = _hmt || [];
     (function() {
diff --git a/zh-cn/blog/json_split.html b/zh-cn/blog/json_split.html
index 75d6048..f38e055 100644
--- a/zh-cn/blog/json_split.html
+++ b/zh-cn/blog/json_split.html
@@ -80,7 +80,7 @@
 <p><strong>雪花算法(snowflake):</strong> 是一种生成分布式全局唯一 ID 的算法,生成的 ID 称为 <strong>snowflake</strong>,这种算法是由 Twitter 创建,并用于推文的 ID。</p>
 <p>一个 Snowflake ID 有 64 bit。前 41 bit 是时间戳,表示了自选定的时期以来的毫秒数。 接下来的 10 bit 代表计算机 
ID,防止冲突。 其余 12 bit 代表每台机器上生成 ID 的序列号,这允许在同一毫秒内创建多个 Snowflake ID。SnowflakeID 
基于时间生成,故可以按时间排序。此外,一个 ID 的生成时间可以由其自身推断出来,反之亦然。该特性可以用于按时间筛选 ID,以及与之联系的对象。</p>
 <p><strong>雪花算法的结构:</strong></p>
-<p><img 
src="https://github.com/QuakeWang/incubator-dolphinscheduler-website/blob/add-blog/img/JsonSplit/snowflake.png?raw=true";
 alt="snowflake"></p>
+<p><img 
src="https://github.com/apache/dolphinscheduler-website/blob/master/img/JsonSplit/snowflake.png?raw=true";
 alt="snowflake"></p>
 <p>主要分为 5 个部分:</p>
 <ol>
 <li>是 1 个 bit:0,这个是无意义的;</li>
@@ -102,7 +102,7 @@
 <ol>
 <li>毫秒数在高位,自增序列在低位,整个 ID 都是趋势递增的。</li>
 <li>不依赖数据库等第三方系统,以服务的方式部署,稳定性更高,生成 ID 的性能也是非常高的。</li>
-<li>可以根据自身业务特性分配 bi t位,非常灵活。</li>
+<li>可以根据自身业务特性分配 bit 位,非常灵活。</li>
 </ol>
 </section><footer class="footer-container"><div 
class="footer-body"><div><h3>联系我们</h3><h4>有问题需要反馈?请通过以下方式联系我们。</h4></div><div 
class="contact-container"><ul><li><img class="img-base" 
src="/img/emailgray.png"/><img class="img-change" src="/img/emailblue.png"/><a 
href="/zh-cn/community/development/subscribe.html"><p>邮件列表</p></a></li><li><img 
class="img-base" src="/img/twittergray.png"/><img class="img-change" 
src="/img/twitterblue.png"/><a 
href="https://twitter.com/dolphinschedule";><p>Twitt [...]
   <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
diff --git a/zh-cn/blog/json_split.json b/zh-cn/blog/json_split.json
index 1db2814..1051d04 100644
--- a/zh-cn/blog/json_split.json
+++ b/zh-cn/blog/json_split.json
@@ -1,6 +1,6 @@
 {
   "filename": "json_split.md",
-  "__html": "<h2>为什么要把 DolphinScheduler 工作流定义中保存任务及关系的大 json 
给拆了?</h2>\n<h3>背景</h3>\n<p>当前 DolphinScheduler 的工作流中的任务及关系保存时是以大 json 
的方式保存到数据库中 process_definiton 表的 process_definition_json 字段,如果某个工作流很大比如有 1000 
个任务,这个 json 字段也就随之变得非常大,在使用时需要解析 json,非常耗费性能,且任务没法重用,故社区计划启动 json 
拆分项目。可喜的是目前我们已经完成了这个工作的大部分,因此总结一下,供大家参考学习。</p>\n<h3>总结</h3>\n<p>json split 项目从 
2021-01-12 开始启动,到 2021-04-25 初步完成主要开发。代码已合入 dev 分支。感谢 
lenboo、JinyLeeChina、simon824、wen-hemin 四位伙伴参与 
coding。</p>\n<p>主要变化以及贡献如下:</p>\n<ul>\n [...]
+  "__html": "<h2>为什么要把 DolphinScheduler 工作流定义中保存任务及关系的大 json 
给拆了?</h2>\n<h3>背景</h3>\n<p>当前 DolphinScheduler 的工作流中的任务及关系保存时是以大 json 
的方式保存到数据库中 process_definiton 表的 process_definition_json 字段,如果某个工作流很大比如有 1000 
个任务,这个 json 字段也就随之变得非常大,在使用时需要解析 json,非常耗费性能,且任务没法重用,故社区计划启动 json 
拆分项目。可喜的是目前我们已经完成了这个工作的大部分,因此总结一下,供大家参考学习。</p>\n<h3>总结</h3>\n<p>json split 项目从 
2021-01-12 开始启动,到 2021-04-25 初步完成主要开发。代码已合入 dev 分支。感谢 
lenboo、JinyLeeChina、simon824、wen-hemin 四位伙伴参与 
coding。</p>\n<p>主要变化以及贡献如下:</p>\n<ul>\n [...]
   "link": "/dist/zh-cn/blog/json_split.html",
   "meta": {}
 }
\ No newline at end of file
