[dolphinscheduler-website] branch asf-site updated: Automated deployment: 9c331d20ae961a945a24dcfe5ae80b46bfb7d7e9

github-bot Wed, 09 Mar 2022 17:16:43 -0800

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 13bfe45  Automated deployment: 9c331d20ae961a945a24dcfe5ae80b46bfb7d7e9
13bfe45 is described below

commit 13bfe45f158a4be4b35f763c23bfa16a28051e0b
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu Mar 10 01:13:50 2022 +0000

    Automated deployment: 9c331d20ae961a945a24dcfe5ae80b46bfb7d7e9
---
 en-us/docs/dev/user_doc/guide/task/datax.html |  77 ++++++++++++++-----------
 en-us/docs/dev/user_doc/guide/task/datax.json |   2 +-
 img/tasks/demo/datax_task01.png               | Bin 0 -> 121401 bytes
 img/tasks/demo/datax_task02.png               | Bin 0 -> 203397 bytes
 img/tasks/demo/datax_task03.png               | Bin 0 -> 87482 bytes
 img/tasks/icons/datax.png                     | Bin 0 -> 1122 bytes
 zh-cn/docs/dev/user_doc/guide/task/datax.html |  79 +++++++++++++++-----------
 zh-cn/docs/dev/user_doc/guide/task/datax.json |   2 +-
 8 files changed, 91 insertions(+), 69 deletions(-)

diff --git a/en-us/docs/dev/user_doc/guide/task/datax.html 
b/en-us/docs/dev/user_doc/guide/task/datax.html
index 75b3bd4..d7cf7f7 100644
--- a/en-us/docs/dev/user_doc/guide/task/datax.html
+++ b/en-us/docs/dev/user_doc/guide/task/datax.html
@@ -11,41 +11,52 @@
 </head>
 <body>
   <div id="root"><div class="md2html docs-page" data-reactroot=""><header 
class="header-container header-container-dark"><div class="header-body"><span 
class="mobile-menu-btn mobile-menu-btn-dark"></span><a 
href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div 
class="search search-dark"><span class="icon-search"></span></div><span 
class="language-switch language-switch-dark">中</span><div 
class="header-menu"><div><ul class="ant-menu whiteClass ant-menu-light ant- 
[...]
+<h2>Overview</h2>
+<p>The DataX task type is used to execute DataX programs. For DataX nodes, the worker 
executes <code>${DATAX_HOME}/bin/datax.py</code> to parse the input json 
file.</p>
+<h2>Create Task</h2>
 <ul>
-<li>
-<p>Drag in the toolbar<img src="/img/datax.png" width="35"/>Task node into the 
drawing board</p>
-<p align="center">
- <img src="/img/datax-en.png" width="80%" />
-</p>
-</li>
-<li>
-<p>Custom template: When you turn on the custom template switch, you can 
customize the content of the json configuration file of the datax node 
(applicable when the control configuration does not meet the requirements)</p>
-</li>
-<li>
-<p>Data source: select the data source to extract the data</p>
-</li>
-<li>
-<p>sql statement: the sql statement used to extract data from the target 
database, the sql query column name is automatically parsed when the node is 
executed, and mapped to the target table synchronization column name. When the 
source table and target table column names are inconsistent, they can be 
converted by column alias (as)</p>
-</li>
-<li>
-<p>Target library: select the target library for data synchronization</p>
-</li>
-<li>
-<p>Target table: the name of the target table for data synchronization</p>
-</li>
-<li>
-<p>Pre-sql: Pre-sql is executed before the sql statement (executed by the 
target library).</p>
-</li>
-<li>
-<p>Post-sql: Post-sql is executed after the sql statement (executed by the 
target library).</p>
-</li>
-<li>
-<p>json: json configuration file for datax synchronization</p>
-</li>
-<li>
-<p>Custom parameters: SQL task type, and stored procedure is a custom 
parameter order to set values for the method. The custom parameter type and 
data type are the same as the stored procedure task type. The difference is 
that the SQL task type custom parameter will replace the ${variable} in the SQL 
statement.</p>
-</li>
+<li>Click Project Management -&gt; Project Name -&gt; Workflow Definition, and 
click the &quot;Create Workflow&quot; button to enter the DAG editing page.</li>
+<li>Drag the <img src="/img/tasks/icons/datax.png" width="15"/> task node from the 
toolbar to the drawing board.</li>
 </ul>
+<h2>Task Parameter</h2>
+<ul>
+<li><strong>Node name</strong>: The node name in a workflow definition is 
unique.</li>
+<li><strong>Run flag</strong>: Identifies whether this node can be scheduled 
normally. If the node does not need to be executed, you can turn on the prohibition 
switch.</li>
+<li><strong>Descriptive information</strong>: Describes the function of the 
node.</li>
+<li><strong>Task priority</strong>: When the number of worker threads is 
insufficient, they are executed in order from high to low, and when the 
priority is the same, they are executed according to the first-in first-out 
principle.</li>
+<li><strong>Worker grouping</strong>: Tasks are assigned to the machines of 
the worker group to execute. If Default is selected, a worker machine will be 
randomly selected for execution.</li>
+<li><strong>Environment Name</strong>: Configure the environment name in which 
to run the script.</li>
+<li><strong>Number of failed retry attempts</strong>: The number of times a 
failed task is resubmitted.</li>
+<li><strong>Failed retry interval</strong>: The interval, in minutes, between 
resubmissions of a failed task.</li>
+<li><strong>Delayed execution time</strong>: The time, in minutes, that a task 
is delayed in execution.</li>
+<li><strong>Timeout alarm</strong>: Check the timeout alarm and timeout 
failure. When the task exceeds the &quot;timeout period&quot;, an alarm email 
will be sent and the task execution will fail.</li>
+<li><strong>Custom template</strong>: Customize the content of the DataX node's 
json configuration file when the default data source provided does not meet your 
requirements.</li>
+<li><strong>json</strong>: json configuration file for DataX 
synchronization.</li>
+<li><strong>Custom parameters</strong>: Work as in the SQL task type. A stored 
procedure uses custom parameters, in order, to set values for the method; the 
custom parameter type and data type are the same as for the stored procedure 
task type. The difference is that in the SQL task type, custom parameters 
replace the ${variable} placeholders in the SQL statement.</li>
+<li><strong>Data source</strong>: Select the data source from which the data 
will be extracted.</li>
+<li><strong>sql statement</strong>: The sql statement used to extract data 
from the target database. The column names in the sql query are parsed 
automatically when the node is executed and mapped to the synchronization 
column names of the target table. When the source table and target table column 
names are inconsistent, they can be converted with a column alias (as).</li>
+<li><strong>Target library</strong>: Select the target library for data 
synchronization.</li>
+<li><strong>Pre-sql</strong>: Pre-sql is executed before the sql statement 
(executed by the target library).</li>
+<li><strong>Post-sql</strong>: Post-sql is executed after the sql statement 
(executed by the target library).</li>
+<li><strong>Stream limit (number of bytes)</strong>: Limits the number of 
bytes in the query.</li>
+<li><strong>Limit flow (number of records)</strong>: Limits the number of 
records in the query.</li>
+<li><strong>Running memory</strong>: The minimum and maximum memory required 
can be configured to suit the actual production environment.</li>
+<li><strong>Predecessor task</strong>: Selecting a predecessor task for the 
current task will set the selected predecessor task as upstream of the current 
task.</li>
+</ul>
+<h2>Task Example</h2>
+<p>This example demonstrates importing data from Hive into MySQL.</p>
+<h3>Configuring the DataX environment in DolphinScheduler</h3>
+<p>If you are using the DataX task type in a production environment, it is 
necessary to configure the required environment first. The configuration file 
is as follows: 
<code>/dolphinscheduler/conf/env/dolphinscheduler_env.sh</code>.</p>
+<p><img src="/img/tasks/demo/datax_task01.png" alt="datax_task01"></p>
+<p>After the environment has been configured, DolphinScheduler needs to be 
restarted.</p>
+<h3>Configuring DataX Task Node</h3>
+<p>The default data sources do not support reading data from Hive, so a 
custom json is required; refer to: <a 
href="https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md">HDFS
 Writer</a>. Note: partition directories exist on the HDFS path, so when importing 
data in real-world situations, it is recommended to pass the partition as a 
parameter by using custom parameters.</p>
+<p>After writing the required json file, you can configure the node content by 
following the steps in the diagram below.</p>
+<p><img src="/img/tasks/demo/datax_task02.png" alt="datax_task02"></p>
+<h3>View run results</h3>
+<p><img src="/img/tasks/demo/datax_task03.png" alt="datax_task03"></p>
+<h3>Notice</h3>
+<p>If the default data source provided does not meet your needs, you can 
configure the writer and reader of DataX according to the actual usage 
environment in the custom template option. For reference, see <a 
href="https://github.com/alibaba/DataX">https://github.com/alibaba/DataX</a>.</p>
 </div></section><footer class="footer-container"><div 
class="footer-body"><div><h3>About us</h3><h4>Do you need feedback? Please 
contact us through the following ways.</h4></div><div 
class="contact-container"><ul><li><a 
href="/en-us/community/development/subscribe.html"><img class="img-base" 
src="/img/emailgray.png"/><img class="img-change" 
src="/img/emailblue.png"/><p>Email List</p></a></li><li><a 
href="https://twitter.com/dolphinschedule"><img class="img-base" 
src="/img/twittergray.png [...]
   <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
   <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
diff --git a/en-us/docs/dev/user_doc/guide/task/datax.json 
b/en-us/docs/dev/user_doc/guide/task/datax.json
index 317aea3..eece484 100644
--- a/en-us/docs/dev/user_doc/guide/task/datax.json
+++ b/en-us/docs/dev/user_doc/guide/task/datax.json
@@ -1,6 +1,6 @@
 {
   "filename": "datax.md",
-  "__html": "<h1>DataX</h1>\n<ul>\n<li>\n<p>Drag in the toolbar<img 
src=\"/img/datax.png\" width=\"35\"/>Task node into the drawing board</p>\n<p 
align=\"center\">\n <img src=\"/img/datax-en.png\" width=\"80%\" 
/>\n</p>\n</li>\n<li>\n<p>Custom template: When you turn on the custom template 
switch, you can customize the content of the json configuration file of the 
datax node (applicable when the control configuration does not meet the 
requirements)</p>\n</li>\n<li>\n<p>Data source: selec [...]
+  "__html": "<h1>DataX</h1>\n<h2>Overview</h2>\n<p>DataX task type for 
executing DataX programs. For DataX nodes, the worker will execute 
<code>${DATAX_HOME}/bin/datax.py</code> to analyze the input json 
file.</p>\n<h2>Create Task</h2>\n<ul>\n<li>Click Project Management -&gt; 
Project Name -&gt; Workflow Definition, and click the &quot;Create 
Workflow&quot; button to enter the DAG editing page.</li>\n<li>Drag the <img 
src=\"/img/tasks/icons/datax.png\" width=\"15\"/> from the toolbar to  [...]
   "link": "/dist/en-us/docs/dev/user_doc/guide/task/datax.html",
   "meta": {}
 }
\ No newline at end of file
diff --git a/img/tasks/demo/datax_task01.png b/img/tasks/demo/datax_task01.png
new file mode 100644
index 0000000..0e249d8
Binary files /dev/null and b/img/tasks/demo/datax_task01.png differ
diff --git a/img/tasks/demo/datax_task02.png b/img/tasks/demo/datax_task02.png
new file mode 100644
index 0000000..8398ed5
Binary files /dev/null and b/img/tasks/demo/datax_task02.png differ
diff --git a/img/tasks/demo/datax_task03.png b/img/tasks/demo/datax_task03.png
new file mode 100644
index 0000000..0ae2132
Binary files /dev/null and b/img/tasks/demo/datax_task03.png differ
diff --git a/img/tasks/icons/datax.png b/img/tasks/icons/datax.png
new file mode 100644
index 0000000..22519a9
Binary files /dev/null and b/img/tasks/icons/datax.png differ
diff --git a/zh-cn/docs/dev/user_doc/guide/task/datax.html 
b/zh-cn/docs/dev/user_doc/guide/task/datax.html
index a581c27..0a5c6d8 100644
--- a/zh-cn/docs/dev/user_doc/guide/task/datax.html
+++ b/zh-cn/docs/dev/user_doc/guide/task/datax.html
@@ -10,42 +10,53 @@
   <link rel="stylesheet" href="/build/vendor.23870e5.css">
 </head>
 <body>
-  <div id="root"><div class="md2html docs-page" data-reactroot=""><header 
class="header-container header-container-dark"><div class="header-body"><span 
class="mobile-menu-btn mobile-menu-btn-dark"></span><a 
href="/zh-cn/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div 
class="search search-dark"><span class="icon-search"></span></div><span 
class="language-switch language-switch-dark">En</span><div 
class="header-menu"><div><ul class="ant-menu whiteClass ant-menu-light ant [...]
+  <div id="root"><div class="md2html docs-page" data-reactroot=""><header 
class="header-container header-container-dark"><div class="header-body"><span 
class="mobile-menu-btn mobile-menu-btn-dark"></span><a 
href="/zh-cn/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div 
class="search search-dark"><span class="icon-search"></span></div><span 
class="language-switch language-switch-dark">En</span><div 
class="header-menu"><div><ul class="ant-menu whiteClass ant-menu-light ant [...]
+<h2>Overview</h2>
+<p>The DataX task type is used to execute DataX programs. For DataX nodes, the worker 
executes <code>${DATAX_HOME}/bin/datax.py</code> to parse the input json file.</p>
+<h2>Create Task</h2>
 <ul>
-<li>
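-<p>Drag the <img src="/img/datax.png" width="35"/> task node from the toolbar to the drawing board</p>
-<p align="center">
- <img src="/img/datax_edit.png" width="80%" />
-</p>
-</li>
-<li>
-<p>Custom template: When the custom template switch is turned on, you can customize the content of the json configuration file of the datax node (applicable when the control configuration does not meet the requirements)</p>
-</li>
-<li>
-<p>Data source: select the data source from which to extract the data</p>
-</li>
-<li>
-<p>sql statement: the sql statement used to extract data from the target database. The sql query column names are parsed automatically when the node is executed and mapped to the target table synchronization column names. When the source table and target table column names are inconsistent, they can be converted with a column alias (as)</p>
-</li>
-<li>
-<p>Target library: select the target library for data synchronization</p>
-</li>
-<li>
-<p>Target table: the name of the target table for data synchronization</p>
-</li>
-<li>
-<p>Pre-sql: Pre-sql is executed before the sql statement (executed by the target library).</p>
-</li>
-<li>
-<p>Post-sql: Post-sql is executed after the sql statement (executed by the target library).</p>
-</li>
-<li>
-<p>json: json configuration file for datax synchronization</p>
-</li>
-<li>
-<p>Custom parameters: Work as in the SQL task type. A stored procedure uses custom parameters, in order, to set values for the method; the custom parameter type and data type are the same as for the stored procedure task type. The difference is that in the SQL task type, custom parameters replace the ${variable} placeholders in the sql statement.</p>
-</li>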
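+<li>Click Project Management -&gt; Project Name -&gt; Workflow Definition, and click the "Create Workflow" button to enter the DAG editing page;</li>
+<li>Drag the <img src="/img/tasks/icons/datax.png" width="15"/> task node from the toolbar to the drawing board.</li>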
 </ul>
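+<h2>Task Parameter</h2>
+<ul>
+<li>Node name: Sets the name of the task node. Node names are unique within a workflow definition.</li>
+<li>Run flag: Identifies whether this node can be scheduled normally. If the node does not need to be executed, you can turn on the prohibition switch.</li>
+<li>Description: Describes the function of the node.</li>
+<li>Task priority: When the number of worker threads is insufficient, tasks are executed in order from high to low priority; tasks with the same priority are executed on a first-in first-out basis.</li>
+<li>Worker grouping: Tasks are assigned to the machines of the worker group to execute. If Default is selected, a worker machine is randomly selected for execution.</li>
+<li>Environment name: Configures the environment in which to run the script.</li>
+<li>Number of failed retry attempts: The number of times a failed task is resubmitted.</li>
+<li>Failed retry interval: The interval, in minutes, between resubmissions of a failed task.</li>
+<li>Delayed execution time: The time, in minutes, that a task is delayed in execution.</li>
+<li>Timeout alarm: Check the timeout alarm and timeout failure options. When the task exceeds the "timeout period", an alarm email is sent and the task execution fails.</li>
+<li>Custom template: Customize the content of the datax node's json configuration file when the default data source provided does not meet your requirements.</li>
+<li>json: json configuration file for DataX synchronization.</li>
+<li>Custom parameters: Work as in the sql task type. A stored procedure uses custom parameters, in order, to set values for the method; the custom parameter type and data type are the same as for the stored procedure task type. The difference is that in the SQL task type, custom parameters replace the ${variable} placeholders in the sql statement.</li>
+<li>Data source: Select the data source from which the data will be extracted.</li>
+<li>sql statement: The sql statement used to extract data from the target database. The column names in the sql query are parsed automatically when the node is executed and mapped to the synchronization column names of the target table. When the source table and target table column names are inconsistent, they can be converted with a column alias (as).</li>
+<li>Target library: Select the target library for data synchronization.</li>
+<li>Target library pre-sql: Pre-sql is executed before the sql statement (executed by the target library).</li>
+<li>Target library post-sql: Post-sql is executed after the sql statement (executed by the target library).</li>
+<li>Stream limit (number of bytes): Limits the number of bytes in the query.</li>
+<li>Stream limit (number of records): Limits the number of records in the query.</li>
+<li>Running memory: The minimum and maximum memory required can be configured to suit the actual production environment.</li>
+<li>Predecessor task: Selecting a predecessor task for the current task sets the selected task as upstream of the current task.</li>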
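+<h2>Task Example</h2>
+<p>This example demonstrates importing data from Hive into MySQL.</p>
+<h3>Configuring the DataX environment in DolphinScheduler</h3>
+<p>If you use the DataX task type in a production environment, you need to configure the required environment first. The configuration file is as follows: <code>/dolphinscheduler/conf/env/dolphinscheduler_env.sh</code>.</p>
+<p><img src="/img/tasks/demo/datax_task01.png" alt="datax_task01"></p>
+<p>After the environment has been configured, DolphinScheduler needs to be restarted.</p>
+<h3>Configuring the DataX task node</h3>
+<p>The default data sources do not support reading data from Hive, so a custom json is required; refer to: <a 
href="https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md">HDFS
 Writer</a>. Note that partition directories exist on the HDFS path, so when importing data in real-world situations, it is recommended to pass the partition as a parameter by using custom parameters.</p>
+<p>After writing the required json, you can configure the node content by following the steps in the diagram below.</p>
+<p><img src="/img/tasks/demo/datax_task02.png" alt="datax_task02"></p>
+<h3>View run results</h3>
+<p><img src="/img/tasks/demo/datax_task03.png" alt="datax_task03"></p>
+<h2>Notice</h2>
+<p>If the default data source provided does not meet your needs, you can configure the writer and reader of DataX according to the actual usage environment in the custom template option. For reference, see <a 
href="https://github.com/alibaba/DataX">https://github.com/alibaba/DataX</a></p>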
 </div></section><footer class="footer-container"><div 
class="footer-body"><div><h3>Contact us</h3><h4>Have a question or feedback? Please contact us through the following ways.</h4></div><div 
class="contact-container"><ul><li><a 
href="/zh-cn/community/development/subscribe.html"><img class="img-base" 
src="/img/emailgray.png"/><img class="img-change" 
src="/img/emailblue.png"/><p>Email List</p></a></li><li><a 
href="https://twitter.com/dolphinschedule"><img class="img-base" 
src="/img/twittergray.png"/><img class="img-change" 
src="/img/twitterblue.png"/><p [...]
   <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-with-addons.min.js"></script>
   <script 
src="//cdn.jsdelivr.net/npm/[email protected]/dist/react-dom.min.js"></script>
diff --git a/zh-cn/docs/dev/user_doc/guide/task/datax.json 
b/zh-cn/docs/dev/user_doc/guide/task/datax.json
index 7071b86..4619ca3 100644
--- a/zh-cn/docs/dev/user_doc/guide/task/datax.json
+++ b/zh-cn/docs/dev/user_doc/guide/task/datax.json
@@ -1,6 +1,6 @@
 {
   "filename": "datax.md",
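-  "__html": "<h1>DATAX Node</h1>\n<ul>\n<li>\n<p>Drag the <img 
src=\"/img/datax.png\" width=\"35\"/> task node from the toolbar to the drawing board</p>\n<p align=\"center\">\n <img 
src=\"/img/datax_edit.png\" width=\"80%\" 
/>\n</p>\n</li>\n<li>\n<p>Custom template: When the custom template switch is turned on, you can customize the content of the json configuration file of the datax node (applicable when the control configuration does not meet the requirements)</p>\n</li>\n<li>\n<p>Data source: select the data source from which to extract the data</p>\n</li>\n<li>\n<p>sql statement: the sql statement used to extract data from the target database. The sql query column names are parsed automatically when the node is executed and mapped to the target table synchronization column names. When the source table and target table column names are inconsistent, they can be converted with a column alias (as)</p>\n</li>\n<li>\n<p>Target library: select the target library for data synchronization</p>\n</li>\n<li>\n<p>Target table: the name of the target table for data synchronization</p>\n</li>\n<li>\n<p>Pre-sql: Pre
 [...]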
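+  "__html": "<h1>DATAX Node</h1>\n<h2>Overview</h2>\n<p>The DataX task type is used to execute DataX programs. For 
DataX nodes, the worker executes <code>${DATAX_HOME}/bin/datax.py</code> to parse the input json 
file.</p>\n<h2>Create Task</h2>\n<ul>\n<li>Click Project Management -&gt; Project Name -&gt; Workflow Definition, and click the &quot;Create Workflow&quot; button to enter the 
DAG editing page;</li>\n<li>Drag the <img src=\"/img/tasks/icons/datax.png\" 
width=\"15\"/> 
task node from the toolbar to the drawing board.</li>\n</ul>\n<h2>Task Parameter</h2>\n<ul>\n<li>Node name: Sets the name of the task node. Node names are unique within a workflow definition.</li>\n<li>Run flag: Identifies whether this node can be scheduled normally. If the node does not need to be executed, you can turn on the prohibition switch.</li>\n<li>Description: Describes the function of the node.</li>\n<li>Task priority: When the number of worker
 threads is insufficient [...]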
   "link": "/dist/zh-cn/docs/dev/user_doc/guide/task/datax.html",
   "meta": {}
 }
\ No newline at end of file
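
For readers of the Task Example above, the custom json for a Hive-to-MySQL import might look roughly like the structure built below. This is only a sketch under stated assumptions: the reader and writer parameter names follow the public DataX hdfsreader and mysqlwriter documentation, while every host, path, table, column, credential, and the `dt` parameter name are hypothetical placeholders. The `render_job` helper merely imitates how a custom parameter could replace a `${variable}` placeholder for the partition; it is not DolphinScheduler's own substitution code.

```python
import json
import re

# Hypothetical job template. Hosts, paths, tables, columns and credentials
# are placeholders, not values taken from the commit above.
JOB_TEMPLATE = {
    "job": {
        "setting": {"speed": {"channel": 1}},
        "content": [{
            "reader": {
                "name": "hdfsreader",
                "parameter": {
                    "defaultFS": "hdfs://namenode:8020",
                    # The partition directory is passed in as a parameter.
                    "path": "/user/hive/warehouse/demo.db/orders/dt=${dt}",
                    "fileType": "text",
                    "encoding": "UTF-8",
                    "fieldDelimiter": "\t",
                    "column": [
                        {"index": 0, "type": "long"},
                        {"index": 1, "type": "string"},
                    ],
                },
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "writeMode": "insert",
                    "username": "demo_user",
                    "password": "demo_password",
                    "column": ["id", "name"],
                    "connection": [{
                        "jdbcUrl": "jdbc:mysql://mysql-host:3306/demo",
                        "table": ["orders"],
                    }],
                },
            },
        }],
    }
}


def render_job(template: dict, params: dict) -> str:
    """Serialize the template and replace ${name} placeholders.

    Unknown placeholders are left untouched. Values are inserted as plain
    text, so they must not contain characters that need JSON escaping.
    """
    text = json.dumps(template, indent=2)

    def replace(match: re.Match) -> str:
        return params.get(match.group(1), match.group(0))

    return re.sub(r"\$\{(\w+)\}", replace, text)


if __name__ == "__main__":
    print(render_job(JOB_TEMPLATE, {"dt": "2022-03-09"}))
```

Keeping the partition as a placeholder means the same json can be reused for every run, with only the custom parameter value changing.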

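The Overview in the diff above says the worker executes `${DATAX_HOME}/bin/datax.py` on the input json file. A minimal sketch of how such a command line could be assembled is shown below, assuming `DATAX_HOME` is exported by the environment file and that the "Running memory" setting is passed through the `--jvm` option of `datax.py`. The function name, its arguments, and the example paths are hypothetical; this is not the worker's actual implementation.

```python
import posixpath
import shlex


def build_datax_command(job_file: str, env: dict,
                        xms_gb: int = 1, xmx_gb: int = 1) -> list:
    """Build an argv list that launches datax.py for one job file.

    `env` stands in for the worker's environment; DATAX_HOME must be set
    (assumed here to come from dolphinscheduler_env.sh).
    """
    datax_home = env.get("DATAX_HOME")
    if not datax_home:
        raise RuntimeError("DATAX_HOME is not set; configure the environment first")
    launcher = posixpath.join(datax_home, "bin", "datax.py")
    # Minimum and maximum JVM heap, mirroring the "Running memory" parameter.
    jvm_option = f"--jvm=-Xms{xms_gb}G -Xmx{xmx_gb}G"
    return ["python", launcher, jvm_option, job_file]


if __name__ == "__main__":
    command = build_datax_command("/tmp/job.json", {"DATAX_HOME": "/opt/datax"})
    # shlex.join quotes the --jvm argument so it can be pasted into a shell.
    print(shlex.join(command))
```

Failing early when `DATAX_HOME` is missing mirrors the point made in the diff that the environment has to be configured before the task type can be used.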