[GitHub] [dolphinscheduler-website] zhongjiajie commented on a change in pull request #723: [Feature-8026][Document] Add example and notice about task type DataX

GitBox Wed, 09 Mar 2022 17:10:59 -0800


zhongjiajie commented on a change in pull request #723:
URL: https://github.com/apache/dolphinscheduler-website/pull/723#discussion_r823247707




##########
File path: docs/en-us/dev/user_doc/guide/task/datax.md
##########
@@ -1,17 +1,64 @@
 # DataX
 
-- Drag in the toolbar<img src="/img/datax.png" width="35"/>Task node into the 
drawing board
-
-  <p align="center">
-   <img src="/img/datax-en.png" width="80%" />
-  </p>
-
-- Custom template: When you turn on the custom template switch, you can 
customize the content of the json configuration file of the datax node 
(applicable when the control configuration does not meet the requirements)
-- Data source: select the data source to extract the data
-- sql statement: the sql statement used to extract data from the target 
database, the sql query column name is automatically parsed when the node is 
executed, and mapped to the target table synchronization column name. When the 
source table and target table column names are inconsistent, they can be 
converted by column alias (as)
-- Target library: select the target library for data synchronization
-- Target table: the name of the target table for data synchronization
-- Pre-sql: Pre-sql is executed before the sql statement (executed by the 
target library).
-- Post-sql: Post-sql is executed after the sql statement (executed by the 
target library).
-- json: json configuration file for datax synchronization
-- Custom parameters: SQL task type, and stored procedure is a custom parameter 
order to set values for the method. The custom parameter type and data type are 
the same as the stored procedure task type. The difference is that the SQL task 
type custom parameter will replace the \${variable} in the SQL statement.
+## Overview
+
+The DataX task type is used to execute DataX programs. For DataX nodes, the 
worker executes `${DATAX_HOME}/bin/datax.py` to parse the input JSON file.
+
+## Create Task
+
+- Click Project Management -> Project Name -> Workflow Definition, and click 
the "Create Workflow" button to enter the DAG editing page.
+- Drag the <img src="/img/tasks/icons/datax.png" width="15"/> from the toolbar 
to the drawing board.
+
+## Task Parameter
+
+-    **Node name**: The node name in a workflow definition is unique.
+-    **Run flag**: Indicates whether this node can be scheduled normally. If 
it does not need to be executed, turn on the prohibition switch.
+-    **Descriptive information**: Describes the function of the node.
+-    **Task priority**: When the number of worker threads is insufficient, 
tasks are executed in order of priority from high to low. Tasks with the same 
priority are executed on a first-in, first-out basis.
+-    **Worker grouping**: Tasks are assigned to the machines of the worker 
group to execute. If Default is selected, a worker machine will be randomly 
selected for execution.
+-    **Environment Name**: Configure the environment name in which to run the 
script.
+-    **Number of failed retry attempts**: The number of times a failed task 
is resubmitted.
+-    **Failed retry interval**: The interval, in minutes, before a failed 
task is resubmitted.
+-    **Delayed execution time**: The time, in minutes, by which task 
execution is delayed.
+-    **Timeout alarm**: Check the timeout alarm and timeout failure. When the 
task exceeds the "timeout period", an alarm email will be sent and the task 
execution will fail.
+-    **Custom template**: Customize the content of the DataX node's JSON 
configuration file when the default data source configuration does not meet 
your requirements.
+-    **json**: json configuration file for DataX synchronization.
+-    **Custom parameters**: The custom parameter type and data type are the 
same as for the stored procedure task type. The difference is that a stored 
procedure uses the custom parameters, in order, to set values for the method, 
whereas the SQL task type uses them to replace the \${variable} placeholders 
in the SQL statement.
+-    **Data source**: Select the data source from which the data will be 
extracted.
+-    **sql statement**: The SQL statement used to extract data from the source 
database. The column names in the SQL query are parsed automatically when the 
node is executed and mapped to the synchronization column names of the target 
table. When the source table and target table column names are inconsistent, 
they can be converted with a column alias (`as`).
+
+- **Target library**: Select the target library for data synchronization.
+- **Pre-sql**: Pre-sql is executed before the sql statement (executed by the 
target library).
+- **Post-sql**: Post-sql is executed after the sql statement (executed by the 
target library).
+- **Stream limit (number of bytes)**: Limits the number of bytes in the query.
+- **Limit flow (number of records)**: Limits the number of records for a query.
+- **Running memory**: The minimum and maximum memory required can be 
configured to suit the actual production environment.
+- **Predecessor task**: Selecting a predecessor task for the current task will 
set the selected predecessor task as upstream of the current task.
+
+## Task Example
+
+This example demonstrates importing data from Hive into MySQL.
+
+### Configuring the DataX environment in DolphinScheduler
+
+If you are using the DataX task type in a production environment, it is 
necessary to configure the required environment first. The configuration file 
is as follows: `/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.

Review comment:
       No, this time we should only change the `dev` docs. Leave the previous 
content as it is.
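
For readers of the diff above, here is a minimal Python sketch of what the Overview describes: a worker running `${DATAX_HOME}/bin/datax.py` against a JSON job file. It is an illustration under stated assumptions, not DolphinScheduler's actual implementation. The helper names `build_job` and `datax_command`, the fallback path `/opt/soft/datax`, the `python` interpreter name, and the `hdfsreader`/`mysqlwriter` plugin choice for the Hive-to-MySQL case are all assumptions, and the plugin parameters are left empty on purpose.

```python
import json
import os
import posixpath
from typing import List, Optional


def build_job(reader: dict, writer: dict, channel: int = 1) -> dict:
    """Wrap one reader/writer pair in the common DataX job JSON layout."""
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},
            "content": [{"reader": reader, "writer": writer}],
        }
    }


def datax_command(job_path: str, datax_home: Optional[str] = None) -> List[str]:
    """Build the argument list for running datax.py on a job file.

    The fallback install path is a hypothetical default; set DATAX_HOME
    (for example in dolphinscheduler_env.sh) to match your environment.
    """
    home = datax_home or os.environ.get("DATAX_HOME", "/opt/soft/datax")
    return ["python", posixpath.join(home, "bin", "datax.py"), job_path]


if __name__ == "__main__":
    # Placeholder plugin configs: real jobs need connection details here.
    job = build_job(
        reader={"name": "hdfsreader", "parameter": {}},
        writer={"name": "mysqlwriter", "parameter": {}},
    )
    print(json.dumps(job, indent=2))
    print(" ".join(datax_command("job.json")))
```

The command is returned as a list rather than a single string so it can be passed to `subprocess.run` without shell quoting issues.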



