[dolphinscheduler] 06/08: [Feature-8030][docs] Add sqoop task doc (#12855)

zhongjiajie Mon, 28 Nov 2022 01:50:37 -0800

This is an automated email from the ASF dual-hosted git repository.

zhongjiajie pushed a commit to branch 3.0.3-prepare
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler.git


commit 6b1456985d34b73c662599ece9a2170cd441115d
Author: baihongbin <[email protected]>
AuthorDate: Sun Nov 13 00:09:05 2022 +0800

    [Feature-8030][docs] Add sqoop task doc (#12855)
    
    * [Feature-8030][docs] Add sqoop task doc
    
    * Update docs/docs/zh/guide/task/sqoop.md
    
    Co-authored-by: Eric Gao <[email protected]>
    
    * Update docs/docs/en/guide/task/sqoop.md
    
    Co-authored-by: Eric Gao <[email protected]>
    
    * [Feature-8030][docs] Add sqoop task doc
    
    Co-authored-by: Eric Gao <[email protected]>
    (cherry picked from commit 0373e0661586c48523cedaab8a80fc7298f50a4d)
---
 docs/docs/en/guide/task/sqoop.md     |  90 ++++++++++++++++++++++++++++++++++
 docs/docs/zh/guide/task/sqoop.md     |  91 +++++++++++++++++++++++++++++++++++
 docs/img/tasks/demo/sqoop_task01.png | Bin 0 -> 2530 bytes
 docs/img/tasks/demo/sqoop_task02.png | Bin 0 -> 216945 bytes
 docs/img/tasks/demo/sqoop_task03.png | Bin 0 -> 2455 bytes
 docs/img/tasks/icons/sqoop.png       | Bin 0 -> 815 bytes
 6 files changed, 181 insertions(+)

diff --git a/docs/docs/en/guide/task/sqoop.md b/docs/docs/en/guide/task/sqoop.md
new file mode 100644
index 0000000000..e2d65debbd
--- /dev/null
+++ b/docs/docs/en/guide/task/sqoop.md
@@ -0,0 +1,90 @@
+# Sqoop Node
+
+## Overview
+
+The Sqoop task type executes Sqoop applications. Workers run the `sqoop` command to execute Sqoop tasks.
+
+## Create Task
+
+- Click `Project Management -> Project Name -> Workflow Definition`, then click the `Create Workflow` button to enter the DAG editing page.
+- Drag the <img src="../../../../img/tasks/icons/sqoop.png" width="15"/> task node from the toolbar onto the canvas.
+
+## Task Parameters
+
+[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
+[//]: # (- Please refer to [DolphinScheduler Task Parameters Appendix]&#40;appendix.md#default-task-parameters&#41; `Default Task Parameters` section for default parameters.)
+
+- For default parameters, refer to the `Default Task Parameters` section of the [DolphinScheduler Task Parameters Appendix](appendix.md).
+
+| **Parameter** | **Description** |
+|---------------|-----------------|
+| Job Name | MapReduce job name. |
+| Direct | (1) import: Imports an individual table from an RDBMS to HDFS or Hive. (2) export: Exports a set of files from HDFS or Hive back to an RDBMS. |
+| Hadoop Params | Custom Hadoop parameters for the Sqoop job. |
+| Sqoop Advanced Parameters | Advanced Sqoop parameters for the Sqoop job. |
+| Data Source - Type | Select the corresponding data source type. |
+| Data Source - Datasource | Select the corresponding data source. |
+| Data Source - ModelType | (1) Form: Synchronize data from a table; requires `Table` and `ColumnType`. (2) SQL: Synchronize the result of an SQL query; requires `SQL Statement`. |
+| Data Source - Table | Name of the source table to import into Hive. |
+| Data Source - ColumnType | (1) All Columns: Import all fields in the selected table. (2) Some Columns: Import the specified fields in the selected table; requires `Column`. |
+| Data Source - Column | Field names, separated by commas. |
+| Data Source - SQL Statement | The SQL query statement. |
+| Data Source - Map Column Hive | Override mapping from SQL to Hive type for configured columns. |
+| Data Source - Map Column Java | Override mapping from SQL to Java type for configured columns. |
+| Data Target - Type | Select the corresponding data target type. |
+| Data Target - Database | The Hive database name. |
+| Data Target - Table | The Hive table name. |
+| Data Target - CreateHiveTable | Import a table definition into Hive. If set, the job fails if the target Hive table already exists. |
+| Data Target - DropDelimiter | Drops `\n`, `\r`, and `\01` from string fields when importing to Hive. |
+| Data Target - OverWriteSrc | Overwrite existing data in the Hive table. |
+| Data Target - Hive Target Dir | Explicitly set the Hive target directory. |
+| Data Target - ReplaceDelimiter | Replaces `\n`, `\r`, and `\01` in string fields with a user-defined string when importing to Hive. |
+| Data Target - Hive partition Keys | Hive partition key names, separated by commas. |
+| Data Target - Hive partition Values | Hive partition values, separated by commas. |
+| Data Target - Target Dir | The HDFS target directory. |
+| Data Target - DeleteTargetDir | Delete the target directory if it exists. |
+| Data Target - CompressionCodec | Choose the Hadoop compression codec. |
+| Data Target - FileType | Choose the storage file type. |
+| Data Target - FieldsTerminated | Sets the field separator character. |
+| Data Target - LinesTerminated | Sets the end-of-line character. |
+
+## Task Example
+
+This example demonstrates importing data from MySQL into Hive. The MySQL database name is `test` and the table name is `example`. The following figure shows sample data.
+
+![sqoop_task01](../../../../img/tasks/demo/sqoop_task01.png)
+
+### Configuring the Sqoop environment
+
+If you are using the Sqoop task type in a production environment, you must ensure that the worker can execute the `sqoop` command.
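+
+As a quick sanity check, the commands below can be run on a worker host to confirm that `sqoop` is reachable. This is only a sketch: it assumes a POSIX shell and that you are logged in as the same OS user the worker uses to run tasks, which may differ in your deployment.
+
+```shell
+# Print the resolved path of sqoop, or a hint if it is not on PATH.
+command -v sqoop || echo "sqoop not found on PATH"
+# Print the installed Sqoop version (fails if Sqoop cannot start).
+sqoop version
+```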
+
+### Configuring Sqoop Task Node
+
+You can configure the node by following the steps in the figure below.
+
+![sqoop_task02](../../../../img/tasks/demo/sqoop_task02.png)
+
+The key configuration in this sample is shown in the following table.
+
+| **Parameter** | **Value** |
+|---------------|-----------|
+| Job Name | sqoop_mysql_to_hive_test |
+| Direct | import |
+| Data Source - Type | MYSQL |
+| Data Source - Datasource | MYSQL MyTestMySQL (you can change MyTestMySQL to the name of your own data source) |
+| Data Source - ModelType | Form |
+| Data Source - Table | example |
+| Data Source - ColumnType | All Columns |
+| Data Target - Type | HIVE |
+| Data Target - Database | tmp |
+| Data Target - Table | example |
+| Data Target - CreateHiveTable | true |
+| Data Target - DropDelimiter | false |
+| Data Target - OverWriteSrc | true |
+| Data Target - Hive Target Dir | (No need to fill in) |
+| Data Target - ReplaceDelimiter | , |
+| Data Target - Hive partition Keys | (No need to fill in) |
+| Data Target - Hive partition Values | (No need to fill in) |
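+
+For orientation, the settings above correspond roughly to a hand-written Sqoop import like the one below. It is a sketch, not the command DolphinScheduler generates, which may differ. The host `mysql-host`, port `3306`, user `db_user`, and the password file path are hypothetical placeholders; in the task itself the connection details come from the selected data source.
+
+```shell
+# Hypothetical connection details: replace mysql-host, db_user, and the
+# password file path with values from your own MySQL data source.
+sqoop import \
+  --mapreduce-job-name sqoop_mysql_to_hive_test \
+  --connect jdbc:mysql://mysql-host:3306/test \
+  --username db_user \
+  --password-file /user/db_user/.mysql-password \
+  --table example \
+  --hive-import \
+  --hive-database tmp \
+  --hive-table example \
+  --create-hive-table \
+  --hive-overwrite \
+  --hive-delims-replacement ','
+```
+
+Because `--create-hive-table` is set, a run like this is expected to fail if `tmp.example` already exists in Hive, matching the `CreateHiveTable` description in the parameter table.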
+
+### View run results
+
+![sqoop_task03](../../../../img/tasks/demo/sqoop_task03.png)
diff --git a/docs/docs/zh/guide/task/sqoop.md b/docs/docs/zh/guide/task/sqoop.md
new file mode 100644
index 0000000000..cfea038a31
--- /dev/null
+++ b/docs/docs/zh/guide/task/sqoop.md
@@ -0,0 +1,91 @@
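+# Sqoop Node
+
+## Overview
+
+The Sqoop task type is used to execute Sqoop programs. For a Sqoop node, the worker runs the `sqoop` command to execute the Sqoop task.
+
+## Create Task
+
+- Click `Project Management -> Project Name -> Workflow Definition`, then click the "Create Workflow" button to enter the DAG editing page;
+- Drag the <img src="../../../../img/tasks/icons/sqoop.png" width="15"/> task node from the toolbar onto the canvas.
+
+## Task Parameters
+
+[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
+[//]: # (- For default parameters, refer to the `默认任务参数` section of the [DolphinScheduler Task Parameters Appendix]&#40;appendix.md#默认任务参数&#41;.)
+
+- For default parameters, refer to the `默认任务参数` (Default Task Parameters) section of the [DolphinScheduler Task Parameters Appendix](appendix.md).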
+
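+| **Parameter** | **Description** |
+|---------------|-----------------|
+| Job Name | MapReduce job name |
+| Direct | (1) import: import from an RDBMS into HDFS or Hive (2) export: export from HDFS or Hive to an RDBMS |
+| Hadoop Params | Add custom Hadoop parameters |
+| Sqoop Advanced Parameters | Add custom Sqoop parameters |
+| Data Source - Type | Select the data source type |
+| Data Source - Datasource | Select the data source |
+| Data Source - ModelType | (1) Form: synchronize the data of a single table; requires `Table` and `ColumnType` (2) SQL: synchronize the result of an SQL query; requires `SQL Statement` |
+| Data Source - Table | Set the name of the table to import into Hive |
+| Data Source - ColumnType | (1) All Columns: import all fields in the table (2) Some Columns: import the specified columns of the table; requires `Column` |
+| Data Source - Column | Fill in the field names, separating multiple fields with ASCII commas |
+| Data Source - SQL Statement | Fill in the SQL query statement |
+| Data Source - Map Column Hive | Custom mapping from SQL types to Hive types |
+| Data Source - Map Column Java | Custom mapping from SQL types to Java types |
+| Data Target - Type | Select the data target type |
+| Data Target - Database | Fill in the Hive database name |
+| Data Target - Table | Fill in the Hive table name |
+| Data Target - CreateHiveTable | Choose whether to create the target table automatically from the imported data types; if the target table already exists, the job fails |
+| Data Target - DropDelimiter | Automatically remove `\n`, `\r`, and `\01` characters from strings |
+| Data Target - OverWriteSrc | Overwrite existing data in the Hive table |
+| Data Target - Hive Target Dir | Custom Hive target path |
+| Data Target - ReplaceDelimiter | Replace `\n`, `\r`, and `\01` characters in strings |
+| Data Target - Hive partition Keys | Fill in the Hive partition keys, separating multiple keys with ASCII commas |
+| Data Target - Hive partition Values | Fill in the Hive partition values, separating multiple values with ASCII commas |
+| Data Target - Target Dir | Fill in the HDFS target path |
+| Data Target - DeleteTargetDir | Delete the directory if it already exists |
+| Data Target - CompressionCodec | Select the HDFS file compression type |
+| Data Target - FileType | Select the file storage format |
+| Data Target - FieldsTerminated | Custom column separator |
+| Data Target - LinesTerminated | Custom line separator |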
+
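+## Task Example
+
+This example demonstrates importing data from MySQL into Hive. The MySQL database name is `test` and the table name is `example`. The following figure shows sample data.
+
+![sqoop_task01](../../../../img/tasks/demo/sqoop_task01.png)
+
+### Configuring the Sqoop Environment
+
+To use the Sqoop task type in a production environment, first set up the required environment and make sure the task node can execute the `sqoop` command.
+
+### Configuring the Sqoop Task Node
+
+You can configure the node by following the steps in the figure below.
+
+![sqoop_task02](../../../../img/tasks/demo/sqoop_task02.png)
+
+The key configuration in this example is shown in the following table.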
+
+| **Parameter** | **Value** |
+|---------------|-----------|
+| Job Name | sqoop_mysql_to_hive_test |
+| Direct | import |
+| Data Source - Type | MYSQL |
+| Data Source - Datasource | MYSQL MyTestMySQL (you can change MyTestMySQL to the name of your own data source) |
+| Data Source - ModelType | Form |
+| Data Source - Table | example |
+| Data Source - ColumnType | All Columns |
+| Data Target - Type | HIVE |
+| Data Target - Database | tmp |
+| Data Target - Table | example |
+| Data Target - CreateHiveTable | true |
+| Data Target - DropDelimiter | false |
+| Data Target - OverWriteSrc | true |
+| Data Target - Hive Target Dir | (No need to fill in) |
+| Data Target - ReplaceDelimiter | , |
+| Data Target - Hive partition Keys | (No need to fill in) |
+| Data Target - Hive partition Values | (No need to fill in) |
+
+### View run results
+
+![sqoop_task03](../../../../img/tasks/demo/sqoop_task03.png)
diff --git a/docs/img/tasks/demo/sqoop_task01.png b/docs/img/tasks/demo/sqoop_task01.png
new file mode 100644
index 0000000000..ec63a52337
Binary files /dev/null and b/docs/img/tasks/demo/sqoop_task01.png differ
diff --git a/docs/img/tasks/demo/sqoop_task02.png b/docs/img/tasks/demo/sqoop_task02.png
new file mode 100644
index 0000000000..18215eb98b
Binary files /dev/null and b/docs/img/tasks/demo/sqoop_task02.png differ
diff --git a/docs/img/tasks/demo/sqoop_task03.png b/docs/img/tasks/demo/sqoop_task03.png
new file mode 100644
index 0000000000..1a197ea79a
Binary files /dev/null and b/docs/img/tasks/demo/sqoop_task03.png differ
diff --git a/docs/img/tasks/icons/sqoop.png b/docs/img/tasks/icons/sqoop.png
new file mode 100644
index 0000000000..6ff06de10e
Binary files /dev/null and b/docs/img/tasks/icons/sqoop.png differ
