This is an automated email from the ASF dual-hosted git repository.
dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/inlong-website.git
The following commit(s) were added to refs/heads/master by this push:
new 52d6c34ea5 [INLONG-649][Doc] Add the usage document for Apache Hudi (#650)
52d6c34ea5 is described below
commit 52d6c34ea5a72a2a2ef234d79e52d0a776fdda6a
Author: ZuoFengZhang <[email protected]>
AuthorDate: Mon Dec 19 17:12:41 2022 +0800
[INLONG-649][Doc] Add the usage document for Apache Hudi (#650)
Co-authored-by: averyzhang <[email protected]>
---
community/how-to-report-issues.md | 2 +-
docs/data_node/load_node/hudi.md | 140 +++++++++++++++++++++
docs/data_node/load_node/img/hudi.png | Bin 0 -> 293069 bytes
docs/data_node/load_node/overview.md | 1 +
docs/introduction.md | 2 +-
docs/modules/sort/overview.md | 3 +-
.../current/how-to-report-issues.md | 20 +--
.../current/data_node/load_node/hudi.md | 139 ++++++++++++++++++++
.../current/data_node/load_node/img/hudi.png | Bin 0 -> 93125 bytes
.../current/data_node/load_node/overview.md | 1 +
.../current/introduction.md | 3 +-
.../current/modules/sort/overview.md | 1 +
static/img/index-arch.svg | 2 +-
.../design_and_concept/how_to_write_plugin_sort.md | 2 +-
14 files changed, 300 insertions(+), 16 deletions(-)
diff --git a/community/how-to-report-issues.md b/community/how-to-report-issues.md
index 3fb64c9e59..047bfd77c8 100644
--- a/community/how-to-report-issues.md
+++ b/community/how-to-report-issues.md
@@ -36,7 +36,7 @@ For Summary, please provide a detailed title e.g. `[Bug][DataProxy] Repeated reg
| Agent | data collection agent, supports reading regular logs from specified directories or files and reporting data one by one. In the future, DB collection capabilities will also be expanded. |
| DataProxy | a Proxy component based on Flume-ng, supports data transmission blocking, placing retransmission, and has the ability to forward received data to different MQ (message queues). |
| TubeMQ | Tencent's self-developed message queuing service, focuses on high-performance storage and transmission of massive data in big data scenarios and has a relatively good core advantage in mass practice and low cost. |
-| Sort | after consuming data from different MQ services, perform ETL processing, and then aggregate and write the data into Apache Hive, ClickHouse, Hbase, IceBerg, etc. |
+| Sort | after consuming data from different MQ services, perform ETL processing, and then aggregate and write the data into Apache Hive, ClickHouse, Hbase, IceBerg, Hudi, etc. |
| Manager | provides complete data service management and control capabilities, including metadata, OpenAPI, task flow, authority, etc. |
| Dashboard | a front-end page for managing data access, simplifying the use of the entire InLong control platform. |
| Audit | performs real-time audit and reconciliation on the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system. |
diff --git a/docs/data_node/load_node/hudi.md b/docs/data_node/load_node/hudi.md
new file mode 100644
index 0000000000..1c5fb4c6fd
--- /dev/null
+++ b/docs/data_node/load_node/hudi.md
@@ -0,0 +1,140 @@
+---
+title: Hudi
+sidebar_position: 18
+---
+
+import {siteVariables} from '../../version';
+
+## Overview
+
+[Apache Hudi](https://hudi.apache.org/cn/docs/overview/) (pronounced "hoodie") is a next-generation streaming data lake platform.
+Apache Hudi brings core warehouse and database functionality directly into the data lake.
+Hudi provides tables, transactions, efficient upserts/deletes, advanced indexing, streaming ingestion services, data clustering/compaction optimizations, and concurrency, while keeping data in an open-source file format.
+
+## Supported Version
+
+| Load Node | Version |
+| ----------------- | ---------------------------------------------------------------- |
+| [Hudi](./hudi.md) | [Hudi](https://hudi.apache.org/cn/docs/quick-start-guide): 0.12+ |
+
+### Dependencies
+
+Introduce `sort-connector-hudi` through `Maven` to build your own project.
+Alternatively, you can directly use the prebuilt `jar` package provided by InLong ([sort-connector-hudi](https://inlong.apache.org/download/)).
+
+### Maven dependency
+
+<pre><code parentName="pre">
+{`<dependency>
+ <groupId>org.apache.inlong</groupId>
+ <artifactId>sort-connector-hudi</artifactId>
+ <version>${siteVariables.inLongVersion}</version>
+</dependency>
+`}
+</code></pre>
+
+## How to create a Hudi Load Node
+
+### Usage for SQL API
+
+The example below shows how to create a Hudi Load Node with the `Flink SQL CLI`:
+
+```sql
+CREATE TABLE `hudi_table_name` (
+  id STRING,
+  name STRING,
+  uv BIGINT,
+  pv BIGINT
+) WITH (
+  'connector' = 'hudi-inlong',
+  'path' = 'hdfs://127.0.0.1:90001/data/warehouse/hudi_db_name.db/hudi_table_name',
+  'uri' = 'thrift://127.0.0.1:8091',
+  'hoodie.database.name' = 'hudi_db_name',
+  'hoodie.table.name' = 'hudi_table_name',
+  'hoodie.datasource.write.recordkey.field' = 'id',
+  'hoodie.bucket.index.hash.field' = 'id',
+  -- compaction
+  'compaction.tasks' = '10',
+  'compaction.async.enabled' = 'true',
+  'compaction.schedule.enabled' = 'true',
+  'compaction.max_memory' = '3096',
+  'compaction.trigger.strategy' = 'num_or_time',
+  'compaction.delta_commits' = '5',
+  -- commit archiving and cleaning
+  'hoodie.keep.min.commits' = '1440',
+  'hoodie.keep.max.commits' = '2880',
+  'clean.async.enabled' = 'true',
+  -- write
+  'write.operation' = 'upsert',
+  'write.bucket_assign.tasks' = '60',
+  'write.tasks' = '60',
+  'write.log_block.size' = '128',
+  -- index and table type
+  'index.type' = 'BUCKET',
+  'metadata.enabled' = 'false',
+  'hoodie.bucket.index.num.buckets' = '20',
+  'table.type' = 'MERGE_ON_READ',
+  'clean.retain_commits' = '30',
+  'hoodie.cleaner.policy' = 'KEEP_LATEST_COMMITS'
+);
+```
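
Once the table is defined, records can be written with a regular Flink `INSERT` statement. A minimal sketch, assuming a hypothetical upstream table `kafka_source` with a matching schema (not part of this document):

```sql
-- `kafka_source` is an assumed upstream table, e.g. a Kafka-backed Flink table.
INSERT INTO `hudi_table_name`
SELECT id, name, uv, pv
FROM kafka_source;
```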
+
+### Usage for InLong Dashboard
+
+#### Configuration
+
+When creating a data stream, select `Hudi` for the data stream direction, and click "Add" to configure it.
+
+
+
+| Config Item | Property in DDL statement | Remark |
+| ------------------------------------ | --------------------------------------------- | ------ |
+| `DbName` | `hoodie.database.name` | The name of the database. |
+| `TableName` | `hudi_table_name` | The name of the table. |
+| `EnableCreateResource` | - | If the database and table already exist and do not need to be modified, select [Do not create]; otherwise select [Create], and the system will create the resource automatically. |
+| `Catalog URI` | `uri` | The server URI of the catalog. |
+| `Warehouse` | - | The HDFS location where the Hudi table is stored. In the SQL DDL, the `path` attribute is the `warehouse path` joined with the database and table names. |
+| `ExtList` | - | DDL attributes of the Hudi table; each needs the prefix 'ddl.'. |
+| `Advanced options`>`DataConsistency` | - | Consistency semantics of the Flink computing engine: `EXACTLY_ONCE` or `AT_LEAST_ONCE`. |
+| `PartitionFieldList` | `hoodie.datasource.write.partitionpath.field` | Partition field list. |
+| `PrimaryKey` | `hoodie.datasource.write.recordkey.field` | Primary key. |
+
+### Usage for InLong Manager Client
+
+TODO: It will be supported in the future.
+
+## Hudi Load Node Options
+
+| Option | Required | Default | Type | Description |
+| ------------------------------------------- | -------- | ------- | ------ | ----------- |
+| connector | required | (none) | String | Specify which connector to use; here it should be 'hudi-inlong'. |
+| uri | required | (none) | String | Metastore URIs for Hive sync. |
+| hoodie.database.name | optional | (none) | String | Database name used for incremental queries. If different databases have tables with the same name, set it to restrict the incremental query to a specific database. |
+| hoodie.table.name | optional | (none) | String | Table name used for registering with Hive. Needs to be the same across runs. |
+| hoodie.datasource.write.recordkey.field | required | (none) | String | Record key field, used as the `recordKey` component of `HoodieKey`. The actual value is obtained by invoking .toString() on the field value. Nested fields can be specified with dot notation, e.g. `a.b.c`. |
+| hoodie.datasource.write.partitionpath.field | optional | (none) | String | Partition path field, used as the partitionPath component of HoodieKey. The actual value is obtained by invoking .toString(). |
+| inlong.metric.labels | optional | (none) | String | InLong metric labels; the value format is groupId=xxgroup&streamId=xxstream&nodeId=xxnode. |
+
+## Data Type Mapping
+
+| Hudi type | Flink SQL type |
+| ------------- | -------------- |
+| char(p) | CHAR(p) |
+| varchar(p) | VARCHAR(p) |
+| string | STRING |
+| boolean | BOOLEAN |
+| tinyint | TINYINT |
+| smallint | SMALLINT |
+| int | INT |
+| bigint | BIGINT |
+| float | FLOAT |
+| double | DOUBLE |
+| decimal(p, s) | DECIMAL(p, s) |
+| date | DATE |
+| timestamp(9) | TIMESTAMP |
+| bytes | BINARY |
+| array | LIST |
+| map | MAP |
+| row | STRUCT |
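
As an illustration of the mapping above, a hypothetical table covering several of these types could be declared like this (a sketch; the table name and columns are invented, and connection options are abbreviated):

```sql
CREATE TABLE `typed_hudi_table` (
  id STRING,                  -- string        -> STRING
  flag BOOLEAN,               -- boolean       -> BOOLEAN
  amount DECIMAL(10, 2),      -- decimal(p, s) -> DECIMAL(p, s)
  created_at TIMESTAMP(3),    -- timestamp     -> TIMESTAMP
  tags MAP<STRING, STRING>    -- map           -> MAP
) WITH (
  'connector' = 'hudi-inlong',
  'hoodie.datasource.write.recordkey.field' = 'id'
  -- 'path', 'uri', and the remaining options omitted for brevity
);
```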
diff --git a/docs/data_node/load_node/img/hudi.png b/docs/data_node/load_node/img/hudi.png
new file mode 100644
index 0000000000..80137b2cd4
Binary files /dev/null and b/docs/data_node/load_node/img/hudi.png differ
diff --git a/docs/data_node/load_node/overview.md b/docs/data_node/load_node/overview.md
index 72f9d0073e..49ce60c488 100644
--- a/docs/data_node/load_node/overview.md
+++ b/docs/data_node/load_node/overview.md
@@ -23,6 +23,7 @@ Load Nodes is a set of Sink Connectors based on <a href="https://flink.apache.or
| [SQLServer](sqlserver.md) | [SQLServer](https://www.microsoft.com/sql-server): 2012, 2014, 2016, 2017, 2019 | JDBC Driver: 7.2.2.jre8 |
| [HDFS](hdfs.md) | [HDFS](https://hadoop.apache.org/): 2.x, 3.x | None |
| [Iceberg](iceberg.md) | [Iceberg](https://iceberg.apache.org/): 0.13.1+ | None |
+| [Hudi](hudi.md) | [Hudi](https://hudi.apache.org/): 0.12.x | None |
## Supported Flink Versions
diff --git a/docs/introduction.md b/docs/introduction.md
index 5fdcff9235..f39152e2e8 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -59,7 +59,7 @@ Apache InLong serves the entire life cycle from data collection to landing, and
- **inlong-agent**, data collection services, including file collection, DB collection, etc.
- **inlong-dataproxy**, a Proxy component based on Flume-ng, supports data transmission blocking, placing retransmission, and has the ability to forward received data to different MQ (message queues).
- **inlong-tubemq**, Tencent's self-developed message queuing service, focuses on high-performance storage and transmission of massive data in big data scenarios and has a relatively good core advantage in mass practice and low cost.
-- **inlong-sort**, after consuming data from different MQ services, perform ETL processing, and then aggregate and write the data into Apache Hive, ClickHouse, Hbase, IceBerg, etc.
+- **inlong-sort**, after consuming data from different MQ services, perform ETL processing, and then aggregate and write the data into Apache Hive, ClickHouse, Hbase, IceBerg, Hudi, etc.
- **inlong-manager**, provides complete data service management and control capabilities, including metadata, OpenAPI, task flow, authority, etc.
- **inlong-dashboard**, a front-end page for managing data access, simplifying the use of the entire InLong control platform.
- **inlong-audit**, performs real-time audit and reconciliation on the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system.
diff --git a/docs/modules/sort/overview.md b/docs/modules/sort/overview.md
index 5c789f1947..d5e024b174 100644
--- a/docs/modules/sort/overview.md
+++ b/docs/modules/sort/overview.md
@@ -31,4 +31,5 @@ InLong Sort can be used together with the Manager to manage metadata, or it can
| | Iceberg |
| | PostgreSQL |
| | HDFS |
-| | TDSQL Postgres |
\ No newline at end of file
+| | TDSQL Postgres |
+| | Hudi |
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-report-issues.md b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-report-issues.md
index e8c77c517e..93dbe5b198 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-report-issues.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-report-issues.md
@@ -32,16 +32,16 @@ The Apache InLong project uses GitHub Issues to track all issues. These include
For the summary, please provide a detailed title, e.g. `[Bug][Dataproxy] Repeated registration jmx metric bean` rather than `Dataproxy registration error`.
-| Component | Description |
-|:---------------:|:----------------------------------------------------------------------|
-| Agent | Data collection agent; supports reading regular logs from specified directories or files and reporting them one by one. DB collection and other capabilities will be added later. |
-| DataProxy | A Proxy component based on Flume-ng; supports blocking of data sending and disk-based retransmission, and can forward received data to different MQs (message queues). |
-| TubeMQ | Tencent's self-developed message queuing service, focused on high-performance storage and transmission of massive data in big-data scenarios, with solid core advantages in large-scale practice and low cost. |
-| Sort | Performs ETL on data consumed from different MQs, then aggregates and writes it to storage systems such as Hive, ClickHouse, Hbase, and Iceberg. |
-| Manage | Provides complete data service management and control capabilities, including metadata, task flow, permissions, OpenAPI, etc. |
-| Dashboard | A front-end page for managing data access, simplifying use of the entire InLong control platform. |
-| Audit | Performs real-time audit and reconciliation of the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system. |
-| SDK | Includes the DataProxy SDK, Sort SDK, etc. |
+| Component | Description |
+|:---------------:|:---------------------------------------------------------------------------|
+| Agent | Data collection agent; supports reading regular logs from specified directories or files and reporting them one by one. DB collection and other capabilities will be added later. |
+| DataProxy | A Proxy component based on Flume-ng; supports blocking of data sending and disk-based retransmission, and can forward received data to different MQs (message queues). |
+| TubeMQ | Tencent's self-developed message queuing service, focused on high-performance storage and transmission of massive data in big-data scenarios, with solid core advantages in large-scale practice and low cost. |
+| Sort | Performs ETL on data consumed from different MQs, then aggregates and writes it to storage systems such as Hive, ClickHouse, Hbase, Iceberg, and Hudi. |
+| Manage | Provides complete data service management and control capabilities, including metadata, task flow, permissions, OpenAPI, etc. |
+| Dashboard | A front-end page for managing data access, simplifying use of the entire InLong control platform. |
+| Audit | Performs real-time audit and reconciliation of the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system. |
+| SDK | Includes the DataProxy SDK, Sort SDK, etc. |
The affected-version field can be set to the earliest InLong version in which you found the bug. If you are unsure, leave it blank.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hudi.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hudi.md
new file mode 100644
index 0000000000..1fe1b91ab9
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hudi.md
@@ -0,0 +1,139 @@
+---
+title: Hudi
+sidebar_position: 18
+---
+
+import {siteVariables} from '../../version';
+
+## Overview
+
+[Apache Hudi](https://hudi.apache.org/cn/docs/overview/) (pronounced "hoodie") is a next-generation streaming data lake platform.
+Apache Hudi brings core warehouse and database functionality directly into the data lake.
+Hudi provides tables, transactions, efficient upserts/deletes, advanced indexing, streaming ingestion services, data clustering/compaction optimizations, and concurrency, while keeping data in an open-source file format.
+
+## Supported Version
+
+| Load Node | Version |
+| ----------------- | ---------------------------------------------------------------- |
+| [Hudi](./hudi.md) | [Hudi](https://hudi.apache.org/cn/docs/quick-start-guide): 0.12+ |
+
+### Dependencies
+
+Introduce `sort-connector-hudi` through `Maven` to build your own project.
+Alternatively, you can directly use the prebuilt `jar` package provided by InLong ([sort-connector-hudi](https://inlong.apache.org/download)).
+
+### Maven dependency
+
+<pre><code parentName="pre">
+{`<dependency>
+ <groupId>org.apache.inlong</groupId>
+ <artifactId>sort-connector-hudi</artifactId>
+ <version>${siteVariables.inLongVersion}</version>
+</dependency>
+`}
+</code></pre>
+
+## How to create a Hudi Load Node
+
+### Usage for SQL API
+
+The example below shows how to create a Hudi Load Node with the `Flink SQL CLI`:
+
+```sql
+CREATE TABLE `hudi_table_name` (
+  id STRING,
+  name STRING,
+  uv BIGINT,
+  pv BIGINT
+) WITH (
+  'connector' = 'hudi-inlong',
+  'path' = 'hdfs://127.0.0.1:90001/data/warehouse/hudi_db_name.db/hudi_table_name',
+  'uri' = 'thrift://127.0.0.1:8091',
+  'hoodie.database.name' = 'hudi_db_name',
+  'hoodie.table.name' = 'hudi_table_name',
+  'hoodie.datasource.write.recordkey.field' = 'id',
+  'hoodie.bucket.index.hash.field' = 'id',
+  -- compaction
+  'compaction.tasks' = '10',
+  'compaction.async.enabled' = 'true',
+  'compaction.schedule.enabled' = 'true',
+  'compaction.max_memory' = '3096',
+  'compaction.trigger.strategy' = 'num_or_time',
+  'compaction.delta_commits' = '5',
+  -- commit archiving and cleaning
+  'hoodie.keep.min.commits' = '1440',
+  'hoodie.keep.max.commits' = '2880',
+  'clean.async.enabled' = 'true',
+  -- write
+  'write.operation' = 'upsert',
+  'write.bucket_assign.tasks' = '60',
+  'write.tasks' = '60',
+  'write.log_block.size' = '128',
+  -- index and table type
+  'index.type' = 'BUCKET',
+  'metadata.enabled' = 'false',
+  'hoodie.bucket.index.num.buckets' = '20',
+  'table.type' = 'MERGE_ON_READ',
+  'clean.retain_commits' = '30',
+  'hoodie.cleaner.policy' = 'KEEP_LATEST_COMMITS'
+);
+```
+
+### Usage for InLong Dashboard
+
+#### Configuration
+
+When creating a data stream, select `Hudi` as the data sink and click "Add" to configure it.
+
+
+
+| Config Item | Property in DDL statement | Remark |
+| ------------------------------------ | --------------------------------------------- | ------ |
+| `DbName` | `hoodie.database.name` | The name of the database. |
+| `TableName` | `hudi_table_name` | The name of the Hudi table. |
+| `EnableCreateResource` | - | If the database and table already exist and do not need to be modified, select [Do not create]; otherwise select [Create], and the system will create the resource automatically. |
+| `Catalog URI` | `uri` | The metadata service address. |
+| `Warehouse` | - | The HDFS location where the Hudi table is stored. In the SQL DDL, the `path` attribute is the `warehouse path` joined with the database and table names. |
+| `ExtList` | - | DDL attributes of the Hudi table; each needs the prefix 'ddl.'. |
+| `Advanced options`>`DataConsistency` | - | Consistency semantics of the Flink computing engine: `EXACTLY_ONCE` or `AT_LEAST_ONCE`. |
+| `PartitionFieldList` | `hoodie.datasource.write.partitionpath.field` | Partition field list. |
+| `PrimaryKey` | `hoodie.datasource.write.recordkey.field` | Primary key field. |
+
+### Usage for InLong Manager Client
+
+TODO: It will be supported in the future.
+
+## Hudi Load Node Options
+
+| Option | Required | Type | Description |
+| ------------------------------------------- | -------- | ------ | ----------- |
+| connector | required | String | Specify which connector to use; here it should be 'hudi-inlong'. |
+| uri | required | String | Metastore URIs for Hive sync. |
+| hoodie.database.name | optional | String | Database name used for incremental queries. If different databases have tables with the same name, set it to restrict the incremental query to a specific database. |
+| hoodie.table.name | optional | String | Table name used for registering with Hive. Needs to be the same across runs. |
+| hoodie.datasource.write.recordkey.field | required | String | Record key field, used as the `recordKey` component of `HoodieKey`. The actual value is obtained by invoking .toString() on the field value. Nested fields can be specified with dot notation, e.g. `a.b.c`. |
+| hoodie.datasource.write.partitionpath.field | optional | String | Partition path field, used as the partitionPath component of HoodieKey. The actual value is obtained by invoking .toString(). |
+| inlong.metric.labels | optional | String | InLong metric labels; the value format is groupId=xxgroup&streamId=xxstream&nodeId=xxnode. |
+
+## Data Type Mapping
+
+| Hudi type | Flink SQL type |
+| ------------- | -------------- |
+| char(p) | CHAR(p) |
+| varchar(p) | VARCHAR(p) |
+| string | STRING |
+| boolean | BOOLEAN |
+| tinyint | TINYINT |
+| smallint | SMALLINT |
+| int | INT |
+| bigint | BIGINT |
+| float | FLOAT |
+| double | DOUBLE |
+| decimal(p, s) | DECIMAL(p, s) |
+| date | DATE |
+| timestamp(9) | TIMESTAMP |
+| bytes | BINARY |
+| array | LIST |
+| map | MAP |
+| row | STRUCT |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/img/hudi.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/img/hudi.png
new file mode 100644
index 0000000000..b201eebcde
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/img/hudi.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/overview.md
index a3dc62f0a3..29071fc92c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/overview.md
@@ -23,6 +23,7 @@ The Load Node list is a set of Sink Connectors based on <a href="https://flink.apache.or
| [SQLServer](sqlserver.md) | [SQLServer](https://www.microsoft.com/sql-server): 2012, 2014, 2016, 2017, 2019 | JDBC Driver: 7.2.2.jre8 |
| [HDFS](hdfs.md) | [HDFS](https://hadoop.apache.org/): 2.x, 3.x | None |
| [Iceberg](iceberg.md) | [Iceberg](https://iceberg.apache.org/): 0.13.1+ | None |
+| [Hudi](hudi.md) | [Hudi](https://hudi.apache.org/): 0.12.x | None |
## Supported Flink Versions
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/introduction.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/introduction.md
index d503fcdf94..f966b9fea0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/introduction.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/introduction.md
@@ -56,7 +56,7 @@ Apache InLong serves the entire life cycle from data collection to landing; by the data
- **inlong-agent**, data collection services, including file collection, DB collection, etc.
- **inlong-dataproxy**, a Proxy component based on Flume-ng; supports blocking of data sending and disk-based retransmission, and can forward received data to different MQs (message queues).
- **inlong-tubemq**, Tencent's self-developed message queuing service, focused on high-performance storage and transmission of massive data in big-data scenarios, with solid core advantages in large-scale practice and low cost.
-- **inlong-sort**, performs ETL on data consumed from different MQs, then aggregates and writes it to storage systems such as Hive, ClickHouse, Hbase, and Iceberg.
+- **inlong-sort**, performs ETL on data consumed from different MQs, then aggregates and writes it to storage systems such as Hive, ClickHouse, Hbase, Iceberg, and Hudi.
- **inlong-manager**, provides complete data service management and control capabilities, including metadata, task flow, permissions, OpenAPI, etc.
- **inlong-dashboard**, a front-end page for managing data access, simplifying use of the entire InLong control platform.
- **inlong-audit**, performs real-time audit and reconciliation of the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system.
@@ -78,6 +78,7 @@ Apache InLong serves the entire life cycle from data collection to landing; by the data
| Load Node | Auto Consumption | None | Standard |
| | Hive | 1.x, 2.x, 3.x | Lightweight, Standard |
| | Iceberg | 0.12.x | Lightweight, Standard |
+| | Hudi | 0.12.x | Lightweight, Standard |
| | ClickHouse | 20.7+ | Lightweight, Standard |
| | Kafka | 2.x | Lightweight, Standard |
| | HBase | 2.2.x | Lightweight, Standard |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/overview.md
index 6e9a1db722..c6755c00b6 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/overview.md
@@ -30,4 +30,5 @@ InLong Sort can be used together with the Manager for system metadata management
| | PostgreSQL |
| | HDFS |
| | TDSQL Postgres |
+| | Hudi |
diff --git a/static/img/index-arch.svg b/static/img/index-arch.svg
index 2fdd2b2fcb..1fa2f34bb7 100644
--- a/static/img/index-arch.svg
+++ b/static/img/index-arch.svg
@@ -289,7 +289,7 @@
<use fill="#5494FF" xlink:href="#L"/>
</g>
<text font-family=".AppleSystemUIFont" font-size="16" fill="#FFF" transform="translate(39 150)">
- <tspan x="44.5" y="29">Iceberg</tspan>
+ <tspan x="20" y="29">Iceberg / Hudi</tspan>
</text>
<g>
<g transform="translate(39 222)">
diff --git a/versioned_docs/version-1.1.0/design_and_concept/how_to_write_plugin_sort.md b/versioned_docs/version-1.1.0/design_and_concept/how_to_write_plugin_sort.md
index ca589372a4..bd5336f062 100644
--- a/versioned_docs/version-1.1.0/design_and_concept/how_to_write_plugin_sort.md
+++ b/versioned_docs/version-1.1.0/design_and_concept/how_to_write_plugin_sort.md
@@ -4,7 +4,7 @@ sidebar_position: 3
---
# Overview
-InLong-Sort is known as a real-time ETL system. Currently, supported sinks are hive, kafka, clickhouse and iceberg.
+InLong-Sort is known as a real-time ETL system. Currently, supported sinks are hive, kafka, clickhouse, hudi and iceberg.
This article introduces how to extend a new type of sink in InLong-Sort.
# Extend a new sink function