This is an automated email from the ASF dual-hosted git repository.
dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/inlong-website.git
The following commit(s) were added to refs/heads/master by this push:
new 52d6c34ea5 [INLONG-649][Doc] Add the usage document for Apache Hudi (#650)
52d6c34ea5 is described below
commit 52d6c34ea5a72a2a2ef234d79e52d0a776fdda6a
Author: ZuoFengZhang <[email protected]>
AuthorDate: Mon Dec 19 17:12:41 2022 +0800
[INLONG-649][Doc] Add the usage document for Apache Hudi (#650)
Co-authored-by: averyzhang <[email protected]>
---
community/how-to-report-issues.md | 2 +-
docs/data_node/load_node/hudi.md | 140 +++++++++++++++++++++
docs/data_node/load_node/img/hudi.png | Bin 0 -> 293069 bytes
docs/data_node/load_node/overview.md | 1 +
docs/introduction.md | 2 +-
docs/modules/sort/overview.md | 3 +-
.../current/how-to-report-issues.md | 20 +--
.../current/data_node/load_node/hudi.md | 139 ++++++++++++++++++++
.../current/data_node/load_node/img/hudi.png | Bin 0 -> 93125 bytes
.../current/data_node/load_node/overview.md | 1 +
.../current/introduction.md | 3 +-
.../current/modules/sort/overview.md | 1 +
static/img/index-arch.svg | 2 +-
.../design_and_concept/how_to_write_plugin_sort.md | 2 +-
14 files changed, 300 insertions(+), 16 deletions(-)
diff --git a/community/how-to-report-issues.md b/community/how-to-report-issues.md
index 3fb64c9e59..047bfd77c8 100644
--- a/community/how-to-report-issues.md
+++ b/community/how-to-report-issues.md
@@ -36,7 +36,7 @@ For Summary, please provide a detailed title e.g. `[Bug][DataProxy] Repeated reg
| Agent | data collection agent, supports reading regular logs from specified directories or files and reporting data one by one. In the future, DB collection capabilities will also be expanded. |
| DataProxy | a Proxy component based on Flume-ng, supports data transmission blocking, placing retransmission, and has the ability to forward received data to different MQ (message queues). |
| TubeMQ | Tencent's self-developed message queuing service, focuses on high-performance storage and transmission of massive data in big data scenarios and has a relatively good core advantage in mass practice and low cost. |
-| Sort | after consuming data from different MQ services, perform ETL processing, and then aggregate and write the data into Apache Hive, ClickHouse, Hbase, IceBerg, etc. |
+| Sort | after consuming data from different MQ services, perform ETL processing, and then aggregate and write the data into Apache Hive, ClickHouse, Hbase, IceBerg, Hudi, etc. |
| Manager | provides complete data service management and control capabilities, including metadata, OpenAPI, task flow, authority, etc. |
| Dashboard | a front-end page for managing data access, simplifying the use of the entire InLong control platform. |
| Audit | performs real-time audit and reconciliation on the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system. |
diff --git a/docs/data_node/load_node/hudi.md b/docs/data_node/load_node/hudi.md
new file mode 100644
index 0000000000..1c5fb4c6fd
--- /dev/null
+++ b/docs/data_node/load_node/hudi.md
@@ -0,0 +1,140 @@
+---
+title: Hudi
+sidebar_position: 18
+---
+
+import {siteVariables} from '../../version';
+
+## Overview
+
+[Apache Hudi](https://hudi.apache.org/cn/docs/overview/) (pronounced "hoodie") is a next-generation streaming data lake platform.
+Apache Hudi brings core warehouse and database functionality directly into the data lake.
+Hudi provides tables, transactions, efficient upserts/deletes, advanced indexing, streaming ingestion services, data clustering/compaction optimizations, and concurrency, while keeping data in an open-source file format.
+
+## Supported Version
+
+| Load Node | Version |
+| ----------------- | ---------------------------------------------------------------- |
+| [Hudi](./hudi.md) | [Hudi](https://hudi.apache.org/cn/docs/quick-start-guide): 0.12+ |
+
+### Dependencies
+
+Introduce `sort-connector-hudi` through `Maven` to build your own project.
+Alternatively, you can directly use the prebuilt `jar` package provided by InLong ([sort-connector-hudi](https://inlong.apache.org/download/)).
+
+### Maven dependency
+
+<pre><code parentName="pre">
+{`<dependency>
+ <groupId>org.apache.inlong</groupId>
+ <artifactId>sort-connector-hudi</artifactId>
+ <version>${siteVariables.inLongVersion}</version>
+</dependency>
+`}
+</code></pre>
+
+## How to create a Hudi Load Node
+
+### Usage for SQL API
+
+The example below shows how to create a Hudi Load Node with the `Flink SQL CLI`:
+
+```sql
+CREATE TABLE `hudi_table_name` (
+  id STRING,
+  name STRING,
+  uv BIGINT,
+  pv BIGINT
+) WITH (
+  'connector' = 'hudi-inlong',
+  'path' = 'hdfs://127.0.0.1:90001/data/warehouse/hudi_db_name.db/hudi_table_name',
+  'uri' = 'thrift://127.0.0.1:8091',
+  'hoodie.database.name' = 'hudi_db_name',
+  'hoodie.table.name' = 'hudi_table_name',
+  'hoodie.datasource.write.recordkey.field' = 'id',
+  'hoodie.bucket.index.hash.field' = 'id',
+  -- compaction
+  'compaction.tasks' = '10',
+  'compaction.async.enabled' = 'true',
+  'compaction.schedule.enabled' = 'true',
+  'compaction.max_memory' = '3096',
+  'compaction.trigger.strategy' = 'num_or_time',
+  'compaction.delta_commits' = '5',
+  -- commit archiving and cleaning
+  'hoodie.keep.min.commits' = '1440',
+  'hoodie.keep.max.commits' = '2880',
+  'clean.async.enabled' = 'true',
+  -- write
+  'write.operation' = 'upsert',
+  'write.bucket_assign.tasks' = '60',
+  'write.tasks' = '60',
+  'write.log_block.size' = '128',
+  -- index and table type
+  'index.type' = 'BUCKET',
+  'metadata.enabled' = 'false',
+  'hoodie.bucket.index.num.buckets' = '20',
+  'table.type' = 'MERGE_ON_READ',
+  'clean.retain_commits' = '30',
+  'hoodie.cleaner.policy' = 'KEEP_LATEST_COMMITS'
+);
+```
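
Once the table is defined, records can be written with a regular Flink `INSERT` statement. A minimal sketch, assuming a hypothetical upstream table `kafka_source` with a matching schema (not part of this document):

```sql
-- `kafka_source` is an assumed upstream table, e.g. a Kafka-backed Flink table.
INSERT INTO `hudi_table_name`
SELECT id, name, uv, pv
FROM kafka_source;
```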
+
+### Usage for InLong Dashboard
+
+#### Configuration
+
+When creating a data stream, select `Hudi` for the data stream direction, and click "Add" to configure it.
+
+
+
+| Config Item | Property in DDL statement | Remark |
+| ------------------------------------ | --------------------------------------------- | ------ |
+| `DbName` | `hoodie.database.name` | The name of the database. |
+| `TableName` | `hudi_table_name` | The name of the table. |
+| `EnableCreateResource` | - | If the database and table already exist and do not need to be modified, select [Do not create]; otherwise select [Create], and the system will create the resource automatically. |
+| `Catalog URI` | `uri` | The server URI of the catalog. |
+| `Warehouse` | - | The HDFS location where the Hudi table is stored. In the SQL DDL, the `path` attribute is the `warehouse path` joined with the database and table names. |
+| `ExtList` | - | DDL attributes of the Hudi table; each needs the prefix 'ddl.'. |
+| `Advanced options`>`DataConsistency` | - | Consistency semantics of the Flink computing engine: `EXACTLY_ONCE` or `AT_LEAST_ONCE`. |
+| `PartitionFieldList` | `hoodie.datasource.write.partitionpath.field` | Partition field list. |
+| `PrimaryKey` | `hoodie.datasource.write.recordkey.field` | Primary key. |
+
+### Usage for InLong Manager Client
+
+TODO: It will be supported in the future.
+
+## Hudi Load Node Options
+
+| Option | Required | Default | Type | Description |
+| ------------------------------------------- | -------- | ------- | ------ | ----------- |
+| connector | required | (none) | String | Specify which connector to use; here it should be 'hudi-inlong'. |
+| uri | required | (none) | String | Metastore URIs for Hive sync. |
+| hoodie.database.name | optional | (none) | String | Database name used for incremental queries. If different databases have tables with the same name, set it to restrict the incremental query to a specific database. |
+| hoodie.table.name | optional | (none) | String | Table name used for registering with Hive. Needs to be the same across runs. |
+| hoodie.datasource.write.recordkey.field | required | (none) | String | Record key field, used as the `recordKey` component of `HoodieKey`. The actual value is obtained by invoking .toString() on the field value. Nested fields can be specified with dot notation, e.g. `a.b.c`. |
+| hoodie.datasource.write.partitionpath.field | optional | (none) | String | Partition path field, used as the partitionPath component of HoodieKey. The actual value is obtained by invoking .toString(). |
+| inlong.metric.labels | optional | (none) | String | InLong metric labels; the value format is groupId=xxgroup&streamId=xxstream&nodeId=xxnode. |
+
+## Data Type Mapping
+
+| Hudi type | Flink SQL type |
+| ------------- | -------------- |
+| char(p) | CHAR(p) |
+| varchar(p) | VARCHAR(p) |
+| string | STRING |
+| boolean | BOOLEAN |
+| tinyint | TINYINT |
+| smallint | SMALLINT |
+| int | INT |
+| bigint | BIGINT |
+| float | FLOAT |
+| double | DOUBLE |
+| decimal(p, s) | DECIMAL(p, s) |
+| date | DATE |
+| timestamp(9) | TIMESTAMP |
+| bytes | BINARY |
+| array | LIST |
+| map | MAP |
+| row | STRUCT |
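
As an illustration of the mapping above, a hypothetical table covering several of these types could be declared like this (a sketch; the table name and columns are invented, and connection options are abbreviated):

```sql
CREATE TABLE `typed_hudi_table` (
  id STRING,                  -- string        -> STRING
  flag BOOLEAN,               -- boolean       -> BOOLEAN
  amount DECIMAL(10, 2),      -- decimal(p, s) -> DECIMAL(p, s)
  created_at TIMESTAMP(3),    -- timestamp     -> TIMESTAMP
  tags MAP<STRING, STRING>    -- map           -> MAP
) WITH (
  'connector' = 'hudi-inlong',
  'hoodie.datasource.write.recordkey.field' = 'id'
  -- 'path', 'uri', and the remaining options omitted for brevity
);
```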
diff --git a/docs/data_node/load_node/img/hudi.png b/docs/data_node/load_node/img/hudi.png
new file mode 100644
index 0000000000..80137b2cd4
Binary files /dev/null and b/docs/data_node/load_node/img/hudi.png differ
diff --git a/docs/data_node/load_node/overview.md b/docs/data_node/load_node/overview.md
index 72f9d0073e..49ce60c488 100644
--- a/docs/data_node/load_node/overview.md
+++ b/docs/data_node/load_node/overview.md
@@ -23,6 +23,7 @@ Load Nodes is a set of Sink Connectors based on <a href="https://flink.apache.or
| [SQLServer](sqlserver.md) | [SQLServer](https://www.microsoft.com/sql-server): 2012, 2014, 2016, 2017, 2019 | JDBC Driver: 7.2.2.jre8 |
| [HDFS](hdfs.md) | [HDFS](https://hadoop.apache.org/): 2.x, 3.x | None |
| [Iceberg](iceberg.md) | [Iceberg](https://iceberg.apache.org/): 0.13.1+ | None |
+| [Hudi](hudi.md) | [Hudi](https://hudi.apache.org/): 0.12.x | None |
## Supported Flink Versions
diff --git a/docs/introduction.md b/docs/introduction.md
index 5fdcff9235..f39152e2e8 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -59,7 +59,7 @@ Apache InLong serves the entire life cycle from data collection to landing, and
- **inlong-agent**, data collection services, including file collection, DB collection, etc.
- **inlong-dataproxy**, a Proxy component based on Flume-ng, supports data transmission blocking, placing retransmission, and has the ability to forward received data to different MQ (message queues).
- **inlong-tubemq**, Tencent's self-developed message queuing service, focuses on high-performance storage and transmission of massive data in big data scenarios and has a relatively good core advantage in mass practice and low cost.
-- **inlong-sort**, after consuming data from different MQ services, perform ETL processing, and then aggregate and write the data into Apache Hive, ClickHouse, Hbase, IceBerg, etc.
+- **inlong-sort**, after consuming data from different MQ services, perform ETL processing, and then aggregate and write the data into Apache Hive, ClickHouse, Hbase, IceBerg, Hudi, etc.
- **inlong-manager**, provides complete data service management and control capabilities, including metadata, OpenAPI, task flow, authority, etc.
- **inlong-dashboard**, a front-end page for managing data access, simplifying the use of the entire InLong control platform.
- **inlong-audit**, performs real-time audit and reconciliation on the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system.
diff --git a/docs/modules/sort/overview.md b/docs/modules/sort/overview.md
index 5c789f1947..d5e024b174 100644
--- a/docs/modules/sort/overview.md
+++ b/docs/modules/sort/overview.md
@@ -31,4 +31,5 @@ InLong Sort can be used together with the Manager to manage metadata, or it can
| | Iceberg |
| | PostgreSQL |
| | HDFS |
-| | TDSQL Postgres |
\ No newline at end of file
+| | TDSQL Postgres |
+| | Hudi |
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-report-issues.md b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-report-issues.md
index e8c77c517e..93dbe5b198 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-report-issues.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-report-issues.md
@@ -32,16 +32,16 @@ The Apache InLong project uses GitHub Issues to track all issues. These include
For the summary, please provide a detailed title, e.g. `[Bug][Dataproxy] Repeated registration jmx metric bean` rather than `Dataproxy registration error`.
-| Component | Description |
-|:---------------:|:----------------------------------------------------------------------|
-| Agent | Data collection agent; supports reading regular logs from specified directories or files and reporting them one by one. DB collection and other capabilities will be added later. |
-| DataProxy | A Proxy component based on Flume-ng; supports blocking of data sending and disk-based retransmission, and can forward received data to different MQs (message queues). |
-| TubeMQ | Tencent's self-developed message queuing service, focused on high-performance storage and transmission of massive data in big-data scenarios, with solid core advantages in large-scale practice and low cost. |
-| Sort | Performs ETL on data consumed from different MQs, then aggregates and writes it to storage systems such as Hive, ClickHouse, Hbase, and Iceberg. |
-| Manage | Provides complete data service management and control capabilities, including metadata, task flow, permissions, OpenAPI, etc. |
-| Dashboard | A front-end page for managing data access, simplifying use of the entire InLong control platform. |
-| Audit | Performs real-time audit and reconciliation of the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system. |
-| SDK | Includes the DataProxy SDK, Sort SDK, etc. |
+| Component | Description |
+|:---------------:|:---------------------------------------------------------------------------|
+| Agent | Data collection agent; supports reading regular logs from specified directories or files and reporting them one by one. DB collection and other capabilities will be added later. |
+| DataProxy | A Proxy component based on Flume-ng; supports blocking of data sending and disk-based retransmission, and can forward received data to different MQs (message queues). |
+| TubeMQ | Tencent's self-developed message queuing service, focused on high-performance storage and transmission of massive data in big-data scenarios, with solid core advantages in large-scale practice and low cost. |
+| Sort | Performs ETL on data consumed from different MQs, then aggregates and writes it to storage systems such as Hive, ClickHouse, Hbase, Iceberg, and Hudi. |
+| Manage | Provides complete data service management and control capabilities, including metadata, task flow, permissions, OpenAPI, etc. |
+| Dashboard | A front-end page for managing data access, simplifying use of the entire InLong control platform. |
+| Audit | Performs real-time audit and reconciliation of the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system. |
+| SDK | Includes the DataProxy SDK, Sort SDK, etc. |
The affected-version field can be set to the earliest InLong version in which you found the bug. If you are unsure, leave it blank.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hudi.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hudi.md
new file mode 100644
index 0000000000..1fe1b91ab9
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hudi.md
@@ -0,0 +1,139 @@
+---
+title: Hudi
+sidebar_position: 18
+---
+
+import {siteVariables} from '../../version';
+
+## Overview
+
+[Apache Hudi](https://hudi.apache.org/cn/docs/overview/) (pronounced "hoodie") is a next-generation streaming data lake platform.
+Apache Hudi brings core warehouse and database functionality directly into the data lake.
+Hudi provides tables, transactions, efficient upserts/deletes, advanced indexing, streaming ingestion services, data clustering/compaction optimizations, and concurrency, while keeping data in an open-source file format.
+
+## Supported Version
+
+| Load Node | Version |
+| ----------------- | ---------------------------------------------------------------- |
+| [Hudi](./hudi.md) | [Hudi](https://hudi.apache.org/cn/docs/quick-start-guide): 0.12+ |
+
+### Dependencies
+
+Introduce `sort-connector-hudi` through `Maven` to build your own project.
+Alternatively, you can directly use the prebuilt `jar` package provided by InLong ([sort-connector-hudi](https://inlong.apache.org/download)).
+
+### Maven dependency
+
+<pre><code parentName="pre">
+{`<dependency>
+ <groupId>org.apache.inlong</groupId>
+ <artifactId>sort-connector-hudi</artifactId>
+ <version>${siteVariables.inLongVersion}</version>
+</dependency>
+`}
+</code></pre>
+
+## How to create a Hudi Load Node
+
+### Usage for SQL API
+
+The example below shows how to create a Hudi Load Node with the `Flink SQL CLI`:
+
+```sql
+CREATE TABLE `hudi_table_name` (
+  id STRING,
+  name STRING,
+  uv BIGINT,
+  pv BIGINT
+) WITH (
+  'connector' = 'hudi-inlong',
+  'path' = 'hdfs://127.0.0.1:90001/data/warehouse/hudi_db_name.db/hudi_table_name',
+  'uri' = 'thrift://127.0.0.1:8091',
+  'hoodie.database.name' = 'hudi_db_name',
+  'hoodie.table.name' = 'hudi_table_name',
+  'hoodie.datasource.write.recordkey.field' = 'id',
+  'hoodie.bucket.index.hash.field' = 'id',
+  -- compaction
+  'compaction.tasks' = '10',
+  'compaction.async.enabled' = 'true',
+  'compaction.schedule.enabled' = 'true',
+  'compaction.max_memory' = '3096',
+  'compaction.trigger.strategy' = 'num_or_time',
+  'compaction.delta_commits' = '5',
+  -- commit archiving and cleaning
+  'hoodie.keep.min.commits' = '1440',
+  'hoodie.keep.max.commits' = '2880',
+  'clean.async.enabled' = 'true',
+  -- write
+  'write.operation' = 'upsert',
+  'write.bucket_assign.tasks' = '60',
+  'write.tasks' = '60',
+  'write.log_block.size' = '128',
+  -- index and table type
+  'index.type' = 'BUCKET',
+  'metadata.enabled' = 'false',
+  'hoodie.bucket.index.num.buckets' = '20',
+  'table.type' = 'MERGE_ON_READ',
+  'clean.retain_commits' = '30',
+  'hoodie.cleaner.policy' = 'KEEP_LATEST_COMMITS'
+);
+```
+
+### Usage for InLong Dashboard
+
+#### Configuration
+
+When creating a data stream, select `Hudi` as the data sink and click "Add" to configure it.
+
+
+
+| Config Item | Property in DDL statement | Remark |
+| ------------------------------------ | --------------------------------------------- | ------ |
+| `DbName` | `hoodie.database.name` | The name of the database. |
+| `TableName` | `hudi_table_name` | The name of the Hudi table. |
+| `EnableCreateResource` | - | If the database and table already exist and do not need to be modified, select [Do not create]; otherwise select [Create], and the system will create the resource automatically. |
+| `Catalog URI` | `uri` | The metadata service address. |
+| `Warehouse` | - | The HDFS location where the Hudi table is stored. In the SQL DDL, the `path` attribute is the `warehouse path` joined with the database and table names. |
+| `ExtList` | - | DDL attributes of the Hudi table; each needs the prefix 'ddl.'. |
+| `Advanced options`>`DataConsistency` | - | Consistency semantics of the Flink computing engine: `EXACTLY_ONCE` or `AT_LEAST_ONCE`. |
+| `PartitionFieldList` | `hoodie.datasource.write.partitionpath.field` | Partition field list. |
+| `PrimaryKey` | `hoodie.datasource.write.recordkey.field` | Primary key field. |
+
+### Usage for InLong Manager Client
+
+TODO: It will be supported in the future.
+
+## Hudi Load Node Options
+
+| Option | Required | Type | Description |
+| ------------------------------------------- | -------- | ------ | ----------- |
+| connector | required | String | Specify which connector to use; here it should be 'hudi-inlong'. |
+| uri | required | String | Metastore URIs for Hive sync. |
+| hoodie.database.name | optional | String | Database name used for incremental queries. If different databases have tables with the same name, set it to restrict the incremental query to a specific database. |
+| hoodie.table.name | optional | String | Table name used for registering with Hive. Needs to be the same across runs. |
+| hoodie.datasource.write.recordkey.field | required | String | Record key field, used as the `recordKey` component of `HoodieKey`. The actual value is obtained by invoking .toString() on the field value. Nested fields can be specified with dot notation, e.g. `a.b.c`. |
+| hoodie.datasource.write.partitionpath.field | optional | String | Partition path field, used as the partitionPath component of HoodieKey. The actual value is obtained by invoking .toString(). |
+| inlong.metric.labels | optional | String | InLong metric labels; the value format is groupId=xxgroup&streamId=xxstream&nodeId=xxnode. |
+
+## Data Type Mapping
+
+| Hudi type | Flink SQL type |
+| ------------- | -------------- |
+| char(p) | CHAR(p) |
+| varchar(p) | VARCHAR(p) |
+| string | STRING |
+| boolean | BOOLEAN |
+| tinyint | TINYINT |
+| smallint | SMALLINT |
+| int | INT |
+| bigint | BIGINT |
+| float | FLOAT |
+| double | DOUBLE |
+| decimal(p, s) | DECIMAL(p, s) |
+| date | DATE |
+| timestamp(9) | TIMESTAMP |
+| bytes | BINARY |
+| array | LIST |
+| map | MAP |
+| row | STRUCT |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/img/hudi.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/img/hudi.png
new file mode 100644
index 0000000000..b201eebcde
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/img/hudi.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/overview.md
index a3dc62f0a3..29071fc92c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/overview.md
@@ -23,6 +23,7 @@ The Load Node list is a set of Sink Connectors based on <a href="https://flink.apache.or
| [SQLServer](sqlserver.md) | [SQLServer](https://www.microsoft.com/sql-server): 2012, 2014, 2016, 2017, 2019 | JDBC Driver: 7.2.2.jre8 |
| [HDFS](hdfs.md) | [HDFS](https://hadoop.apache.org/): 2.x, 3.x | None |
| [Iceberg](iceberg.md) | [Iceberg](https://iceberg.apache.org/): 0.13.1+ | None |
+| [Hudi](hudi.md) | [Hudi](https://hudi.apache.org/): 0.12.x | None |
## Supported Flink Versions
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/introduction.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/introduction.md
index d503fcdf94..f966b9fea0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/introduction.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/introduction.md
@@ -56,7 +56,7 @@ Apache InLong serves the entire life cycle from data collection to landing; by the data
- **inlong-agent**, data collection services, including file collection, DB collection, etc.
- **inlong-dataproxy**, a Proxy component based on Flume-ng; supports blocking of data sending and disk-based retransmission, and can forward received data to different MQs (message queues).
- **inlong-tubemq**, Tencent's self-developed message queuing service, focused on high-performance storage and transmission of massive data in big-data scenarios, with solid core advantages in large-scale practice and low cost.
-- **inlong-sort**, performs ETL on data consumed from different MQs, then aggregates and writes it to storage systems such as Hive, ClickHouse, Hbase, and Iceberg.
+- **inlong-sort**, performs ETL on data consumed from different MQs, then aggregates and writes it to storage systems such as Hive, ClickHouse, Hbase, Iceberg, and Hudi.
- **inlong-manager**, provides complete data service management and control capabilities, including metadata, task flow, permissions, OpenAPI, etc.
- **inlong-dashboard**, a front-end page for managing data access, simplifying use of the entire InLong control platform.
- **inlong-audit**, performs real-time audit and reconciliation of the incoming and outgoing traffic of the Agent, DataProxy, and Sort modules of the InLong system.
@@ -78,6 +78,7 @@ Apache InLong serves the entire life cycle from data collection to landing; by the data
| Load Node | Auto Consumption | None | Standard |
| | Hive | 1.x, 2.x, 3.x | Lightweight, Standard |
| | Iceberg | 0.12.x | Lightweight, Standard |
+| | Hudi | 0.12.x | Lightweight, Standard |
| | ClickHouse | 20.7+ | Lightweight, Standard |
| | Kafka | 2.x | Lightweight, Standard |
| | HBase | 2.2.x | Lightweight, Standard |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/overview.md
index 6e9a1db722..c6755c00b6 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/overview.md
@@ -30,4 +30,5 @@ InLong Sort can be used together with the Manager for system metadata management
| | PostgreSQL |
| | HDFS |
| | TDSQL Postgres |
+| | Hudi |
diff --git a/static/img/index-arch.svg b/static/img/index-arch.svg
index 2fdd2b2fcb..1fa2f34bb7 100644
--- a/static/img/index-arch.svg
+++ b/static/img/index-arch.svg
@@ -289,7 +289,7 @@
<use fill="#5494FF" xlink:href="#L"/>
</g>
<text font-family=".AppleSystemUIFont" font-size="16" fill="#FFF" transform="translate(39 150)">
- <tspan x="44.5" y="29">Iceberg</tspan>
+ <tspan x="20" y="29">Iceberg / Hudi</tspan>
</text>
<g>
<g transform="translate(39 222)">
diff --git a/versioned_docs/version-1.1.0/design_and_concept/how_to_write_plugin_sort.md b/versioned_docs/version-1.1.0/design_and_concept/how_to_write_plugin_sort.md
index ca589372a4..bd5336f062 100644
--- a/versioned_docs/version-1.1.0/design_and_concept/how_to_write_plugin_sort.md
+++ b/versioned_docs/version-1.1.0/design_and_concept/how_to_write_plugin_sort.md
@@ -4,7 +4,7 @@ sidebar_position: 3
---
# Overview
-InLong-Sort is known as a real-time ETL system. Currently, supported sinks are hive, kafka, clickhouse and iceberg.
+InLong-Sort is known as a real-time ETL system. Currently, supported sinks are hive, kafka, clickhouse, hudi and iceberg.
This article introduces how to extend a new type of sink in InLong-Sort.
# Extend a new sink function