dockerzhang commented on code in PR #658:
URL: https://github.com/apache/inlong-website/pull/658#discussion_r1052967660
##########
docs/data_node/extract_node/hudi.md:
##########
@@ -0,0 +1,141 @@
+---
+title: Hudi
+sidebar_position: 11
+---
+
+import {siteVariables} from '../../version';
+
+## Overview
+
+[Apache Hudi](https://hudi.apache.org/cn/docs/overview/) (pronounced "hoodie") is a next-generation streaming data lake platform.
+Apache Hudi brings core warehouse and database functionality directly to the data lake.
+Hudi provides tables, transactions, efficient upserts/deletes, advanced indexing, streaming ingestion services, data clustering/compaction optimizations, and concurrency, all while keeping data in an open source file format.
+
+## Supported Version
+
+| Extract Node      | Version                                                          |
+| ----------------- | ---------------------------------------------------------------- |
+| [Hudi](./hudi.md) | [Hudi](https://hudi.apache.org/cn/docs/quick-start-guide): 0.12+ |
+
+### Dependencies
+
+Add the `sort-connector-hudi` dependency to your project through `Maven`,
+or directly use the prebuilt `jar` package provided by `INLONG`
+([sort-connector-hudi](https://inlong.apache.org/download/)).
+
+### Maven dependency
+
+<pre><code parentName="pre">
+{`<dependency>
+ <groupId>org.apache.inlong</groupId>
+ <artifactId>sort-connector-hudi</artifactId>
+ <version>${siteVariables.inLongVersion}</version>
+</dependency>
+`}
+</code></pre>
+
+## How to create a Hudi Extract Node
+
+### Usage for SQL API
+
+The example below shows how to create a Hudi Extract Node with the `Flink SQL CLI`:
+
+```sql
+CREATE TABLE `hudi_table_name` (
+ id STRING,
+ name STRING,
+ uv BIGINT,
+ pv BIGINT
+) WITH (
+ 'connector' = 'hudi-inlong',
+  'path' = 'hdfs://127.0.0.1:9000/data/warehouse/hudi_db_name.db/hudi_table_name',
+ 'uri' = 'thrift://127.0.0.1:8091',
+ 'hoodie.database.name' = 'hudi_db_name',
+ 'hoodie.table.name' = 'hudi_table_name',
+ 'read.streaming.check-interval'='1',
+ 'read.streaming.enabled'='true',
+ 'read.streaming.skip_compaction'='true',
+ 'read.start-commit'='20221220121000',
+  -- bucket index
+ 'hoodie.bucket.index.hash.field' = 'id',
+ -- compaction
+ 'compaction.tasks' = '10',
+ 'compaction.async.enabled' = 'true',
+ 'compaction.schedule.enabled' = 'true',
+ 'compaction.max_memory' = '3096',
+ 'compaction.trigger.strategy' = 'num_or_time',
+ 'compaction.delta_commits' = '5',
+  -- archive and clean
+ 'hoodie.keep.min.commits' = '1440',
+ 'hoodie.keep.max.commits' = '2880',
+ 'clean.async.enabled' = 'true',
+  -- write
+ 'write.operation' = 'upsert',
+ 'write.bucket_assign.tasks' = '60',
+ 'write.tasks' = '60',
+ 'write.log_block.size' = '128',
+  -- index and table
+ 'index.type' = 'BUCKET',
+ 'metadata.enabled' = 'false',
+ 'hoodie.bucket.index.num.buckets' = '20',
+ 'table.type' = 'MERGE_ON_READ',
+ 'clean.retain_commits' = '30',
+ 'hoodie.cleaner.policy' = 'KEEP_LATEST_COMMITS'
+);
+```
+
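+Once the table is defined, it can be consumed with an ordinary query; since `read.streaming.enabled` is `'true'`, the query keeps tailing new Hudi commits starting from `read.start-commit` (a minimal sketch reusing the table and column names from the DDL above):
+
+```sql
+SELECT id, name, uv, pv FROM hudi_table_name;
+```
+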
+### Usage for InLong Dashboard
Review Comment:
```suggestion
### Usage for Dashboard
```
##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/hudi.md:
##########
@@ -0,0 +1,142 @@
+---
+title: Hudi
+sidebar_position: 12
+---
+
+import {siteVariables} from '../../version';
+
+## Overview
+
+[Apache Hudi](https://hudi.apache.org/cn/docs/overview/) (pronounced "hoodie") is a next-generation streaming data lake platform.
+Apache Hudi brings core warehouse and database functionality directly to the data lake.
+Hudi provides tables, transactions, efficient upserts/deletes, advanced indexing, streaming ingestion services, data clustering/compaction optimizations, and concurrency, all while keeping data in an open source file format.
+
+## Supported Version
+
+| Extract Node      | Version                                                          |
+| ----------------- | ---------------------------------------------------------------- |
+| [Hudi](./hudi.md) | [Hudi](https://hudi.apache.org/cn/docs/quick-start-guide): 0.12+ |
+
+### Dependencies
+
+Add the `sort-connector-hudi` dependency to your project through `Maven`,
+or directly use the prebuilt `jar` package provided by `INLONG`
+([sort-connector-hudi](https://inlong.apache.org/download)).
+
+### Maven dependency
+
+<pre><code parentName="pre">
+{`<dependency>
+ <groupId>org.apache.inlong</groupId>
+ <artifactId>sort-connector-hudi</artifactId>
+ <version>${siteVariables.inLongVersion}</version>
+</dependency>
+`}
+</code></pre>
+
+## How to create a Hudi Extract Node
+
+### Usage for SQL API
+
+The example below shows how to create a Hudi Extract Node with the `Flink SQL CLI`:
+
+```sql
+CREATE TABLE `hudi_table_name` (
+ id STRING,
+ name STRING,
+ uv BIGINT,
+ pv BIGINT
+) WITH (
+ 'connector' = 'hudi-inlong',
+  'path' = 'hdfs://127.0.0.1:9000/data/warehouse/hudi_db_name.db/hudi_table_name',
+ 'uri' = 'thrift://127.0.0.1:8091',
+ 'hoodie.database.name' = 'hudi_db_name',
+ 'hoodie.table.name' = 'hudi_table_name',
+ 'read.streaming.check-interval'='1',
+ 'read.streaming.enabled'='true',
+ 'read.streaming.skip_compaction'='true',
+ 'read.start-commit'='20221220121000',
+  -- bucket index
+ 'hoodie.bucket.index.hash.field' = 'id',
+ -- compaction
+ 'compaction.tasks' = '10',
+ 'compaction.async.enabled' = 'true',
+ 'compaction.schedule.enabled' = 'true',
+ 'compaction.max_memory' = '3096',
+ 'compaction.trigger.strategy' = 'num_or_time',
+ 'compaction.delta_commits' = '5',
+  -- archive and clean
+ 'hoodie.keep.min.commits' = '1440',
+ 'hoodie.keep.max.commits' = '2880',
+ 'clean.async.enabled' = 'true',
+  -- write
+ 'write.operation' = 'upsert',
+ 'write.bucket_assign.tasks' = '60',
+ 'write.tasks' = '60',
+ 'write.log_block.size' = '128',
+  -- index and table
+ 'index.type' = 'BUCKET',
+ 'metadata.enabled' = 'false',
+ 'hoodie.bucket.index.num.buckets' = '20',
+ 'table.type' = 'MERGE_ON_READ',
+ 'clean.retain_commits' = '30',
+ 'hoodie.cleaner.policy' = 'KEEP_LATEST_COMMITS'
+);
+```
+
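+Once the table is defined, it can be consumed with an ordinary query; since `read.streaming.enabled` is `'true'`, the query keeps tailing new Hudi commits starting from `read.start-commit` (a minimal sketch reusing the table and column names from the DDL above):
+
+```sql
+SELECT id, name, uv, pv FROM hudi_table_name;
+```
+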
+### Usage for InLong Dashboard
Review Comment:
```suggestion
### Usage for Dashboard
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]