This is an automated email from the ASF dual-hosted git repository.

gosonzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new fe415de  [INLONG-2029] add pulsar example document for the InLong 
(#230)
fe415de is described below

commit fe415de081192e661361ea42f05406021aeddce8
Author: dockerzhang <[email protected]>
AuthorDate: Mon Dec 20 10:03:18 2021 +0800

    [INLONG-2029] add pulsar example document for the InLong (#230)
    
    Co-authored-by: dockerzhang <[email protected]>
---
 docs/quick_start/hive_example.md                   |  10 +--
 docs/quick_start/img/pulsar-arch.png               | Bin 0 -> 19399 bytes
 docs/quick_start/img/pulsar-data.png               | Bin 0 -> 50694 bytes
 docs/quick_start/img/pulsar-group.png              | Bin 0 -> 79370 bytes
 docs/quick_start/img/pulsar-hive.png               | Bin 0 -> 50243 bytes
 docs/quick_start/img/pulsar-stream.png             | Bin 0 -> 46290 bytes
 docs/quick_start/img/pulsar-topic.png              | Bin 0 -> 33396 bytes
 docs/quick_start/pulsar_example.md                 |  90 +++++++++++++++++++++
 .../current/quick_start/hive_example.md            |  10 +--
 .../current/quick_start/img/pulsar-arch.png        | Bin 0 -> 9060 bytes
 .../current/quick_start/img/pulsar-data.png        | Bin 0 -> 22202 bytes
 .../current/quick_start/img/pulsar-group.png       | Bin 0 -> 26089 bytes
 .../current/quick_start/img/pulsar-hive.png        | Bin 0 -> 24108 bytes
 .../current/quick_start/img/pulsar-stream.png      | Bin 0 -> 22100 bytes
 .../current/quick_start/img/pulsar-topic.png       | Bin 0 -> 33396 bytes
 .../current/quick_start/pulsar_example.md          |  88 ++++++++++++++++++++
 16 files changed, 188 insertions(+), 10 deletions(-)

diff --git a/docs/quick_start/hive_example.md b/docs/quick_start/hive_example.md
index 7be6dbc..31daffb 100644
--- a/docs/quick_start/hive_example.md
+++ b/docs/quick_start/hive_example.md
@@ -5,17 +5,17 @@ sidebar_position: 2
 
 Here we use a simple example to help you experience InLong by Docker.
 
-## 1 Install Hive
+## Install Hive
 Hive is a required component. If you don't have Hive on your machine, we 
recommend using Docker to install it. Details can be found 
[here](https://github.com/big-data-europe/docker-hive).
 
 > Note that if you use Docker, you need to add a port mapping `8020:8020`, 
 > because it's the port of HDFS DefaultFS, and we need to use it later.
 
-## 2 Install InLong
+## Install InLong
 Before we begin, we need to install InLong. Here we provide two ways:
 1. Install InLong with Docker according to the [instructions 
here](deployment/docker.md). (Recommended)
 2. Install InLong binary according to the [instructions 
here](deployment/bare_metal.md).
 
-## 3 Create a data access
+## Create a data access
 After deployment, we first enter the "Data Access" interface, click "Create an 
Access" in the upper right corner to create a new data access, and fill in the 
data stream group information as shown in the figure below.
 
 ![Create Group](img/create-group.png)
@@ -38,12 +38,12 @@ Note that the target table does not need to be created in 
advance, as InLong Man
 
 Then we click the "Submit for Approval" button. The data access is created 
and enters the approval state.
 
-## 4 Approve the data access
+## Approve the data access
 Then we enter the "Approval Management" interface and click "My Approval" to 
approve the data access that we just applied for.
 
 At this point, the data access has been created successfully. We can see that 
the corresponding table has been created in Hive, and we can see that the 
corresponding topic has been created successfully in the management GUI of 
TubeMQ.
 
-## 5 Configure the agent
+## Configure the agent
 Here we use `docker exec` to enter the container of the agent and configure it.
 ```
 $ docker exec -it agent sh
diff --git a/docs/quick_start/img/pulsar-arch.png 
b/docs/quick_start/img/pulsar-arch.png
new file mode 100644
index 0000000..1afa12f
Binary files /dev/null and b/docs/quick_start/img/pulsar-arch.png differ
diff --git a/docs/quick_start/img/pulsar-data.png 
b/docs/quick_start/img/pulsar-data.png
new file mode 100644
index 0000000..b645deb
Binary files /dev/null and b/docs/quick_start/img/pulsar-data.png differ
diff --git a/docs/quick_start/img/pulsar-group.png 
b/docs/quick_start/img/pulsar-group.png
new file mode 100644
index 0000000..e50bd30
Binary files /dev/null and b/docs/quick_start/img/pulsar-group.png differ
diff --git a/docs/quick_start/img/pulsar-hive.png 
b/docs/quick_start/img/pulsar-hive.png
new file mode 100644
index 0000000..d070608
Binary files /dev/null and b/docs/quick_start/img/pulsar-hive.png differ
diff --git a/docs/quick_start/img/pulsar-stream.png 
b/docs/quick_start/img/pulsar-stream.png
new file mode 100644
index 0000000..7941829
Binary files /dev/null and b/docs/quick_start/img/pulsar-stream.png differ
diff --git a/docs/quick_start/img/pulsar-topic.png 
b/docs/quick_start/img/pulsar-topic.png
new file mode 100644
index 0000000..b892f65
Binary files /dev/null and b/docs/quick_start/img/pulsar-topic.png differ
diff --git a/docs/quick_start/pulsar_example.md 
b/docs/quick_start/pulsar_example.md
new file mode 100644
index 0000000..864c9c8
--- /dev/null
+++ b/docs/quick_start/pulsar_example.md
@@ -0,0 +1,90 @@
+---
+title: Pulsar Example
+sidebar_position: 2
+---
+
+Apache InLong has added the ability to ingest data through Apache Pulsar, taking advantage of the features that distinguish Pulsar from other MQs and providing a complete solution for data access scenarios with higher data quality requirements, such as finance and billing.
+In the following, we use a complete example to show how to ingest data with Apache Pulsar through Apache InLong.
+
+![Create Group](img/pulsar-arch.png)
+
+## Install Pulsar
+Please refer to [Official Installation 
Guidelines](https://pulsar.apache.org/docs/en/standalone/).
+
+## Install Hive
+Hive is a required component. If you don't have Hive on your machine, we recommend using Docker to install it. Details can be found [here](https://github.com/big-data-europe/docker-hive).
+
+> Note that if you use Docker, you need to add a port mapping `8020:8020`, 
because it's the port of HDFS DefaultFS, and we need to use it later.
+
+## Install InLong
+Before we begin, we need to install InLong. Here we provide two ways:
+1. Install InLong with Docker according to the [instructions here](deployment/docker.md). (Recommended)
+2. Install InLong binary according to the [instructions 
here](deployment/bare_metal.md).
+
+Unlike with InLong TubeMQ, if you use Apache Pulsar you need to configure the Pulsar cluster information when installing the Manager component. The format is as follows:
+```
+# Pulsar admin URL
+pulsar.adminUrl=http://127.0.0.1:8080,127.0.0.2:8080,127.0.0.3:8080
+# Pulsar broker address
+pulsar.serviceUrl=pulsar://127.0.0.1:6650,127.0.0.2:6650,127.0.0.3:6650
+# Default tenant of Pulsar
+pulsar.defaultTenant=public
+```
+
+## Create a data access
+### Configure data streams group information
+![](img/pulsar-group.png)
+When creating a data access, select Pulsar as the message middleware for the data stream group. Other Pulsar-related configuration items include:
+- Queue module: the queue model, Parallel or Serial. With Parallel you can set the number of topic partitions; Serial uses a single partition
+- Write quorum: number of copies to store for each message
+- Ack quorum: number of guaranteed copies (acks to wait for before a write is complete)
+- retention time: how long messages acknowledged by consumers are retained
+- ttl: default time to live for unacknowledged messages
+- retention size: how much data of messages acknowledged by consumers is retained
+
+### Configure data stream
+![](img/pulsar-stream.png)
+When configuring the message source, for the file path of the file data source, refer to [file-agent-configuration](https://inlong.apache.org/docs/next/modules/agent/file#file-agent-configuration).
+
+### Configure data information
+![](img/pulsar-data.png)
+
+### Configure Hive cluster
+Save the Hive cluster information and click "Ok" to submit.
+![](img/pulsar-hive.png)
+
+## Data access Approval
+Enter the **Approval** page, click **My Approval**, and approve the data access application. Once it is approved, the topics and subscriptions required by the data stream are created synchronously in the Pulsar cluster.
+We can use the command-line tool in the Pulsar cluster to check whether the topic was created successfully:
+![](img/pulsar-topic.png)
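
The command-line check can be sketched roughly as follows. This is a minimal sketch, assuming the `pulsar-admin` script lives under `bin/` of the Pulsar installation and that the group maps to the namespace `public/b_test_group` (taken from the example topic in the Troubleshooting section). The `PULSAR_ADMIN` and `NAMESPACE` variables and the `list_group_topics` function are illustrative names, not part of InLong.

```shell
# Assumptions: path to pulsar-admin and the tenant/namespace of the example group.
PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"
NAMESPACE="${NAMESPACE:-public/b_test_group}"

# List the topics of the namespace. Non-partitioned and partitioned topics
# are reported by two different pulsar-admin subcommands.
list_group_topics() {
  "$PULSAR_ADMIN" topics list "$NAMESPACE"
  "$PULSAR_ADMIN" topics list-partitioned-topics "$NAMESPACE"
}
```

Calling `list_group_topics` from the Pulsar installation directory should include `persistent://public/b_test_group/test_stream` in its output if the example stream was approved.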
+
+## Configure File Agent
+When configuring the file agent, you must create the file in the directory 
specified when creating the data access:
+```
+touch /data/test_file.txt;
+```
+
+Write data to the file in the data source format specified when creating the data stream:
+```
+echo -e "1|test\n2|test\n" >> /data/test_file.txt
+```
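
Note that `echo -e` is not supported by every `sh`, and in bash the command above also appends an empty third line because of the trailing `\n`. A hedged alternative using `printf`, which writes exactly two rows; the `DATA_FILE` variable and `append_rows` function are illustrative names only:

```shell
# Assumption: DATA_FILE defaults to the example path used above.
DATA_FILE="${DATA_FILE:-/data/test_file.txt}"

# Append two rows in the "id|name" format of the example.
# printf interprets \n the same way in every POSIX shell.
append_rows() {
  printf '1|test\n2|test\n' >> "$DATA_FILE"
}
```

Run `append_rows` after the file has been created with `touch`.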
+
+## Data Check
+Finally, we log in to the Hive cluster and use Hive SQL commands to check whether data has been inserted into the `test_stream` table.
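
One possible form of this check, assuming the `hive` CLI is on the `PATH` of the Hive node and that `test_stream` lives in the database the session opens by default (the document does not name the database). `HIVE_CLI` and `check_hive_rows` are hypothetical names used only for this sketch.

```shell
# Assumption: the Hive command-line client is available as `hive`.
HIVE_CLI="${HIVE_CLI:-hive}"

# Print up to ten rows of the example table to confirm that data arrived.
check_hive_rows() {
  "$HIVE_CLI" -e 'SELECT * FROM test_stream LIMIT 10;'
}
```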
+
+## Troubleshooting
+If data is not correctly written to the Hive cluster, you can check whether the `DataProxy` and `Sort` information is synchronized:
+- Check whether the topic information for the data stream is correctly written in the `conf/topics.properties` file of `InLong DataProxy`:
+```
+b_test_group/test_stream=persistent://public/b_test_group/test_stream
+```
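
A small sketch of this lookup, assuming `DATAPROXY_HOME` points at the InLong DataProxy installation directory. Both that variable and the `topic_is_registered` function are illustrative, not InLong tooling.

```shell
# Assumption: DATAPROXY_HOME is the DataProxy installation directory.
DATAPROXY_HOME="${DATAPROXY_HOME:-.}"

# Succeeds if conf/topics.properties has an entry for the given
# "<group id>/<stream id>" key, for example b_test_group/test_stream.
topic_is_registered() {
  grep -q "^$1=" "$DATAPROXY_HOME/conf/topics.properties"
}
```

For example, `topic_is_registered b_test_group/test_stream && echo found` prints `found` when the entry shown above is present.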
+
+- Check whether the configuration information of the data stream has been pushed to the ZooKeeper monitored by `InLong Sort`:
+```
+get /inlong_hive/dataflows/{{sink_id}}
+```
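
The `get` command above is run inside a ZooKeeper client session. A non-interactive sketch, assuming ZooKeeper's `bin/zkCli.sh` is available and the server listens on `127.0.0.1:2181` (the ZooKeeper default client port; the host is an assumption). `ZK_CLI`, `ZK_SERVER` and `get_dataflow_config` are hypothetical names.

```shell
# Assumptions: location of zkCli.sh and the ZooKeeper address used by InLong Sort.
ZK_CLI="${ZK_CLI:-bin/zkCli.sh}"
ZK_SERVER="${ZK_SERVER:-127.0.0.1:2181}"

# Read the dataflow node of one sink; $1 is the sink id from the data access.
get_dataflow_config() {
  "$ZK_CLI" -server "$ZK_SERVER" get "/inlong_hive/dataflows/$1"
}
```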
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md
index f3f8c93..b56f516 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md
@@ -6,18 +6,18 @@ sidebar_position: 2
 This section uses a simple example to help you quickly experience the complete InLong workflow with Docker.
 
 
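-## 1 Install Hive
+## Install Hive
 Hive is a required component. If Hive is not installed on your machine, we recommend installing it quickly with Docker. Details can be found 
[here](https://github.com/big-data-europe/docker-hive).
 
 > Note that if you use the Docker image above, you need to add a port mapping `8020:8020` to the namenode, because it is the port of HDFS 
 > DefaultFS, which is needed later when configuring Hive.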
 
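-## 2 Install InLong
+## Install InLong
 Before we begin, we need to install all InLong components. Two ways are provided here:
 1. Follow the [instructions here](deployment/docker.md) to deploy quickly with Docker. (Recommended)
 2. Follow the [instructions here](deployment/bare_metal.md) to install each component in turn from the binary packages.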
 
 
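-## 3 Create a data access
+## Create a data access
 After deployment, we first enter the "Data Access" interface, click "Create an Access" in the upper right corner to create a new data access, and fill in the data stream group information as shown in the figure below.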
 
 ![Create Group](img/create-group.png)
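@@ -40,12 +40,12 @@ Hive is a required component. If Hive is not installed on your machine, we recommend
 
 Then click the "Submit for Approval" button. The data access is created and enters the approval state.
 
-## 4 Approve the data access
+## Approve the data access
 Enter the "Approval Management" interface and click "My Approval" to approve the data access we just applied for.
 
 At this point the data access has been created. We can see that the corresponding table has been created in Hive, and that the corresponding topic has been created in the TubeMQ management GUI.
 
-## 5 Configure the agent
+## Configure the agent
 Then we use docker to enter the agent container and create the corresponding agent configuration.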
 ```
 $ docker exec -it agent sh
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-arch.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-arch.png
new file mode 100644
index 0000000..a54d1e8
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-arch.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-data.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-data.png
new file mode 100644
index 0000000..256ce44
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-data.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-group.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-group.png
new file mode 100644
index 0000000..fd53a19
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-group.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-hive.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-hive.png
new file mode 100644
index 0000000..3651a07
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-hive.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-stream.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-stream.png
new file mode 100644
index 0000000..fa31cc3
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-stream.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-topic.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-topic.png
new file mode 100644
index 0000000..b892f65
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-topic.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md
new file mode 100644
index 0000000..6bc3b24
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md
@@ -0,0 +1,88 @@
+---
+title: Pulsar Example
+sidebar_position: 2
+---
+
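+Apache InLong has added the ability to ingest data through Apache Pulsar, taking full advantage of the technical strengths that distinguish Pulsar from other MQs and providing a complete solution for data access scenarios with higher data quality requirements, such as finance and billing.
+In the following, we use a complete example to show how to ingest data with Apache Pulsar through Apache InLong.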
+
+![Create Group](img/pulsar-arch.png)
+
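+## Install Pulsar
+To deploy an Apache Pulsar cluster, refer to the [official installation guide](https://pulsar.apache.org/docs/en/standalone/).
+
+## Install Hive
+Hive is a required component. If Hive is not installed on your machine, we recommend installing it quickly with Docker. Details can be found [here](https://github.com/big-data-europe/docker-hive).
+
+> Note that if you use the Docker image above, you need to add a port mapping `8020:8020` to the namenode, because it is the port of HDFS DefaultFS, which is needed later when configuring Hive.
+
+## Install InLong
+Before we begin, we need to install all InLong components. Two ways are provided here:
+1. Follow the [instructions here](deployment/docker.md) to deploy quickly with Docker. (Recommended)
+2. Follow the [instructions here](deployment/bare_metal.md) to install each component in turn from the binary packages.
+
+Unlike with InLong TubeMQ, if you use Apache Pulsar you need to configure the Pulsar cluster information when installing the Manager component. The format is as follows: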
+```
+# Pulsar admin URL
+pulsar.adminUrl=http://127.0.0.1:8080,127.0.0.2:8080,127.0.0.3:8080
+# Pulsar broker address
+pulsar.serviceUrl=pulsar://127.0.0.1:6650,127.0.0.2:6650,127.0.0.3:6650
+# Default tenant of Pulsar
+pulsar.defaultTenant=public
+```
+
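+## Create a data access
+### Configure data stream group information
+![](img/pulsar-group.png)
+When creating a data access, select Pulsar as the message middleware for the data stream group. Other Pulsar-related configuration items include:
+- Queue module: the queue model, parallel or serial. With parallel you can set the number of topic partitions; serial uses a single partition
+- Write quorum: number of replicas written for each message
+- Ack quorum: number of Bookies that must acknowledge a write
+- retention time: how long messages acknowledged by consumers are retained
+- ttl: expiration time for unacknowledged messages
+- retention size: how much data of messages acknowledged by consumers is retained
+
+### Configure data stream
+![](img/pulsar-stream.png)
+When configuring the message source, for the file path of the file data source, refer to the [detailed File Agent guide](https://inlong.apache.org/docs/next/modules/agent/file#file-agent-configuration) in inlong-agent.
+
+### Configure data format
+![](img/pulsar-data.png)
+
+### Configure Hive cluster
+Save the Hive cluster information and click "OK".
+![](img/pulsar-hive.png)
+
+## Data access approval
+Enter the **Approval Management** page, click **My Approval**, and approve the data access application submitted above. Once it is approved, the topics and subscriptions required by the data stream are created synchronously in the Pulsar cluster.
+We can use the command-line tool in the Pulsar cluster to check whether the topic was created successfully:
+![](img/pulsar-topic.png)
+
+## Configure File Agent
+When configuring the File Agent, create the file in the directory specified when the data access was created: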
+```
+touch /data/test_file.txt;
+```
+
+Write data to the file in the data source format specified when creating the data stream (more data can be written in the same format):
+```
+echo -e "1|test\n2|test\n" >> /data/test_file.txt
+```
+
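+## Data Check
+
+Finally, we log in to the Hive cluster and use Hive SQL commands to check whether data has been inserted into the `test_stream` table.
+
+## Troubleshooting
+If data is not correctly written to the Hive cluster, check whether the `DataProxy` and `Sort` information is synchronized:
+- Check whether the topic information for the data stream is correctly written in the `conf/topics.properties` file of `InLong DataProxy`:
+```
+b_test_group/test_stream=persistent://public/b_test_group/test_stream
+```
+
+- Check whether the configuration information of the data stream has been pushed to the ZooKeeper monitored by InLong Sort: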
+```
+get /inlong_hive/dataflows/{{sink_id}}
+```
+
+
