This is an automated email from the ASF dual-hosted git repository.
gosonzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git
The following commit(s) were added to refs/heads/master by this push:
new b5e9420 [INLONG-1814] Show document file subdirectories and change
the document directory level (#190)
b5e9420 is described below
commit b5e94203a6f534bf49e9a4e2d59bc366c80b7dd1
Author: lizhwang <[email protected]>
AuthorDate: Sat Nov 20 19:02:24 2021 +0800
[INLONG-1814] Show document file subdirectories and change the document
directory level (#190)
---
docs/modules/agent/architecture.md | 12 ++---
docs/modules/agent/quick_start.md | 18 +++----
docs/modules/dataproxy-sdk/architecture.md | 16 +++---
docs/modules/dataproxy/architecture.md | 8 +--
docs/modules/dataproxy/quick_start.md | 14 ++---
docs/modules/manager/architecture.md | 10 ++--
docs/modules/manager/quick_start.md | 12 ++---
docs/modules/sort/introduction.md | 18 +++----
docs/modules/sort/protocol_introduction.md | 4 +-
docs/modules/sort/quick_start.md | 10 ++--
docs/modules/tubemq/architecture.md | 4 +-
docs/modules/tubemq/tubemq-manager/quick_start.md | 10 ++--
docs/modules/website/quick_start.md | 12 ++---
docs/user_guide/example.md | 10 ++--
docs/user_guide/quick_start.md | 26 ++++-----
docs/user_guide/user_manual.md | 62 +++++++++++-----------
.../current/modules/agent/architecture.md | 6 +--
.../current/modules/agent/quick_start.md | 18 +++----
.../current/modules/dataproxy-sdk/architecture.md | 16 +++---
.../current/modules/dataproxy/architecture.md | 8 +--
.../current/modules/dataproxy/quick_start.md | 14 ++---
.../current/modules/manager/architecture.md | 10 ++--
.../current/modules/manager/quick_start.md | 12 ++---
.../current/modules/sort/introduction.md | 18 +++----
.../current/modules/sort/protocol_introduction.md | 4 +-
.../current/modules/sort/quick_start.md | 10 ++--
.../current/modules/tubemq/clients_java.md | 10 ++--
.../current/modules/tubemq/quick_start.md | 6 +--
.../modules/tubemq/tubemq-manager/quick_start.md | 10 ++--
.../modules/tubemq/tubemq_perf_test_vs_Kafka_cn.md | 16 +++---
.../current/modules/website/quick_start.md | 12 ++---
.../current/user_guide/example.md | 10 ++--
.../current/user_guide/quick_start.md | 26 ++++-----
.../current/user_guide/user_manual.md | 62 +++++++++++-----------
.../modules/dataproxy/architecture.md | 6 +--
35 files changed, 260 insertions(+), 260 deletions(-)
diff --git a/docs/modules/agent/architecture.md
b/docs/modules/agent/architecture.md
index 75e58b7..b369425 100644
--- a/docs/modules/agent/architecture.md
+++ b/docs/modules/agent/architecture.md
@@ -2,19 +2,19 @@
title: Architecture
---
-## 1. Overview of InLong-Agent
+## 1 Overview of InLong-Agent
InLong-Agent is a collection tool that supports multiple types of data
sources, and is committed to stable and efficient data collection across
heterogeneous data sources including files, SQL, binlog, metrics, etc.
-### The brief architecture diagram is as follows:
+### 1.1 The brief architecture diagram is as follows:

-### design concept
+### 1.2 design concept
In order to solve the problem of data source diversity, InLong-agent abstracts
multiple data sources into a unified source concept, and abstracts sinks to
write data. When you need to access a new data source, you only need to
configure the format and reading parameters of the data source to achieve
efficient reading.
-### Current status of use
+### 1.3 Current status of use
InLong-Agent is widely used within the Tencent Group, undertaking most of the
data collection business, and the amount of online data reaches tens of
billions.
-## 2. InLong-Agent architecture
+## 2 InLong-Agent architecture
InLong Agent itself serves as a data collection framework, built with a
channel + plug-in architecture. Reading and writing a data source is handled
by reader/writer plug-ins, which are then integrated into the overall framework.
+ Reader: Reader is the data collection module, responsible for collecting
data from the data source and sending the data to the channel.
@@ -22,7 +22,7 @@ The InLong Agent task is used as a data acquisition
framework, constructed with
+ Channel: connects the reader and the writer, acting as the data
transmission channel between them and implementing the data reading and
monitoring functions
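The reader/writer/channel roles above can be sketched as a tiny pipeline. The types below are illustrative stand-ins, not the real InLong-Agent plug-in interfaces:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative stand-ins for the reader/writer/channel plug-in roles
// described above; these are not the real InLong-Agent interfaces.
public class AgentPipelineSketch {
    interface Reader { String read(); }          // pulls one record from the source, null when done
    interface Writer { void write(String msg); } // pushes one record to the sink

    // The channel decouples the reader from the writer, buffering records.
    static class Channel {
        private final Queue<String> buffer = new ArrayDeque<>();
        void put(String msg) { buffer.add(msg); }
        String take() { return buffer.poll(); }
    }

    static int run(Reader reader, Channel channel, Writer writer) {
        int delivered = 0;
        String msg;
        while ((msg = reader.read()) != null) {  // collect from the source
            channel.put(msg);                    // buffer in the channel
        }
        while ((msg = channel.take()) != null) { // drain the channel to the sink
            writer.write(msg);
            delivered++;
        }
        return delivered;
    }
}
```

A new data source only needs a new `Reader` implementation; the channel and writer are unchanged, which is the point of the plug-in split.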
-## 3. Different kinds of agent
+## 3 Different kinds of agent
### 3.1 file agent
File collection includes the following functions:
diff --git a/docs/modules/agent/quick_start.md
b/docs/modules/agent/quick_start.md
index 7e567c2..04ef7c2 100644
--- a/docs/modules/agent/quick_start.md
+++ b/docs/modules/agent/quick_start.md
@@ -2,7 +2,7 @@
title: Build && Deployment
---
-## 1、Configuration
+## 1 Configuration
```
cd inlong-agent
```
@@ -10,7 +10,7 @@ cd inlong-agent
The agent supports two modes of operation: local operation and online operation
-### Agent configuration
+### 1.1 Agent configuration
Online operation needs to pull the configuration from inlong-manager; the
configuration in conf/agent.properties is as follows:
```ini
@@ -20,7 +20,7 @@ agent.manager.vip.http.host=manager web host
agent.manager.vip.http.port=manager web port
```
-## 2、run
+## 2 run
After decompression, run the following command
```bash
@@ -28,9 +28,9 @@ sh agent.sh start
```
-## 3、Add job configuration in real time
+## 3 Add job configuration in real time
-#### 3.1 agent.properties Modify the following two places
+### 3.1 agent.properties Modify the following two places
```ini
# whether enable http service
agent.http.enable=true
@@ -38,7 +38,7 @@ agent.http.enable=true
agent.http.port=Available ports
```
-#### 3.2 Execute the following command
+### 3.2 Execute the following command
```bash
curl --location --request POST 'http://localhost:8008/config/job' \
--header 'Content-Type: application/json' \
@@ -78,7 +78,7 @@ agent.http.port=Available ports
- proxy.streamId: the streamId used when writing to the proxy; streamId is
the data flow id shown in the data flow window of inlong-manager
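A client for the HTTP job endpoint shown above might look like the following sketch, assuming the agent listens at the configured `agent.http.port` (8008 in the curl example); the JSON payload is abbreviated:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative client for the agent's job-submission endpoint shown above;
// the endpoint path and port come from the curl example in this document.
public class AgentJobClient {
    static int postJob(String endpoint, String jobJson) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jobJson))
                .build();
        // The agent answers with an HTTP status code; 200 means accepted.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

A call such as `postJob("http://localhost:8008/config/job", jobJson)` mirrors the curl command above.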
-## 4、eg for directory config
+## 4 eg for directory config
E.g:
/data/inlong-agent/test.log //Represents reading the new file test.log in
the inlong-agent folder
@@ -87,7 +87,7 @@ agent.http.port=Available ports
/data/inlong-agent/^\\d+(\\.\\d+)? // Start with one or more digits,
optionally followed by a dot and one or more digits (? means the decimal
part is optional); matches examples such as "5", "1.5" and "2.21"
-## 5. Support to get data time from file name
+## 5 Support to get data time from file name
Agent supports obtaining the time from the file name as the production
time of the data. The configuration instructions are as follows:
/data/inlong-agent/***YYYYMMDDHH***
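One way such a `YYYYMMDDHH` pattern could be evaluated is sketched below; this is an illustration of the idea, not the agent's actual implementation:

```java
import java.time.LocalDateTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of deriving a data time from a file name that embeds
// a YYYYMMDDHH timestamp, as described above; not the agent's real code.
public class FileTimeSketch {
    private static final Pattern TS = Pattern.compile("(\\d{10})"); // YYYYMMDDHH

    static LocalDateTime parseDataTime(String fileName) {
        Matcher m = TS.matcher(fileName);
        if (!m.find()) {
            return null; // no timestamp embedded in this file name
        }
        String ts = m.group(1);
        return LocalDateTime.of(
                Integer.parseInt(ts.substring(0, 4)),   // YYYY
                Integer.parseInt(ts.substring(4, 6)),   // MM
                Integer.parseInt(ts.substring(6, 8)),   // DD
                Integer.parseInt(ts.substring(8, 10)),  // HH
                0);                                     // minutes default to 0
    }
}
```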
@@ -143,7 +143,7 @@ curl --location --request
POST 'http://localhost:8008/config/job' \
}'
```
-## 6. Support time offset reading
+## 6 Support time offset reading
After reading by time is configured, if you want to read data at a time
other than the current time, you can configure a time offset to do so.
Configure the job attribute job.timeOffset; its value is a number plus a
time dimension, where the time dimension can be day or hour
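A minimal sketch of how an offset value like `-1d` or `2h` might be interpreted, assuming the number-plus-dimension format described above (`d` for day, `h` for hour); this is not the agent's real parsing code:

```java
import java.time.Duration;

// Illustrative parser for a job.timeOffset value: a signed number followed
// by a time dimension, "d" for day or "h" for hour (e.g. "-1d" = one day ago).
public class TimeOffsetSketch {
    static Duration parseOffset(String offset) {
        char dimension = offset.charAt(offset.length() - 1);
        long amount = Long.parseLong(offset.substring(0, offset.length() - 1));
        switch (dimension) {
            case 'd': return Duration.ofDays(amount);
            case 'h': return Duration.ofHours(amount);
            default:  throw new IllegalArgumentException(
                    "unknown time dimension: " + dimension);
        }
    }
}
```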
diff --git a/docs/modules/dataproxy-sdk/architecture.md
b/docs/modules/dataproxy-sdk/architecture.md
index 591163c..42bc4bb 100644
--- a/docs/modules/dataproxy-sdk/architecture.md
+++ b/docs/modules/dataproxy-sdk/architecture.md
@@ -1,16 +1,16 @@
---
title: Architecture
---
-# 1、intro
+## 1 intro
When a business uses the message access method, it generally only needs to
format the data in a proxy-recognizable format (such as the six-segment
protocol, digital protocol, etc.) and send it in packets, and the data can
then be connected to InLong. However, to ensure data reliability, load
balancing, dynamic updating of the proxy list, and other safeguards, the
user program would need to handle much more, which ultimately makes the
program cumbersome and bloated.
The original intention of the API design is to simplify user access and take
over some of the reliability-related logic. After the user integrates the
API into the service delivery program, data can be sent to the proxy without
worrying about the packet format, load balancing, and other logic.
-# 2、functions
+## 2 functions
-## 2.1 overall functions
+### 2.1 overall functions
| function | description |
| ---- | ---- |
@@ -22,9 +22,9 @@ The original intention of API design is to simplify user
access and assume some
| proxy list persistence (new)| Persist the proxy list according to the
business group id to prevent the configuration center from failing to send data
when the program starts
-## 2.2 Data transmission function description
+### 2.2 Data transmission function description
-### Synchronous batch function
+#### Synchronous batch function
public SendResult sendMessage(List<byte[]> bodyList, String groupId,
String streamId, long dt, long timeout, TimeUnit timeUnit)
@@ -32,7 +32,7 @@ The original intention of API design is to simplify user
access and assume some
bodyList is a collection of multiple pieces of data that the user needs to
send. The total length is recommended to be less than 512KB. groupId
represents the service id, and streamId represents the interface id. dt
represents the timestamp of the data, accurate to the millisecond; it can
also be set to 0 directly, in which case the api will use the current time
as the timestamp in the background. timeout & timeUnit: these two parameters
are used to set the timeout for sending data, a [...]
-### Synchronize a single function
+#### Synchronize a single function
public SendResult sendMessage(byte[] body, String groupId, String
streamId, long dt, long timeout, TimeUnit timeUnit)
@@ -41,7 +41,7 @@ The original intention of API design is to simplify user
access and assume some
body is the content of a single piece of data that the user wants to send,
and the meaning of the remaining parameters is basically the same as the batch
sending interface.
-### Asynchronous batch function
+#### Asynchronous batch function
public void asyncSendMessage(SendMessageCallback callback, List<byte[]>
bodyList, String groupId, String streamId, long dt, long timeout,TimeUnit
timeUnit)
@@ -50,7 +50,7 @@ The original intention of API design is to simplify user
access and assume some
SendMessageCallback is a callback for processing messages. The bodyList is
a collection of multiple pieces of data that users need to send. The total
length of multiple pieces of data is recommended to be less than 512k. groupId
is the service id, and streamId is the interface id. dt represents the time
stamp of the data, accurate to the millisecond level. It can also be set to 0
directly, and the api will get the current time as its timestamp in the
background. timeout and timeUnit [...]
-### Asynchronous single function
+#### Asynchronous single function
public void asyncSendMessage(SendMessageCallback callback, byte[] body,
String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
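The asynchronous interfaces above follow a callback pattern that can be illustrated with simplified stand-in types; `SendResult`, `SendMessageCallback` and the toy sender below are NOT the real inlong-dataproxy-sdk classes:

```java
// Simplified stand-ins illustrating the callback pattern of the
// asynchronous send interfaces described above.
public class AsyncSendSketch {
    enum SendResult { OK, TIMEOUT }

    interface SendMessageCallback {
        void onMessageAck(SendResult result); // invoked once the proxy acknowledges
    }

    // A toy sender that acknowledges immediately; the real SDK completes the
    // callback from its network layer once the proxy responds.
    static void asyncSendMessage(SendMessageCallback callback, byte[] body,
                                 String groupId, String streamId, long dt) {
        callback.onMessageAck(SendResult.OK);
    }
}
```

The caller's thread returns immediately; success or failure is delivered later through the callback, which is why the asynchronous variants take a callback as their first parameter.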
diff --git a/docs/modules/dataproxy/architecture.md
b/docs/modules/dataproxy/architecture.md
index de2a89a..a897a7a 100644
--- a/docs/modules/dataproxy/architecture.md
+++ b/docs/modules/dataproxy/architecture.md
@@ -1,14 +1,14 @@
---
title: Architecture
---
-# 1、intro
+## 1 intro
Inlong-dataProxy belongs to the inlong proxy layer and is used for data
collection, reception and forwarding. Through format conversion, the data is
converted into TDMsg1 format that can be cached and processed by the cache layer
InLong-dataProxy acts as a bridge from the InLong collection end to the
InLong buffer end. Dataproxy pulls the relationship between the business group
id and the corresponding topic name from the manager module, and internally
manages the producers of multiple topics
The overall architecture of inlong-dataproxy is based on Apache Flume. On
the basis of this project, inlong-bus expands the source layer and sink layer,
and optimizes disaster tolerance forwarding, which improves the stability of
the system.
-# 2、architecture
+## 2 architecture

@@ -16,7 +16,7 @@ title: Architecture
2. The channel layer has a selector, which is used to choose which type
of channel to go. If the memory is eventually full, the data will be processed.
3. The data of the channel layer will be forwarded through the sink
layer. The main purpose here is to convert the data to the TDMsg1 format and
push it to the cache layer (tube is more commonly used here)
-# 3、DataProxy support configuration instructions
+## 3 DataProxy support configuration instructions
DataProxy supports configurable source-channel-sink, and the configuration
method is the same as the configuration file structure of flume:
@@ -158,7 +158,7 @@ agent1.sinks.meta-sink-more1.max-survived-size = 3000000
Maximum number of caches
```
-# 4、Monitor metrics configuration instructions
+## 4 Monitor metrics configuration instructions
DataProxy provides monitor metrics based on JMX; users can implement
code that reads the metrics and reports them to a user-defined monitor system.
Source-modules and Sink-modules can add a monitor metric class that is a
subclass of org.apache.inlong.commons.config.metrics.MetricItemSet and
register it to the MBeanServer. A user-defined plugin can then get the module
metrics with JMX and report the metric data to different monitor systems.
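A minimal sketch of this JMX pattern using only JDK classes; `MetricDemo` and the ObjectName below are illustrative, not DataProxy's real MetricItemSet types:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch of the JMX pattern described above: a module registers a metric
// object with the platform MBeanServer, and an external reporter reads it
// back by ObjectName, as a user-defined plugin would.
public class JmxMetricSketch {
    public interface MetricDemoMBean { long getMsgCount(); }

    public static class MetricDemo implements MetricDemoMBean {
        private volatile long msgCount = 0;
        public void incMsgCount() { msgCount++; }
        @Override public long getMsgCount() { return msgCount; }
    }

    static long registerAndRead() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example.sketch:type=MetricDemo");
        MetricDemo metric = new MetricDemo();
        server.registerMBean(metric, name); // module side: register the metric
        metric.incMsgCount();               // traffic updates the metric
        // reporter side: read the attribute through JMX
        return (Long) server.getAttribute(name, "MsgCount");
    }
}
```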
diff --git a/docs/modules/dataproxy/quick_start.md
b/docs/modules/dataproxy/quick_start.md
index e8bcc69..55b9398 100644
--- a/docs/modules/dataproxy/quick_start.md
+++ b/docs/modules/dataproxy/quick_start.md
@@ -1,11 +1,11 @@
---
title: Build && Deployment
---
-## Deploy DataProxy
+## 1 Deploy DataProxy
All deployment files are in the `inlong-dataproxy` directory.
-### config TubeMQ master
+### 1.1 config TubeMQ master
`tubemq_master_list` is the rpc address of TubeMQ Master.
```
@@ -14,13 +14,13 @@ $ sed -i 's/TUBE_LIST/tubemq_master_list/g' conf/flume.conf
note that in conf/flume.conf, FLUME_HOME is the proxy directory for the
proxy's internal data
-### Environmental preparation
+### 1.2 Environmental preparation
```
sh prepare_env.sh
```
-### config manager web url
+### 1.3 config manager web url
configuration file: `conf/common.properties`:
```
@@ -28,19 +28,19 @@ configuration file: `conf/common.properties`:
manager_hosts=ip:port
```
-## run
+## 2 run
```
sh bin/start.sh
```
-## check
+## 3 check
```
telnet 127.0.0.1 46801
```
-## Add DataProxy configuration to InLong-Manager
+## 4 Add DataProxy configuration to InLong-Manager
After installing the DataProxy, you need to insert the IP and port where the
DataProxy service is located into the backend database of InLong-Manager.
diff --git a/docs/modules/manager/architecture.md
b/docs/modules/manager/architecture.md
index 84256a9..d82ff16 100644
--- a/docs/modules/manager/architecture.md
+++ b/docs/modules/manager/architecture.md
@@ -2,19 +2,19 @@
title: Architecture
---
-## Introduction to Apache InLong Manager
+## 1 Introduction to Apache InLong Manager
+ Target positioning: Apache InLong is positioned as a one-stop data access
solution, providing technical capabilities that completely cover big data
access scenarios, from data collection through transmission to sorting.
+ Platform value: Users can complete task configuration, management, and
indicator monitoring through the platform's built-in management and
configuration platform. At the same time, the platform provides SPI extension
points in the main links of the process to implement custom logic as needed.
Ensure stable and efficient functions while lowering the threshold for platform
use.
+ Apache InLong Manager is the user-oriented unified UI of the entire data
access platform. After the user logs in, it will provide different function
permissions and data permissions according to the corresponding role. The page
provides maintenance portals for the platform's basic clusters (such as mq,
sorting), and you can view basic maintenance information and capacity planning
adjustments at any time. At the same time, business users can complete the
creation, modification and maint [...]
-## Architecture
+## 2 Architecture

-##Module division of labor
+## 3 Module division of labor
| Module | Responsibilities |
| :----| :---- |
@@ -24,9 +24,9 @@ title: Architecture
| manager-web | Front-end interactive response interface |
| manager-workflow-engine | Workflow Engine |
-## use process
+## 4 use process

-## data model
+## 5 data model

\ No newline at end of file
diff --git a/docs/modules/manager/quick_start.md
b/docs/modules/manager/quick_start.md
index 6f61754..3d32971 100644
--- a/docs/modules/manager/quick_start.md
+++ b/docs/modules/manager/quick_start.md
@@ -2,7 +2,7 @@
title: Build && Deployment
---
-# 1. Environmental preparation
+## 1 Environmental preparation
- Install and start MySQL 5.7+, copy the `doc/sql/apache_inlong_manager.sql`
file in the inlong-manager module to the
server where the MySQL database is located (for example, copy to `/data/`
directory), load this file through the
following command to complete the initialization of the table structure and
basic data:
@@ -25,15 +25,15 @@ title: Build && Deployment
to [Compile and deploy TubeMQ
Manager](https://inlong.apache.org/zh-cn/docs/modules/tubemq/tubemq-manager/quick_start.html)
, install and start TubeManager.
-# 2. Deploy and start manager-web
+## 2 Deploy and start manager-web
**manager-web is a background service that interacts with the front-end page.**
-## 2.1 Prepare installation files
+### 2.1 Prepare installation files
All installation files are in the `inlong-manager-web` directory.
-## 2.2 Modify configuration
+### 2.2 Modify configuration
Go to the decompressed `inlong-manager-web` directory and modify the
`conf/application.properties` file:
@@ -74,7 +74,7 @@ The dev configuration is specified above, then modify the
`conf/application-dev.
sort.appName=inlong_app
```
-## 2.3 Start the service
+### 2.3 Start the service
Enter the decompressed directory, execute `sh bin/startup.sh` to start the
service, and check the
log `tailf log/manager-web.log`. If a log similar to the following appears,
the service has started successfully:
@@ -83,7 +83,7 @@ log `tailf log/manager-web.log`. If a log similar to the
following appears, the
Started InLongWebApplication in 6.795 seconds (JVM running for 7.565)
```
-# 3. Service access verification
+## 3 Service access verification
Verify the manager-web service:
diff --git a/docs/modules/sort/introduction.md
b/docs/modules/sort/introduction.md
index a215522..7e9f6b0 100644
--- a/docs/modules/sort/introduction.md
+++ b/docs/modules/sort/introduction.md
@@ -7,31 +7,31 @@ Inlong-sort is used to extract data from different source
systems, then transfor
Inlong-sort is simply a Flink application, and relies on Inlong-manager to
manage metadata (such as the source information and storage information)
# features
-## multi-tenancy
+## 1 multi-tenancy
Inlong-sort is a multi-tenant system, which means you can extract data from
different sources (these sources must be of the same source type) and load data
into different sinks (these sinks must be of the same storage type).
e.g. you can extract data from different topics of inlong-tubemq and then load
them into different hive clusters.
-## change meta data without restart
+## 2 change meta data without restart
Inlong-sort uses zookeeper to manage its metadata; every time you change
metadata on ZK, the inlong-sort application is informed immediately.
e.g. if you want to change the schema of your data, just change the metadata
on ZK without restarting your inlong-sort application.
-# supported sources
+## 3 supported sources
- inlong-tubemq
- pulsar
-# supported storages
+## 4 supported storages
- clickhouse
- hive (Currently we just support parquet file format)
-# limitations
+## 5 limitations
Currently, we just support extracting specified fields in the stage of
**Transform**.
-# future plans
-## More kinds of source systems
+## 6 future plans
+### 6.1 More kinds of source systems
Kafka, etc.
-## More kinds of storage systems
+### 6.2 More kinds of storage systems
HBase, Elasticsearch, etc.
-## More kinds of file format in hive sink
+### 6.3 More kinds of file format in hive sink
sequence file, orc
\ No newline at end of file
diff --git a/docs/modules/sort/protocol_introduction.md
b/docs/modules/sort/protocol_introduction.md
index a04538f..90eeb8d 100644
--- a/docs/modules/sort/protocol_introduction.md
+++ b/docs/modules/sort/protocol_introduction.md
@@ -7,7 +7,7 @@ Currently the metadata management of inlong-sort relies on
inlong-manager.
Metadata interaction between inlong-sort and inlong-manager is performed via
ZK.
-# Zookeeper's path structure
+## 1 Zookeeper's path structure

@@ -20,6 +20,6 @@ A path at the top of the figure indicates which dataflow are
running in a cluster
The path below is used to store the details of the dataflow.
-# Protocol
+## 2 Protocol
Please reference
`org.apache.inlong.sort.protocol.DataFlowInfo`
\ No newline at end of file
diff --git a/docs/modules/sort/quick_start.md b/docs/modules/sort/quick_start.md
index d2e46eb..8823e05 100644
--- a/docs/modules/sort/quick_start.md
+++ b/docs/modules/sort/quick_start.md
@@ -2,7 +2,7 @@
title: Build && Deployment
---
-## Set up flink environment
+## 1 Set up flink environment
Currently inlong-sort is based on Flink; before you run an inlong-sort
application, you need to set up a Flink environment.
@@ -12,10 +12,10 @@ Currently, inlong-sort relies on flink-1.9.3. Choose
`flink-1.9.3-bin-scala_2.11.t
Once your flink environment is set up, you can visit web ui of flink, whose
address is stored in `/${your_flink_path}/conf/masters`.
-## Prepare installation files
+## 2 Prepare installation files
All installation files are in the `inlong-sort` directory.
-## Starting an inlong-sort application
+## 3 Starting an inlong-sort application
Now you can submit job to flink with the jar compiled.
<a
href="https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/yarn_setup.html#submit-job-to-flink"
target="_blank">how to submit job to flink</a>
@@ -30,7 +30,7 @@ Notice:
- `inlong-sort-core-1.0-SNAPSHOT.jar` is the compiled jar
-## Necessary configurations
+## 4 Necessary configurations
- `--cluster-id ` which is used to represent a specified inlong-sort
application
- `--zookeeper.quorum` zk quorum
- `--zookeeper.path.root` zk root path
@@ -45,7 +45,7 @@ Configurations above are necessary, you can see full
configurations in
`--cluster-id my_application --zookeeper.quorum 192.127.0.1:2181
--zookeeper.path.root /zk_root --source.type tubemq --sink.type hive`
-## All configurations
+## 5 All configurations
| name | necessary | default value |description |
| ------------ | ------------ | ------------ | ------------ |
|cluster-id | Y | NA | used to represent a specified inlong-sort
application |
diff --git a/docs/modules/tubemq/architecture.md
b/docs/modules/tubemq/architecture.md
index d81a6ee..cd4ad51 100644
--- a/docs/modules/tubemq/architecture.md
+++ b/docs/modules/tubemq/architecture.md
@@ -2,7 +2,7 @@
title: Architecture
---
-## 1. TubeMQ Architecture:
+## 1 TubeMQ Architecture:
After years of evolution, the TubeMQ cluster is divided into the following 5
parts:

@@ -30,7 +30,7 @@ After years of evolution, the TubeMQ cluster is divided into
the following 5 par
- **ZooKeeper:** Responsible for the ZooKeeper part of the offset storage.
This part of the function has been weakened to only the persistent storage of
the offset. Considering the next multi-node copy function, this module is
temporarily reserved;
-## 2. Broker File Storage Scheme Improvement:
+## 2 Broker File Storage Scheme Improvement:
Systems that use disks as the data persistence medium face various system
performance problems caused by disk issues. The TubeMQ system is no
exception; its performance improvements largely come from solving how
message data is read, written and stored. In this regard TubeMQ has made many
improvements: a storage instance is the smallest Topic data management unit;
each storage instance includes a file storage block and a memory cache block;
each Topic can be assigned multip [...]
1. **File storage block:** The disk storage solution of TubeMQ is similar to
Kafka's, but not identical, as shown in the following figure: each file
storage block is composed of an index file and a data file; the partition is a
logical partition within the data file; each Topic maintains and manages its
file storage blocks separately, and the related mechanisms include the aging
cycle, the number of partitions, whether it is readable and writable, etc.
diff --git a/docs/modules/tubemq/tubemq-manager/quick_start.md
b/docs/modules/tubemq/tubemq-manager/quick_start.md
index cf27a7c..bebcefb 100644
--- a/docs/modules/tubemq/tubemq-manager/quick_start.md
+++ b/docs/modules/tubemq/tubemq-manager/quick_start.md
@@ -1,7 +1,7 @@
-## Deploy TubeMQ Manager
+## 1 Deploy TubeMQ Manager
All deployment files are in the `inlong-tubemq-manager` directory.
-### configuration
+### 1.1 configuration
- create a `tubemanager` database and account in MySQL.
- Add the mysql information in conf/application.properties:
@@ -12,13 +12,13 @@ spring.datasource.username=mysql_username
spring.datasource.password=mysql_password
```
-### start service
+### 1.2 start service
``` bash
$ bin/start-manager.sh
```
-### register TubeMQ cluster
+### 1.3 register TubeMQ cluster
vim bin/init-tube-cluster.sh
@@ -40,7 +40,7 @@ sh bin/init-tube-cluster.sh
this will create a cluster with id = 1, note that this operation should not be
executed repeatedly.
-### Appendix: Other Operation interface
+### 1.4 Appendix: Other Operation interface
#### cluster
Query full data of clusterId and clusterName (get)
diff --git a/docs/modules/website/quick_start.md
b/docs/modules/website/quick_start.md
index 8eeaeda..a855ab2 100644
--- a/docs/modules/website/quick_start.md
+++ b/docs/modules/website/quick_start.md
@@ -2,20 +2,20 @@
title: Build && Deployment
---
-## About WebSite
+## 1 About WebSite
This is the website console of the [Apache InLong
incubator](https://github.com/apache/incubator-inlong).
-## Build
+## 2 Build
```
mvn package -DskipTests -Pdocker -pl inlong-website
```
-## Run
+## 3 Run
```
docker run -d --name website -e MANAGER_API_ADDRESS=127.0.0.1:8083 -p 80:80
inlong/website
```
-## Guide For Developer
+## 4 Guide For Developer
You should check that `nodejs >= 12.0` is installed.
In the project, you can run some built-in commands:
@@ -33,14 +33,14 @@ The start of the web server depends on the back-end server
`manager api` interface
You should start the backend server first, and then set the variable `target`
in `/inlong-website/src/setupProxy.js` to the address of the api service.
-### Test
+### 4.1 Test
Run `npm test` or `yarn test` to start the test runner in interactive watch
mode.
For more information, see the section on [Running
Tests](https://create-react-app.dev/docs/running-tests/).
-### Build
+### 4.2 Build
First, make sure that the project has run `npm install` or `yarn install` to
install `node_modules`.
diff --git a/docs/user_guide/example.md b/docs/user_guide/example.md
index 140269b..6c57d7b 100644
--- a/docs/user_guide/example.md
+++ b/docs/user_guide/example.md
@@ -5,17 +5,17 @@ sidebar_position: 3
Here we use a simple example to help you experience InLong by Docker.
-## Install Hive
+## 1 Install Hive
Hive is a necessary component. If you don't have Hive on your machine, we
recommend using Docker to install it. Details can be found
[here](https://github.com/big-data-europe/docker-hive).
> Note that if you use Docker, you need to add a port mapping `8020:8020`,
> because it's the port of HDFS DefaultFS, and we need to use it later.
-## Install InLong
+## 2 Install InLong
Before we begin, we need to install InLong. Here we provide two ways:
1. Install InLong with Docker according to the [instructions
here](https://github.com/apache/incubator-inlong/tree/master/docker/docker-compose). (Recommended)
2. Install the InLong binary according to the [instructions
here](./quick_start.md).
-## Create a data access
+## 3 Create a data access
After deployment, we first enter the "Data Access" interface, click "Create an
Access" in the upper right corner to create a new data access, and fill in the
business information as shown in the figure below.
<img src="/img/create-business.png" align="center" alt="Create Business"/>
@@ -38,12 +38,12 @@ Note that the target table does not need to be created in
advance, as InLong Man
Then we click the "Submit for Approval" button, the connection will be created
successfully and enter the approval state.
-## Approve the data access
+## 4 Approve the data access
Then we enter the "Approval Management" interface and click "My Approval" to
approve the data access that we just applied for.
At this point, the data access has been created successfully. We can see that
the corresponding table has been created in Hive, and we can see that the
corresponding topic has been created successfully in the management GUI of
TubeMQ.
-## Configure the agent
+## 5 Configure the agent
Here we use `docker exec` to enter the container of the agent and configure it.
```
$ docker exec -it agent sh
diff --git a/docs/user_guide/quick_start.md b/docs/user_guide/quick_start.md
index 483aa46..05cd332 100644
--- a/docs/user_guide/quick_start.md
+++ b/docs/user_guide/quick_start.md
@@ -5,7 +5,7 @@ sidebar_position: 1
This section contains a quick start guide to help you get started with Apache
InLong.
-## Overall architecture
+## 1 Overall architecture
<img src="/img/inlong-structure-en.png" align="center" alt="Apache InLong"/>
[Apache InLong](https://inlong.apache.org)(incubating) overall architecture is
as above. This component is a one-stop data streaming platform that provides
automated, secure, distributed, and efficient data publishing and subscription
capabilities, helping you easily build stream-based data applications.
@@ -15,7 +15,7 @@ InLong (应龙) is a divine beast in Chinese mythology who guides
river into the
InLong was originally built in Tencent and has served online business for more
than 8 years. It supports massive data (over 40 trillion pieces of data per
day) report services under big data scenarios. The entire platform integrates 5
modules including data collection, aggregation, caching, sorting and management
modules. Through this system, the business only needs to provide data sources,
data service quality, data landing clusters and data landing formats, that is,
data can be continu [...]
-## Compile
+## 2 Compile
- Java [JDK 8](https://adoptopenjdk.net/?variant=openjdk8)
- Maven 3.6.1+
@@ -39,38 +39,38 @@ inlong-tubemq-server
inlong-website
```
-## Environment Requirements
+## 3 Environment Requirements
- ZooKeeper 3.5+
- Hadoop 2.10.x and Hive 2.3.x
- MySQL 5.7+
- Flink 1.9.x
-## deploy InLong TubeMQ Server
+## 4 deploy InLong TubeMQ Server
[deploy InLong TubeMQ Server](modules/tubemq/quick_start.md)
-## deploy InLong TubeMQ Manager
+## 5 deploy InLong TubeMQ Manager
[deploy InLong TubeMQ Manager](modules/tubemq/tubemq-manager/quick_start.md)
-## deploy InLong Manager
+## 6 deploy InLong Manager
[deploy InLong Manager](modules/manager/quick_start.md)
-## deploy InLong WebSite
+## 7 deploy InLong WebSite
[deploy InLong WebSite](modules/website/quick_start.md)
-## deploy InLong Sort
+## 8 deploy InLong Sort
[deploy InLong Sort](modules/sort/quick_start.md)
-## deploy InLong DataProxy
+## 9 deploy InLong DataProxy
[deploy InLong DataProxy](modules/dataproxy/quick_start.md)
-## deploy InLong DataProxy-SDK
+## 10 deploy InLong DataProxy-SDK
[deploy InLong DataProxy-SDK](modules/dataproxy-sdk/quick_start.md)
-## deploy InLong Agent
+## 11 deploy InLong Agent
[deploy InLong Agent](modules/agent/quick_start.md)
-## Business configuration
+## 12 Business configuration
[How to configure a new business](docs/user_guide/user_manual)
-## Data report verification
+## 13 Data report verification
At this stage, you can collect data through the file agent and verify whether
the received data is consistent with the sent data in the specified Hive table.
diff --git a/docs/user_guide/user_manual.md b/docs/user_guide/user_manual.md
index 83bdf33..8f516a4 100644
--- a/docs/user_guide/user_manual.md
+++ b/docs/user_guide/user_manual.md
@@ -3,13 +3,13 @@ title: User Manual
sidebar_position: 2
---
-# 1. User login
+## 1 User login
Users are required to enter their system account name and password.

-# 2. Data access
+## 2 Data access
The data access module displays a list of all tasks connected to the system
within the current user's authority, where users can view, edit, update and
delete the details of these tasks.
@@ -18,9 +18,9 @@ Click [Data Access], there are two steps to fill in data
access information: bus

-## 2.1 Business Information
+### 2.1 Business Information
-### 2.1.1 Business Information
+#### 2.1.1 Business Information
You are required to fill in basic business information for access tasks.
@@ -33,7 +33,7 @@ You are required to fill in basic business information for
access tasks.
information, add and modify all access configuration items
- Business introduction: a brief introduction to the business background and
application of this access task:
-### 2.1.2 Access requirements
+#### 2.1.2 Access requirements
Access requirements require users to choose message middleware: high
throughput (TUBE):
@@ -41,14 +41,14 @@ Access requirements require users to choose message
middleware: high throughput
High-throughput-Tube: high-throughput message transmission component, suitable
for log message transmission.
-### 2.1.3 Access scale
+#### 2.1.3 Access scale
Access scale requires users to estimate the scale of the access data in
advance, so that computing and storage resources can be allocated later.

-## 2.2 Data stream
+### 2.2 Data stream
Click [Next] to enter the data flow information filling step. There are four
modules for data flow information filling:
basic information, data source, data information, and data storage.
@@ -57,7 +57,7 @@ In the data flow process, you can click [New Data Stream] to
create a new data s

-### 2.2.1 Basic information
+#### 2.2.1 Basic information
You are required to fill in the basic information of the data stream in the
access task:
@@ -70,7 +70,7 @@ You are required to fill in the basic information of the data
stream in the acce
configuration items
- Introduction to data flow: simple text introduction to data flow
-### 2.2.2 Data source
+#### 2.2.2 Data source
You are required to select the source of the data stream.
@@ -83,7 +83,7 @@ be supplemented in the advanced options.

-### 2.2.3 Data Information
+#### 2.2.3 Data Information
You are required to fill in the data-related information in the data stream.
@@ -95,7 +95,7 @@ You are required to fill in the data-related information in
the data stream.
- Source field separator: the format of data sent to MQ
- Source data field: attributes with different meanings divided by a certain
format in MQ
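As an illustration of the two fields above (the separator, field names and message body are hypothetical), the source field separator determines how a raw message body from MQ is split into the source data fields:

```python
# Hypothetical example: split a raw MQ message body into named source fields
# using the configured source field separator.
def parse_source_fields(body: str, separator: str, field_names: list) -> dict:
    values = body.split(separator)
    if len(values) != len(field_names):
        raise ValueError("field count does not match the configured schema")
    return dict(zip(field_names, values))

record = parse_source_fields("1001|alice|beijing", "|", ["id", "name", "city"])
# record == {"id": "1001", "name": "alice", "city": "beijing"}
```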
-### 2.2.4 Data storage
+#### 2.2.4 Data storage
You are required to select the final flow direction of this task. This part is
optional; it currently supports both Hive storage and autonomous push.
@@ -117,9 +117,9 @@ Add HIVE storage:
- Field related information: source field name, source field type, HIVE field
name, HIVE field type, field description,
and support deletion and addition
-# 3. Access details
+## 3 Access details
-## 3.1 Execution log
+### 3.1 Execution log
When the status of the data access task is "approved successfully" or
"configuration failed", the "execution log"
function can be used to allow users to view the progress and details of the
task.
@@ -133,34 +133,34 @@ Click [Execution Log] to display the details of the task
execution log in a pop-
The execution log displays the task type, execution result, execution log
content, and end time of each step in the
execution of the access process. If the execution fails, you can "restart" the
task and execute it again.
-## 3.2 Task details
+### 3.2 Task details
The business person in charge/following person can view the access details of
the task, and can modify and update part
of the information under the status of [Waiting Applying], [Configuration
Successful], and [Configuration Failed].
There are three modules in the access task details: business information, data
stream and data storage.
-### 3.2.1 Business Information
+#### 3.2.1 Business Information
Display the basic business information in the access task, click [Edit] to
modify part of the content

-### 3.2.2 Data stream
+#### 3.2.2 Data stream
Display the basic information of the data flow under the access task, click
[New Data Flow] to create a new data flow
information

-### 3.2.3 Data Storage
+#### 3.2.3 Data Storage
Display the basic information of the data flow in the access task, select
different flow types through the drop-down
box, and click [New Flow Configuration] to create a new data storage.

-# 4. Data consumption
+## 4 Data consumption
Data consumption does not currently support direct consumption of accessed
data; data can be consumed normally only after
the approval process.
@@ -170,7 +170,7 @@ consumption.

-## 4.1 Consumer Information
+### 4.1 Consumer Information
Applicants need to gradually fill in the basic consumer business information
related to data consumption applications in
the information filling module
@@ -190,40 +190,40 @@ the information filling module
their own consumption scenarios. After completing the information, click
[Submit], and the data consumption process
will be formally submitted to the approver before it takes effect.
-# 5. Approval management
+## 5 Approval management
The approval management function module currently includes my application and
my approval, and all tasks of data access
and consumption application approval in the management system.
-## 5.1 My application
+### 5.1 My application
Display the current task list submitted by the applicant for data access and
consumption in the system, click [Details]
to view the current basic information and approval process of the task.

-### 5.1.1 Data access details
+#### 5.1.1 Data access details
The data access task details display the current basic information of the
application task, including: applicant-related
information, basic access application information, and the current approval
process node.

-### 5.1.2 Data consumption details
+#### 5.1.2 Data consumption details
Data consumption task details display basic information of current application
tasks including: applicant information,
basic consumption information, and current approval process nodes.

-## 5.2 My approval
+### 5.2 My approval
As data access officers and system members with approval authority, users are
responsible for approving data access or
consumption.

-### 5.2.1 Data Access Approval
+#### 5.2.1 Data Access Approval
New data access approval: currently it is a first-level approval, which is
approved by the system administrator.
@@ -232,7 +232,7 @@ business information.

-### 5.2.2 New data consumption approval
+#### 5.2.2 New data consumption approval
New data consumption approval: currently it is a first-level approval, which is
approved by the person in charge of the
business.
@@ -242,13 +242,13 @@ requirements according to the access information:

-# 6. System Management
+## 6 System Management
Only users with the role of system administrator can use this function. They
can create, modify, and delete users:

-## 6.1 New user
+### 6.1 New user
Users with system administrator rights can create new user accounts
@@ -262,13 +262,13 @@ Users with system administrator rights can create new
user accounts
-Effective duration: the account can be used in the system

-## 6.2 Delete user
+### 6.2 Delete user
The system administrator can delete the account of a created user. After
deletion, the account can no longer be used:

-## 6.3 User Edit
+### 6.3 User Edit
The system administrator can modify the created account:
@@ -278,7 +278,7 @@ The system administrator can modify the account type and
effective duration to p

-## 6.4 Change password
+### 6.4 Change password
Users can modify their account password: click [Modify Password], enter the
old password and the new password, and after
confirmation the new password for the account takes effect:
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/architecture.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/architecture.md
index e7b65aa..677898f 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/architecture.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/architecture.md
@@ -1,7 +1,7 @@
---
title: 架构介绍
---
-## 一. InLong-Agent 概览
+## 1 InLong-Agent 概览
InLong-Agent是一个支持多种数据源类型的收集工具,致力于实现包括file、sql、Binlog、metrics等多种异构数据源之间稳定高效的数据采集功能。
### 简要的架构图如下:
@@ -15,7 +15,7 @@ InLong-Agent是一个支持多种数据源类型的收集工具,致力于实
### 当前使用现状
InLong-Agent在腾讯集团内被广泛使用,承担了大部分的数据采集业务,线上数据量达百亿级别。
-## 二. InLong-Agent 架构介绍
+## 2 InLong-Agent 架构介绍
InLong Agent本身作为数据采集框架,采用channel +
plugin架构构建。将数据源读取和写入抽象成为Reader/Writer插件,纳入到整个框架中。
+ Reader:Reader为数据采集模块,负责采集数据源的数据,将数据发送给channel。
@@ -23,7 +23,7 @@ InLong Agent本身作为数据采集框架,采用channel + plugin架构构建
+ Channel:Channel用于连接reader和writer,作为两者的数据传输通道,并起到了数据的写入读取监控作用
-## 三. InLong-Agent 采集分类说明
+## 3 InLong-Agent 采集分类说明
### 3.1 文件采集
文件采集包含如下功能:
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
index 714c318..a5bff20 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
@@ -2,14 +2,14 @@
title: 编译部署
---
-## 1、配置
+## 1 配置
```
cd inlong-agent
```
agent 支持本地运行以及线上运行,其中线上运行从inlong manager拉取任务,本地运行可使用http请求提交任务
-### Agent 线上运行相关设置
+### 1.1 Agent 线上运行相关设置
线上运行需要从inlong-manager拉取配置,配置conf/agent.properties如下:
```ini
@@ -19,16 +19,16 @@ agent.manager.vip.http.host=manager web host
agent.manager.vip.http.port=manager web port
```
-## 2、运行
+## 2 运行
解压后如下命令运行
```bash
sh agent.sh start
```
-### 3 实时添加job配置
+## 3 实时添加job配置
-#### 3.1 agent.properties 修改下面两处
+### 3.1 agent.properties 修改下面两处
```ini
# whether enable http service
@@ -37,7 +37,7 @@ agent.http.enable=true
agent.http.port=可用端口
```
-#### 3.2 执行如下命令:
+### 3.2 执行如下命令:
```bash
curl --location --request POST 'http://localhost:8008/config/job' \
@@ -76,7 +76,7 @@ curl --location --request POST
'http://localhost:8008/config/job' \
- proxy.groupId:
写入proxy时使用的groupId,groupId是指manager界面中,数据接入中业务信息的业务ID,此处不是创建的tube topic名称
- proxy.streamId: 写入proxy时使用的streamId,streamId是指manager界面中,数据接入中数据流的数据流ID
-## 4、可支持的路径配置方案
+## 4 可支持的路径配置方案
例如:
/data/inlong-agent/test.log //代表读取inlong-agent文件夹下的的新增文件test.log
@@ -85,7 +85,7 @@ curl --location --request POST
'http://localhost:8008/config/job' \
/data/inlong-agent/^\\d+(\\.\\d+)? //
以一个或多个数字开头,之后可以是.或者一个.或多个数字结尾,?代表可选,可以匹配的实例:"5", "1.5" 和 "2.21"
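The file-name pattern above can be checked with a small sketch (Python's `re` module, with full-match semantics assumed for illustration):

```python
import re

# The path pattern above matches file names consisting of one or more digits,
# optionally followed by "." and one or more digits ("?" makes it optional).
pattern = re.compile(r"\d+(\.\d+)?")

for name in ["5", "1.5", "2.21"]:
    assert pattern.fullmatch(name)  # the three examples from the text match
assert pattern.fullmatch("abc") is None  # non-numeric names do not match
```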
-## 5、支持从文件名称中获取数据时间
+## 5 支持从文件名称中获取数据时间
Agent支持从文件名称中获取时间当作数据的生产时间,配置说明如下:
/data/inlong-agent/***YYYYMMDDHH***
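A minimal sketch of extracting the data time from such a file name (the directory and file name below are hypothetical; `YYYYMMDDHH` corresponds to Python's `%Y%m%d%H`):

```python
import re
from datetime import datetime

# Hypothetical sketch: pull a YYYYMMDDHH timestamp out of a collected file
# name and use it as the production time of the data in that file.
def data_time_from_filename(path: str) -> datetime:
    match = re.search(r"(\d{10})", path)
    if match is None:
        raise ValueError("no YYYYMMDDHH timestamp in file name")
    return datetime.strptime(match.group(1), "%Y%m%d%H")

dt = data_time_from_filename("/data/inlong-agent/access_2021112019.log")
assert (dt.year, dt.month, dt.day, dt.hour) == (2021, 11, 20, 19)
```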
@@ -141,7 +141,7 @@ curl --location --request POST
'http://localhost:8008/config/job' \
```
-## 6、支持时间偏移量offset读取
+## 6 支持时间偏移量offset读取
在配置按照时间读取之后,如果想要读取当前时间之外的其他时间的数据,可以通过配置时间偏移量完成
配置job属性名称为job.timeOffset,值为数字 + 时间维度,时间维度包括天和小时
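As an illustration of the `job.timeOffset` format described above (a number plus a time unit, where day and hour are the supported units; the `d`/`h` suffixes are an assumption of this sketch):

```python
from datetime import timedelta

# Hypothetical sketch: parse a job.timeOffset value such as "-1d" or "2h"
# (a number followed by a time unit, day or hour) into a timedelta.
def parse_time_offset(offset: str) -> timedelta:
    number, unit = int(offset[:-1]), offset[-1]
    if unit == "d":
        return timedelta(days=number)
    if unit == "h":
        return timedelta(hours=number)
    raise ValueError("unit must be 'd' (day) or 'h' (hour)")

assert parse_time_offset("-1d") == timedelta(days=-1)
assert parse_time_offset("2h") == timedelta(hours=2)
```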
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy-sdk/architecture.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy-sdk/architecture.md
index 40a8a59..ea69a87 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy-sdk/architecture.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy-sdk/architecture.md
@@ -1,7 +1,7 @@
---
title: 架构介绍
---
-# 一、说明
+## 1 说明
在业务使用消息接入方式时,业务一般仅需将数据按照DataProxy可识别的格式(如六段协议、数字化协议等)
进行组包发送,就可以将数据接入到inlong。但为了保证数据可靠性、负载均衡、动态更新proxy列表等安全特性
@@ -9,9 +9,9 @@ title: 架构介绍
API的设计初衷就是为了简化用户接入,承担部分可靠性相关的逻辑。用户通过在服务送程序中集成API后,即可将数据发送到DataProxy,而不用关心组包格式、负载均衡等逻辑。
-# 二、功能说明
+## 2 功能说明
-## 2.1 整体功能说明
+### 2.1 整体功能说明
| 功能 | 详细描述 |
| ---- | ---- |
@@ -23,9 +23,9 @@ API的设计初衷就是为了简化用户接入,承担部分可靠性相关
| DataProxy列表持久化(新) | 根据业务id对DataProxy列表持久化,防止程序启动时配置中心发生故障无法发送数据
-## 2.2 数据发送功能说明
+### 2.2 数据发送功能说明
-### 同步批量函数
+#### 同步批量函数
public SendResult sendMessage(List<byte[]> bodyList, String groupId,
String streamId, long dt, long timeout, TimeUnit timeUnit)
@@ -35,7 +35,7 @@ API的设计初衷就是为了简化用户接入,承担部分可靠性相关
-###同步单条函数
+#### 同步单条函数
public SendResult sendMessage(byte[] body, String groupId, String
streamId, long dt, long timeout, TimeUnit timeUnit)
@@ -45,7 +45,7 @@ API的设计初衷就是为了简化用户接入,承担部分可靠性相关
-###异步批量函数
+#### 异步批量函数
public void asyncSendMessage(SendMessageCallback callback, List<byte[]>
bodyList, String groupId, String streamId, long dt, long timeout,TimeUnit
timeUnit)
@@ -54,7 +54,7 @@ API的设计初衷就是为了简化用户接入,承担部分可靠性相关
SendMessageCallback
是处理消息的callback。bodyList为用户需要发送的多条数据的集合,多条数据的总长度建议小于512k。groupId是业务id,streamId是接口id。dt表示该数据的时间戳,精确到毫秒级别。也可直接设置为0,此时api会后台获取当前时间作为其时间戳。timeout和timeUnit是发送数据的超时时间,一般建议设置成20s。
-###异步单条函数
+#### 异步单条函数
public void asyncSendMessage(SendMessageCallback callback, byte[] body,
String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/architecture.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/architecture.md
index 23fe487..272e0a8 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/architecture.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/architecture.md
@@ -1,7 +1,7 @@
---
title: 架构介绍
---
-# 一、说明
+## 1 说明
InLong-dataProxy属于inlong
proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式
InLong-dataProxy充当了InLong采集端到InLong缓冲端的桥梁,dataproxy从manager模块拉取业务id与对应topic名称的关系,内部管理多个topic的生产者
@@ -9,7 +9,7 @@ title: 架构介绍
InLong-dataProxy整体架构基于Apache
Flume。inlong-dataproxy在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。
-# 二、架构
+## 2 架构

@@ -18,7 +18,7 @@ title: 架构介绍
3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)
-# 三、DataProxy功能配置说明
+## 3 DataProxy功能配置说明
DataProxy支持配置化的source-channel-sink,配置方式与flume的配置文件结构相同:
@@ -157,7 +157,7 @@ agent1.sinks.meta-sink-more1.max-survived-size = 3000000
缓存最大个数
```
-# 4、监控指标配置说明
+## 4 监控指标配置说明
DataProxy提供了JMX方式的监控指标Listener能力,用户可以实现MetricListener接口,注册后可以定期接收监控指标,用户选择将指标上报自定义的监控系统。Source和Sink模块可以通过将指标数据统计到org.apache.inlong.commons.config.metrics.MetricItemSet的子类中,并注册到MBeanServer。用户自定义的MetricListener通过JMX方式收集指标数据并上报到外部监控系统
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/quick_start.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/quick_start.md
index 18e7df1..72eaacd 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/quick_start.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/quick_start.md
@@ -1,11 +1,11 @@
---
title: 编译部署
---
-## 部署 DataProxy
+## 1 部署 DataProxy
所有的安装文件都在 `inlong-dataproxy` 目录下。
-### 配置tube地址和端口号
+### 1.1 配置tube地址和端口号
`tubemq_master_list`是TubeMQ master rpc地址,多个逗号分隔。
```
@@ -14,13 +14,13 @@ $ sed -i 's/TUBE_LIST/tubemq_master_list/g' conf/flume.conf
注意conf/flume.conf中FLUME_HOME为proxy的中间数据文件存放地址
-### 环境准备
+### 1.2 环境准备
```
sh prepare_env.sh
```
-### 配置manager地址
+### 1.3 配置manager地址
配置文件:`conf/common.properties`:
```
@@ -28,19 +28,19 @@ sh prepare_env.sh
manager_hosts=ip:port
```
-## 启动
+## 2 启动
```
sh bin/start.sh
```
-## 检查启动状态
+## 3 检查启动状态
```
telnet 127.0.0.1 46801
```
-## 将 DataProxy 配置添加到 InLong-Manager
+## 4 将 DataProxy 配置添加到 InLong-Manager
安装完 DataProxy 后,需要将 DataProxy 所在主机的 IP 插入到 InLong-Manager 的后台数据库中。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/architecture.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/architecture.md
index 9a01a35..7650366 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/architecture.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/architecture.md
@@ -2,7 +2,7 @@
title: 架构介绍
---
-## Apache InLong Manager介绍
+## 1 Apache InLong Manager介绍
+ 目标定位:Apache inlong 定位为一站式数据接入解决方案,提供完整覆盖大数据接入场景从数据采集、传输、分拣、落地的技术能力。
@@ -10,12 +10,12 @@ title: 架构介绍
+ Apache InLong
Manager作为整个数据接入平台面向用户的统一管理入口,用户登录后会根据对应角色提供不同的功能权限以及数据权限。页面提供平台各基础集群(如mq、分拣)维护入口,可随时查看维护基本信息、容量规划调整。同时业务用户可完成数据接入任务的创建、修改维护、指标查看、接入对账等功能。其对应的后台服务在用户创建并启动任务的同时会与底层各模块进行数据交互,将各模块需要执行的任务通过合理的方式下发。起到协调串联后台业务执行流程的作用。
-## Architecture
+## 2 Architecture

-## 模块分工
+## 3 模块分工
| 模块 | 职责 |
| :-----| :---- |
@@ -25,9 +25,9 @@ title: 架构介绍
| manager-web | 前端交互对应接口 |
| manager-workflow-engine | 工作流引擎|
-## 系统使用流程
+## 4 系统使用流程

-## 数据模型
+## 5 数据模型

diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/quick_start.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/quick_start.md
index 2ca2d38..2de98be 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/quick_start.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/manager/quick_start.md
@@ -2,7 +2,7 @@
title: 编译部署
---
-# 1. 环境准备
+## 1 环境准备
- 安装并启动 MySQL 5.7+,把 inlong-manager 模块中的 `doc/sql/apache_inlong_manager.sql`
文件拷贝到 MySQL 数据库所在的服务器
(比如拷贝到 `/data/` 目录下),通过下述命令加载此文件,完成表结构及基础数据的初始化:
@@ -22,15 +22,15 @@ title: 编译部署
- 参照 [编译部署TubeMQ
Manager](https://inlong.apache.org/zh-cn/docs/modules/tubemq/tubemq-manager/quick_start.html),安装并启动
TubeManager。
-# 2. 部署、启动 manager-web
+## 2 部署、启动 manager-web
**manager-web 是与前端页面交互的后台服务。**
-## 2.1 准备安装文件
+### 2.1 准备安装文件
安装文件在 `inlong-manager-web` 目录下。
-## 2.2 修改配置
+### 2.2 修改配置
前往 `inlong-manager-web` 目录,修改 `conf/application.properties` 文件:
@@ -70,7 +70,7 @@ spring.profiles.active=dev
sort.appName=inlong_app
```
-## 2.3 启动服务
+### 2.3 启动服务
进入解压后的目录,执行 `sh bin/startup.sh` 启动服务,查看日志 `tailf
log/manager-web.log`,若出现类似下面的日志,说明服务启动成功:
@@ -78,7 +78,7 @@ spring.profiles.active=dev
Started InLongWebApplication in 6.795 seconds (JVM running for 7.565)
```
-# 3. 服务访问验证
+## 3 服务访问验证
在浏览器中访问如下地址,验证 manager-web 服务:
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/introduction.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/introduction.md
index 3b4404c..7d24753 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/introduction.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/introduction.md
@@ -7,36 +7,36 @@ inlong-sort是一个基于flink的ETL系统,支持多种数据源,支持简
inlong-sort依赖inlong-manager进行系统元数据的管理,元数据依赖zk进行存储及同步。
# 特性
-## 多租户系统
+## 1 多租户系统
inlong-sort支持多租户,一个inlong-sort的作业中可以包含多个同构的数据源,以及多个同构的存储系统。
并且针对不同的数据源,可以定义不同的数据格式以及字段抽取方式。
多租户系统依赖inlong-manager的元数据管理,用户只需要在inlong-manager的前端页面进行相应的配置,即可实现。
举例:以tubemq为source,hive为存储为例,同一个inlong-sort的作业可以订阅多个topic的tubemq数据,并且每个topic的数据可以写入不同的hive集群。
-## 支持热更新元数据
+## 2 支持热更新元数据
inlong-sort支持热更新元数据,比如更新数据源的信息,数据schema,或者写入存储系统的信息。
需要注意的是,当前修改数据源信息时,可能会造成数据丢失,因为修改数据源信息后,系统会认为这是一个全新的subscribe,会默认从消息队列的最新位置开始消费。
修改数据schema,抽取字段规则以及写入存储的信息,不会造成任何数据丢失,保证exactly-once
-# 支持的数据源
+## 3 支持的数据源
- inlong-tubemq
- pulsar
-# 支持的存储系统
+## 4 支持的存储系统
- hive(当前只支持parquet文件格式)
- clickhouse
-# 一些局限
+## 5 一些局限
当前inlong-sort在ETL的transform阶段,只支持简单的字段抽取功能,一些复杂功能暂不支持。
-# 未来规划
-## 支持更多种类的数据源
+## 6 未来规划
+### 6.1 支持更多种类的数据源
kafka等
-## 支持更多种类的存储
+### 6.2 支持更多种类的存储
Hbase,Elastic Search等
-## 支持更多种写入hive的文件格式
+### 6.3 支持更多种写入hive的文件格式
sequece file,orc
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/protocol_introduction.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/protocol_introduction.md
index c5504d5..f5c48ef 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/protocol_introduction.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/protocol_introduction.md
@@ -7,7 +7,7 @@ title: Zookeeper配置介绍
inlong-sort与inlong-manager之间通过zk进行元数据的交互。
-# Zookeeper结构
+## 1 Zookeeper结构

@@ -21,5 +21,5 @@ dataflow代表一个具体的流向,每个流向有一个全局唯一的id来
元数据管理逻辑可以查看类`org.apache.inlong.sort.meta.MetaManager`
-# 协议设计
+## 2 协议设计
具体的协议可以查看类`org.apache.inlong.sort.protocol.DataFlowInfo`
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md
index 334dd52..fc03223 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort/quick_start.md
@@ -2,7 +2,7 @@
title: 编译部署
---
-## 配置flink运行环境
+## 1 配置flink运行环境
当前inlong-sort是基于flink的一个应用,因此运行inlong-sort应用前,需要准备好flink环境。
[如何配置flink环境](https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/cluster_setup.html
"how to set up flink environment")
@@ -11,10 +11,10 @@ title: 编译部署
flink环境配置完成后,可以通过浏览器访问flink的web ui,对应的地址是`/{flink部署路径}/conf/masters`文件中的地址
-## 准备安装文件
+## 2 准备安装文件
安装文件在`inlong-sort`目录。
-## 启动inlong-sort应用
+## 3 启动inlong-sort应用
有了上述编译阶段产出的jar包后,就可以启动inlong-sort的应用了。
[如何提交flink作业](https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/yarn_setup.html#submit-job-to-flink
"如何提交flink作业")
@@ -29,7 +29,7 @@ flink环境配置完成后,可以通过浏览器访问flink的web ui,对应
- `inlong-sort-core-1.0-SNAPSHOT.jar` 为编译阶段产出的jar包
-## 必要的配置
+## 4 必要的配置
- `--cluster-id ` 用来唯一标识一个inlong-sort作业
- `--zookeeper.quorum` zk quorum
- `--zookeeper.path.root` zk根目录
@@ -40,7 +40,7 @@ flink环境配置完成后,可以通过浏览器访问flink的web ui,对应
`--cluster-id my_application --zookeeper.quorum 192.127.0.1:2181
--zookeeper.path.root /zk_root --source.type tubemq --sink.type hive`
-## 所有支持的配置
+## 5 所有支持的配置
| 配置名 | 是否必须 | 默认值 |描述 |
| ------------ | ------------ | ------------ | ------------ |
|cluster-id | Y | NA | 用来唯一标识一个inlong-sort作业 |
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/clients_java.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/clients_java.md
index 66fb611..a767e4e 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/clients_java.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/clients_java.md
@@ -150,11 +150,11 @@ public class DefaultMessageListener implements
MessageListener {
```
-### 3 创建Producer:
+## 3 创建Producer:
现网环境中业务的数据都是通过代理层来做接收汇聚,包装了比较多的异常处理,大部分的业务都没有也不会接触到TubeSDK的Producer类,考虑到业务自己搭建集群使用TubeMQ进行使用的场景,这里提供对应的使用demo,见包org.apache.inlong.tubemq.example.MessageProducerExample类文件供参考,**需要注意**的是,业务除非使用数据平台的TubeMQ集群做MQ服务,否则仍要按照现网的接入流程使用代理层来进行数据生产:
-#### 3.1 初始化MessageProducerExample类:
+### 3.1 初始化MessageProducerExample类:
和Consumer的初始化类似,也是构造了一个封装类,定义了一个会话工厂,以及一个Producer类,生产端的会话工厂初始化通过TubeClientConfig类进行,如之前所介绍的,ConsumerConfig类是TubeClientConfig类的子类,虽然传入参数不同,但会话工厂是通过TubeClientConfig类完成的初始化处理:
@@ -182,7 +182,7 @@ public final class MessageProducerExample {
```
-#### 3.2 发布Topic:
+### 3.2 发布Topic:
```java
public void publishTopics(List<String> topicList) throws TubeClientException {
@@ -191,7 +191,7 @@ public void publishTopics(List<String> topicList) throws
TubeClientException {
```
-#### 3.3 进行数据生产:
+### 3.3 进行数据生产:
如下所示,则为具体的数据构造和发送逻辑,构造一个Message对象后调用sendMessage()函数发送即可,有同步接口和异步接口选择,依照业务要求选择不同接口;需要注意的是该业务根据不同消息调用message.putSystemHeader()函数设置消息的过滤属性和发送时间,便于系统进行消息过滤消费,以及指标统计用。完成这些,一条消息即被发送出去,如果返回结果为成功,则消息被成功的接纳并且进行消息处理,如果返回失败,则业务根据具体错误码及错误提示进行判断处理,相关错误详情见《TubeMQ错误信息介绍.xlsx》:
@@ -218,7 +218,7 @@ public void sendMessageAsync(int id, long currtime, String
topic, byte[] body, M
```
-#### 3.5 Producer不同类MAMessageProducerExample关注点:
+### 3.4 Producer不同类MAMessageProducerExample关注点:
该类初始化与MessageProducerExample类不同,采用的是TubeMultiSessionFactory多会话工厂类进行的连接初始化,该demo提供了如何使用多会话工厂类的特性,可以用于通过多个物理连接提升系统吞吐量的场景(TubeMQ通过连接复用模式来减少物理连接资源的使用),恰当使用可以提升系统的生产性能。在Consumer侧也可以通过多会话工厂进行初始化,但考虑到消费是长时间过程处理,对连接资源的占用比较小,消费场景不推荐使用。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/quick_start.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/quick_start.md
index e806f92..f4925b2 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/quick_start.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/quick_start.md
@@ -1,7 +1,7 @@
---
title: 快速开始
---
-## 部署运行
+## 1 部署运行
### 1.1 配置示例
TubeMQ 集群包含有两个组件: **Master** 和 **Broker**. Master 和 Broker
可以部署在相同或者不同的节点上,依照业务对机器的规划进行处理。我们通过如下3台机器搭建有2台Master的生产、消费的集群进行配置示例:
@@ -126,8 +126,8 @@ Broker启动前,首先要在Master上配置Broker元数据,增加Broker相
刷新页面可以看到 Broker 已经注册,当 `当前运行子状态` 为 `idle` 时, 可以增加topic:

-## 3 快速使用
-### 3.1 新增 Topic
+## 2 快速使用
+### 2.1 新增 Topic
可以通过 web GUI 添加 Topic, 在 `Topic列表`页面添加,需要填写相关信息,比如增加`demo` topic:

diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq-manager/quick_start.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq-manager/quick_start.md
index 41a0e4e..b4d551f 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq-manager/quick_start.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq-manager/quick_start.md
@@ -1,7 +1,7 @@
-## 部署TubeMQ Manager
+## 1 部署TubeMQ Manager
安装文件在inlong-tubemq-manager目录.
-### 配置
+### 1.1 配置
- 在mysql中创建`tubemanager`数据和相应用户.
- 在conf/application.properties中添加mysql信息:
@@ -12,13 +12,13 @@ spring.datasource.username=mysql_username
spring.datasource.password=mysql_password
```
-### 启动服务
+### 1.2 启动服务
``` bash
$ bin/start-manager.sh
```
-### 初始化TubeMQ集群
+### 1.3 初始化TubeMQ集群
vim bin/init-tube-cluster.sh
@@ -38,7 +38,7 @@ sh bin/init-tube-cluster.sh
```
如上操作会创建一个clusterId为1的tube集群,注意该操作只进行一次,之后重启服务无需新建集群
-### 附录:其它操作接口
+### 1.4 附录:其它操作接口
#### cluster
查询clusterId以及clusterName全量数据 (get)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq_perf_test_vs_Kafka_cn.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq_perf_test_vs_Kafka_cn.md
index adebec8..4627320 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq_perf_test_vs_Kafka_cn.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/tubemq_perf_test_vs_Kafka_cn.md
@@ -176,22 +176,22 @@ TubeMQ是腾讯大数据自研的分布式消息中间件。其系统架构思

## 6 附录
-## 6.1 附录1 不同机型下资源占用情况图:
-### 6.1.1 【BX1机型测试】
+### 6.1 附录1 不同机型下资源占用情况图:
+#### 6.1.1 【BX1机型测试】




-### 6.1.2 【CG1机型测试】
+#### 6.1.2 【CG1机型测试】




-## 6.2 附录2 多Topic测试时的资源占用情况图:
+### 6.2 附录2 多Topic测试时的资源占用情况图:
-### 6.2.1 【100个topic】
+#### 6.2.1 【100个topic】



@@ -202,7 +202,7 @@ TubeMQ是腾讯大数据自研的分布式消息中间件。其系统架构思


-### 6.2.2 【200个topic】
+#### 6.2.2 【200个topic】



@@ -213,7 +213,7 @@ TubeMQ是腾讯大数据自研的分布式消息中间件。其系统架构思


-### 6.2.3 【500个topic】
+#### 6.2.3 【500个topic】



@@ -224,7 +224,7 @@ TubeMQ是腾讯大数据自研的分布式消息中间件。其系统架构思


-### 6.2.4 【1000个topic】
+#### 6.2.4 【1000个topic】



diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/website/quick_start.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/website/quick_start.md
index 9d8441f..157bb07 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/website/quick_start.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/website/quick_start.md
@@ -2,20 +2,20 @@
title: 编译部署
---
-## 关于 WebSite
+## 1 关于 WebSite
WebSite[Apache InLong
incubator](https://github.com/apache/incubator-inlong)的管控端。
-## 编译
+## 2 编译
```
mvn package -DskipTests -Pdocker -pl inlong-website
```
-## 运行
+## 3 运行
```
docker run -d --name website -e MANAGER_API_ADDRESS=127.0.0.1:8083 -p 80:80
inlong/website
```
-## 开发指引
+## 4 开发指引
确认 `nodejs >= 12.0` 已经安装。
@@ -34,14 +34,14 @@ web服务器的启动依赖于后端服务 `manger api` 接口。
您应该先启动后端服务器,然后将 `/inlong-website/src/setupProxy.js` 中的变量`target` 设置为api服务的地址。
-### 测试
+### 4.1 测试
运行 `npm test` 或 `yarn test`
在交互式观察模式下启动测试运行器。
有关更多信息,请参阅有关 [运行测试](https://create-react-app.dev/docs/running-tests/) 的部分。
-### 构建
+### 4.2 构建
首先保证项目已运行过 `npm install` 或 `yarn install` 安装了 `node_modules`。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/example.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/example.md
index d211033..ae65018 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/example.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/example.md
@@ -6,18 +6,18 @@ sidebar_position: 3
本节用一个简单的示例,帮助您使用 Docker 快速体验 InLong 的完整流程。
-## 安装 Hive
+## 1 安装 Hive
Hive 是运行的必备组件。如果您的机器上没有 Hive,这里推荐使用 Docker 进行快速安装,详情可见
[这里](https://github.com/big-data-europe/docker-hive)。
> 注意,如果使用以上 Docker 镜像的话,我们需要在 namenode 中添加一个端口映射 `8020:8020`,因为它是 HDFS
> DefaultFS 的端口,后面在配置 Hive 时需要用到。
-## 安装 InLong
+## 2 安装 InLong
在开始之前,我们需要安装 InLong 的全部组件,这里提供两种方式:
1. 按照
[这里的说明](https://github.com/apache/incubator-inlong/tree/master/docker/docker-compose),使用
Docker 进行快速部署。(推荐)
2. 按照 [这里的说明](./quick_start.md),使用二进制包依次安装各组件。
-## 新建接入
+## 3 新建接入
部署完毕后,首先我们进入 “数据接入” 界面,点击右上角的 “新建接入”,新建一条接入,按下图所示填入业务信息
<img src="../../img/create-business.png" align="center" alt="Create Business"/>
@@ -40,12 +40,12 @@ Hive 是运行的必备组件。如果您的机器上没有 Hive,这里推荐
然后点击“提交审批”按钮,该接入就会创建成功,进入审批状态。
-## 审批接入
+## 4 审批接入
进入“审批管理”界面,点击“我的审批”,将刚刚申请的接入通过。
到此接入就已经创建完毕了,我们可以在 Hive 中看到相应的表已经被创建,并且在 TubeMQ 的管理界面中可以看到相应的 topic 已经创建成功。
-## 配置 agent
+## 5 配置 agent
然后我们使用 docker 进入 agent 容器内,创建相应的 agent 配置。
```
$ docker exec -it agent sh
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/quick_start.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/quick_start.md
index 06db88f..934515b 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/quick_start.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/quick_start.md
@@ -5,7 +5,7 @@ sidebar_position: 1
本节包含快速入门指南,可帮助您开始使用 Apache InLong。
-## 整体架构
+## 1 整体架构
<img src="/img/inlong-structure-zh.png" align="center" alt="Apache InLong"/>
[Apache InLong](https://inlong.apache.org)(incubating)
整体架构如上,该组件是一站式数据流媒体平台,提供自动化、安全、分布式、高效的数据发布和订阅能力,帮助您轻松构建基于流的数据应用程序。
@@ -14,7 +14,7 @@ InLong(应龙)是中国神话故事里的神兽,可以引流入海,借喻InL
InLong(应龙)
最初建于腾讯,服务线上业务8年多,支持大数据场景下的海量数据(每天40万亿条数据规模以上)报表服务。整个平台集成了数据采集、汇聚、缓存、分拣和管理模块等共5个模块,通过这个系统,业务只需要提供数据源、数据服务质量、数据落地集群和数据落地格式,即数据可以源源不断地将数据从源集群推送到目标集群,极大满足了业务大数据场景下的数据上报服务需求。
-## 编译
+## 2 编译
- Java [JDK 8](https://adoptopenjdk.net/?variant=openjdk8)
- Maven 3.6.1+
@@ -38,39 +38,39 @@ inlong-tubemq-server
inlong-website
```
-## 环境要求
+## 3 环境要求
- ZooKeeper 3.5+
- Hadoop 2.10.x 和 Hive 2.3.x
- MySQL 5.7+
- Flink 1.9.x
-## 部署InLong TubeMQ Server
+## 4 部署InLong TubeMQ Server
[部署InLong TubeMQ Server](modules/tubemq/quick_start.md)
-## 部署InLong TubeMQ Manager
+## 5 部署InLong TubeMQ Manager
[部署InLong TubeMQ Manager](modules/tubemq/tubemq-manager/quick_start.md)
-## 部署InLong Manager
+## 6 部署InLong Manager
[部署InLong Manager](modules/manager/quick_start.md)
-## 部署InLong WebSite
+## 7 部署InLong WebSite
[部署InLong WebSite](modules/website/quick_start.md)
-## 部署InLong Sort
+## 8 部署InLong Sort
[部署InLong Sort](modules/sort/quick_start.md)
-## 部署InLong DataProxy
+## 9 部署InLong DataProxy
[部署InLong DataProxy](modules/dataproxy/quick_start.md)
-## 部署InLong DataProxy-SDK
+## 10 部署InLong DataProxy-SDK
[部署InLong DataProxy-SDK](modules/dataproxy-sdk/quick_start.md)
-## 部署InLong Agent
+## 11 部署InLong Agent
[部署InLong Agent](modules/agent/quick_start.md)
-## 业务配置
+## 12 业务配置
[配置新业务](docs/user_guide/user_manual)
-## 数据上报验证
+## 13 数据上报验证
到这里,您就可以通过文件Agent采集数据并在指定的Hive表中验证接收到的数据是否与发送的数据一致。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/user_manual.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/user_manual.md
index 4613300..d6354ad 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/user_manual.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/user_guide/user_manual.md
@@ -3,13 +3,13 @@ title: 用户手册
sidebar_position: 2
---
-# 1. 用户登录
+## 1 用户登录
需系统使用用户输入账号名称和密码。

-# 2. 数据接入
+## 2 数据接入
数据接入模块展示目前用户权限内接入系统所有任务列表,可以对这些任务详情查看、编辑更新和删除操作。
@@ -17,9 +17,9 @@ sidebar_position: 2

-## 2.1 业务信息
+### 2.1 业务信息
-### 2.1.1 业务信息
+#### 2.1.1 业务信息
需要用户对接入任务填写基础业务信息。
@@ -30,7 +30,7 @@ sidebar_position: 2
- 业务责任人:至少2人,业务责任人可查看、修改业务信息,新增和修改所有接入配置项
- 业务介绍:剪短信对此次接入任务进行业务背景和应用介绍:
-### 2.1.2 接入要求
+#### 2.1.2 接入要求
接入要求需要用户进行选择消息中间件:高吞吐(TUBE):
@@ -38,13 +38,13 @@ sidebar_position: 2
High throughput (Tube): a high-throughput message transmission component, suitable for log-type message delivery.
-### 2.1.3 Access Scale
+#### 2.1.3 Access Scale
For the access scale, users need to estimate the scale of the incoming data in advance, so that computing and storage resources can be allocated later.

-## 2.2 Data Stream
+### 2.2 Data Stream
Click [Next] to proceed to the data stream information step, which consists of four modules: basic information, data source, data information, and data flow direction.
@@ -52,7 +52,7 @@ sidebar_position: 2

-### 2.2.1 Basic Information
+#### 2.2.1 Basic Information
Users need to fill in the basic information of the data stream in this access task:
@@ -63,7 +63,7 @@ sidebar_position: 2
- Data stream owners: data stream owners can view and modify data stream information, and add and modify all access configuration items
- Data stream introduction: a brief text introduction of the data stream
-### 2.2.2 Data Source
+#### 2.2.2 Data Source
Users need to select the message source of this data stream; file and autonomous push are currently supported, and detailed information about the data source can be supplemented in the advanced options:
@@ -72,7 +72,7 @@ sidebar_position: 2

-### 2.2.3 Data Information
+#### 2.2.3 Data Information
Users need to fill in information about the data in this stream:
@@ -83,7 +83,7 @@ sidebar_position: 2
- Source field separator: the format of the data sent to the MQ
- Source data fields: the attributes, with different meanings, into which the data in the MQ is divided according to a certain format
-### 2.2.4 Data Flow Direction
+#### 2.2.4 Data Flow Direction
Users need to select the final flow direction of this task. This part is optional; Hive and autonomous push are currently supported:
@@ -103,9 +103,9 @@ HIVE flow direction:
- JDBC URL: the JDBC URL of the hiveserver
- Field information: source field name, source field type, HIVE field name, HIVE field type, and field description; deleting and adding fields is supported
-# 3. Access Details
+## 3 Access Details
-## 3.1 Execution Log
+### 3.1 Execution Log
When a data access task is in the "Approved" or "Configuration Failed" state, users can use the "Execution Log" function to view the task execution progress and details:
@@ -117,34 +117,34 @@ HIVE流向:
The execution log shows the task type, execution result, execution log content, and end time for the access process; if execution fails, the task can be "restarted" to run again.
-## 3.2 Task Details
+### 3.2 Task Details
Business owners/followers can view the access details of a task, and in the [To Be Submitted], [Configuration Successful], and [Configuration Failed] states they can modify and update some of the information. The access task details contain three modules: business information, data stream, and flow direction.
-### 3.2.1 Business Information
+#### 3.2.1 Business Information
Displays the basic business information of the access task; click [Edit] to modify some of the content:

-### 3.2.2 Data Stream
+#### 3.2.2 Data Stream
Displays the basic information of the data streams under this access task; click [New Data Stream] to create a new data stream:

-### 3.2.3 Flow Direction
+#### 3.2.3 Flow Direction
Displays the basic information of the data flow directions in this access task; select a flow direction type from the drop-down box and click [New Flow Configuration] to create a new data flow direction:

-# 4. Data Consumption
+## 4 Data Consumption
Direct consumption of access data is not currently supported; data can be consumed normally only after the data approval process. Click [New Consumption] to enter the data consumption process and fill in the relevant consumption information:

-## 4.1 Consumption Information
+### 4.1 Consumption Information
In this module, the applicant fills in, step by step, the basic business information related to the data consumption application:
@@ -160,35 +160,35 @@ HIVE flow direction:

-# 5. Approval Management
+## 5 Approval Management
The approval management module currently includes My Applications and My Approvals, and manages all data access and data consumption approval tasks in the system.
-## 5.1 My Applications
+### 5.1 My Applications
Displays the list of data access and consumption tasks the applicant has submitted in the system; click [Details] to view the task's basic information and approval progress:

-### 5.1.1 Data Access Details
+#### 5.1.1 Data Access Details
The data access task details display the basic information of the application, including the applicant's information and the basic access information, as well as the current approval process nodes:

-### 5.1.2 Data Consumption Details
+#### 5.1.2 Data Consumption Details
The data consumption task details display the basic information of the application, including the applicant's information and the basic consumption information, as well as the current approval process nodes:

-## 5.2 My Approvals
+### 5.2 My Approvals
Data access staff and system members with approval permissions are responsible for approving data access or consumption:

-### 5.2.1 Data Access Approval
+#### 5.2.1 Data Access Approval
New data access approval: currently a single-level approval, handled by the system administrator.
@@ -196,7 +196,7 @@ HIVE flow direction:

-### 5.2.2 New Data Consumption Approval
+#### 5.2.2 New Data Consumption Approval
New data consumption approval: currently a single-level approval, handled by the business owner.
@@ -204,13 +204,13 @@ HIVE flow direction:

-# 6. System Management
+## 6 System Management
Only users with the system administrator role can use this function; they can create, modify, and delete users:

-## 6.1 New User
+### 6.1 New User
Users with system administrator permissions can create new user accounts:
@@ -223,13 +223,13 @@ HIVE flow direction:

-## 6.2 Delete User
+### 6.2 Delete User
The system administrator can delete a created user account; after deletion, the account can no longer be used:

-## 6.3 Modify User
+### 6.3 Modify User
The system administrator can modify created accounts:
@@ -239,7 +239,7 @@ HIVE flow direction:

-## 6.4 Change Password
+### 6.4 Change Password
Users can change their account password: click [Change Password], enter the old password and the new password, and after confirmation the new password takes effect:
diff --git a/versioned_docs/version-0.11.0/modules/dataproxy/architecture.md b/versioned_docs/version-0.11.0/modules/dataproxy/architecture.md
index a7d72f5..7f4f93c 100644
--- a/versioned_docs/version-0.11.0/modules/dataproxy/architecture.md
+++ b/versioned_docs/version-0.11.0/modules/dataproxy/architecture.md
@@ -1,14 +1,14 @@
---
title: Architecture
---
-# 1. Intro
+## 1. Intro
Inlong-dataProxy belongs to the InLong proxy layer and is used for data collection, reception, and forwarding. Through format conversion, the data is converted into the TDMsg1 format that can be cached and processed by the cache layer.
InLong-dataProxy acts as a bridge from the InLong collection end to the InLong buffer end. DataProxy pulls the mapping between the business group id and the corresponding topic name from the manager module, and internally manages the producers of multiple topics.
The overall architecture of inlong-dataproxy is based on Apache Flume. On top of that project, inlong-bus extends the source and sink layers and optimizes disaster-tolerant forwarding, which improves the stability of the system.
-# 2. Architecture
+## 2. Architecture

@@ -16,7 +16,7 @@ title: Architecture
2. The channel layer has a selector, which is used to choose which type of channel an event goes to; if memory eventually fills up, the data is processed accordingly.
3. The data in the channel layer is forwarded through the sink layer. The main purpose here is to convert the data to the TDMsg1 format and push it to the cache layer (Tube is most commonly used here).
-# 3. DataProxy configuration instructions
+## 3. DataProxy configuration instructions
DataProxy supports a configurable source-channel-sink pipeline, and the configuration method is the same as the configuration file structure of Flume:
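As a sketch only: since DataProxy configuration follows Flume's properties-file format, a minimal Flume-style source-channel-sink layout looks like the following. The agent and component names are illustrative, and the stock netcat source, memory channel, and logger sink stand in for DataProxy's own source and sink implementations, which this commit does not name:

```properties
# Minimal Flume-style source-channel-sink wiring (illustrative names).
agent1.sources = tcp-source
agent1.channels = msg-channel
agent1.sinks = cache-sink

# Source: receives events over TCP (a real DataProxy source type
# would replace "netcat" here).
agent1.sources.tcp-source.type = netcat
agent1.sources.tcp-source.bind = 0.0.0.0
agent1.sources.tcp-source.port = 46801
agent1.sources.tcp-source.channels = msg-channel

# Channel: buffers events in memory between source and sink.
agent1.channels.msg-channel.type = memory
agent1.channels.msg-channel.capacity = 10000

# Sink: drains the channel; a real DataProxy sink would convert
# events to TDMsg1 and push them to the cache layer (e.g. Tube).
agent1.sinks.cache-sink.type = logger
agent1.sinks.cache-sink.channel = msg-channel
```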