This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new a77e613cb8 [docs](tutorials) Fix deadlink of tutorials (#1007)
a77e613cb8 is described below
commit a77e613cb8de027c189cbb3708fd016b975873dd
Author: KassieZ <[email protected]>
AuthorDate: Mon Sep 2 10:52:37 2024 +0800
[docs](tutorials) Fix deadlink of tutorials (#1007)
---
.../tutorials/building-lakehouse/doris-hudi.md | 30 +++++++++++-----------
.../tutorials/building-lakehouse/doris-iceberg.md | 22 ++++++++--------
.../tutorials/building-lakehouse/doris-paimon.md | 16 ++++++------
.../tutorials/log-storage-analysis.md | 16 ++++++------
.../tutorials/building-lakehouse/doris-hudi.md | 2 +-
.../tutorials/building-lakehouse/doris-iceberg.md | 2 +-
.../tutorials/building-lakehouse/doris-paimon.md | 2 +-
gettingStarted/tutorials/log-storage-analysis.md | 18 ++++++-------
8 files changed, 54 insertions(+), 54 deletions(-)
diff --git a/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-hudi.md b/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-hudi.md
index 0468889bde..e54178a71c 100644
--- a/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-hudi.md
+++ b/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-hudi.md
@@ -55,7 +55,7 @@ Apache Doris has likewise enhanced its ability to read Apache Hudi tables:
This article walks through how to quickly set up a test and demo environment for Apache Doris + Apache Hudi in Docker, and demonstrates each feature so readers can get started quickly.
-For more details, see [Hudi Catalog](../../lakehouse/datalake-analytics/hudi)
+For more details, see [Hudi Catalog](../../../lakehouse/datalake-analytics/hudi)
## Usage Guide
@@ -88,7 +88,7 @@ Apache Doris has likewise enhanced its ability to read Apache Hudi tables:
3. After startup, you can log in to the Spark CLI or the Doris CLI with the following scripts:
- ```
+ ```sql
-- Doris
sudo ./login-spark.sh
@@ -100,7 +100,7 @@ Apache Doris has likewise enhanced its ability to read Apache Hudi tables:
Next, generate Hudi data through Spark. As shown below, the cluster already contains a Hive table named `customer`, from which a Hudi table can be created:
-```
+```sql
-- ./login-spark.sh
spark-sql> use default;
@@ -131,7 +131,7 @@ AS SELECT * FROM customer;
As shown below, a Catalog named `hudi` has already been created in the Doris cluster (viewable via `SHOW CATALOGS`). Its creation statement is:
-```
+```sql
-- Already created; no need to execute again
CREATE CATALOG `hive` PROPERTIES (
"type"="hms",
@@ -146,21 +146,21 @@ CREATE CATALOG `hive` PROPERTIES (
1. Manually refresh this Catalog to sync the newly created Hudi tables:
- ```
+ ```sql
-- ./login-doris.sh
doris> REFRESH CATALOG hive;
```
2. Any data written to Hudi through Spark is visible in Doris in real time, with no further Catalog refresh needed. We insert one row each into the COW and MOR tables via Spark:
- ```
+ ```sql
spark-sql> insert into customer_cow values (100, "Customer#000000100", "jD2xZzi", "25-430-914-2194", 3471.59, "BUILDING", "cial ideas. final, furious requests", 25);
spark-sql> insert into customer_mor values (100, "Customer#000000100", "jD2xZzi", "25-430-914-2194", 3471.59, "BUILDING", "cial ideas. final, furious requests", 25);
```
3. The newly inserted data can be queried directly from Doris:
- ```
+ ```sql
doris> use hive.default;
doris> select * from customer_cow where c_custkey = 100;
doris> select * from customer_mor where c_custkey = 100;
@@ -168,14 +168,14 @@ CREATE CATALOG `hive` PROPERTIES (
4. Then use Spark to insert data whose key c_custkey=32 already exists, i.e. overwrite existing data:
- ```
+ ```sql
spark-sql> insert into customer_cow values (32, "Customer#000000032_update", "jD2xZzi", "25-430-914-2194", 3471.59, "BUILDING", "cial ideas. final, furious requests", 15);
spark-sql> insert into customer_mor values (32, "Customer#000000032_update", "jD2xZzi", "25-430-914-2194", 3471.59, "BUILDING", "cial ideas. final, furious requests", 15);
```
5. The updated data can then be queried from Doris:
- ```
+ ```sql
doris> select * from customer_cow where c_custkey = 32;
+-----------+---------------------------+-----------+-----------------+-----------+--------------+-------------------------------------+-------------+
| c_custkey | c_name | c_address | c_phone | c_acctbal | c_mktsegment | c_comment | c_nationkey |
@@ -194,7 +194,7 @@ CREATE CATALOG `hive` PROPERTIES (
Incremental Read is one of Hudi's features: it lets users fetch the incremental data within a specified time range and process data incrementally. Here, Doris can query the change data written after `c_custkey=100` was inserted. As shown below, we inserted one row with `c_custkey=32`:
-```
+```sql
doris> select * from customer_cow@incr('beginTime'='20240603015018572');
+-----------+---------------------------+-----------+-----------------+-----------+--------------+-------------------------------------+-------------+
| c_custkey | c_name | c_address | c_phone | c_acctbal | c_mktsegment | c_comment | c_nationkey |
@@ -216,7 +216,7 @@ spark-sql> select * from hudi_table_changes('customer_mor', 'latest_state', '202
Doris supports querying Hudi data at a specified snapshot version, i.e. Time Travel. First, query the commit history of the two Hudi tables through Spark:
-```
+```sql
spark-sql> call show_commits(table => 'customer_cow', limit => 10);
20240603033556094 20240603033558249 commit 448833 0 1 1 183 0 0
20240603015444737 20240603015446588 commit 450238 0 1 1 202 1 0
@@ -234,7 +234,7 @@ spark-sql> call show_commits(table => 'customer_mor', limit => 10);
> Note: The Time Travel syntax does not yet support the new optimizer. Run `set enable_nereids_planner=false;` first to disable it; this will be fixed in a later release.
-```
+```sql
doris> select * from customer_cow for time as of '20240603015018572' where c_custkey = 32 or c_custkey = 100;
+-----------+--------------------+---------------------------------------+-----------------+-----------+--------------+--------------------------------------------------+-------------+
| c_custkey | c_name | c_address | c_phone | c_acctbal | c_mktsegment | c_comment | c_nationkey |
@@ -263,7 +263,7 @@ Data in Apache Hudi falls roughly into two categories: baseline data and incremental data
To validate this optimization, we use an EXPLAIN statement to see how much baseline and incremental data the sample query below touches. For the COW table, all 101 data splits are baseline data (`hudiNativeReadSplits=101/101`), so the entire table can be read directly by the Doris Parquet Reader, giving the best query performance. For the MOR table, most splits are baseline data (`hudiNativeReadSplits=100/101`) and only one split is incremental, so it also achieves good query performance.
-```
+```sql
-- COW table is read natively
doris> explain select * from customer_cow where c_custkey = 32;
| 0:VHUDI_SCAN_NODE(68) |
@@ -289,7 +289,7 @@ doris> explain select * from customer_mor where c_custkey = 32;
Some delete operations can be run through Spark to further observe how Hudi's baseline and incremental data change:
-```
+```sql
-- Use delete statement to see more differences
spark-sql> delete from customer_cow where c_custkey = 64;
doris> explain select * from customer_cow where c_custkey = 64;
@@ -300,7 +300,7 @@ doris> explain select * from customer_mor where c_custkey = 64;
In addition, partition predicates can be used for partition pruning, further reducing the amount of data scanned and speeding up queries. In the example below, the partition predicate `c_nationkey=15` prunes partitions so that the query only needs to access a single partition (`partition=1/26`).
-```
+```sql
-- customer_xxx is partitioned by c_nationkey, we can use the partition column to prune data
doris> explain select * from customer_mor where c_custkey = 64 and c_nationkey = 15;
| 0:VHUDI_SCAN_NODE(68) |
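The link fixes in this file all follow one pattern: each relative link gains one extra `../`, presumably because these tutorial pages sit one directory deeper than the old links assumed. A minimal sketch of how the two forms resolve (paths taken from the diff above; the site-layout interpretation is an assumption):

```python
import posixpath

# Directory containing the tutorial page, per the diff paths above.
base = "common_docs_zh/gettingStarted/tutorials/building-lakehouse"

# Old link: two levels up lands inside gettingStarted/, a dead path.
old = posixpath.normpath(posixpath.join(base, "../../lakehouse/datalake-analytics/hudi"))

# Fixed link: three levels up lands at the docs root, where lakehouse/ lives.
new = posixpath.normpath(posixpath.join(base, "../../../lakehouse/datalake-analytics/hudi"))

print(old)  # common_docs_zh/gettingStarted/lakehouse/datalake-analytics/hudi
print(new)  # common_docs_zh/lakehouse/datalake-analytics/hudi
```

The same reasoning applies to the `../../` → `../../../` changes in the Iceberg and Paimon files below.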
diff --git a/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index db37ca6b31..3cc43ab17e 100644
--- a/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -55,7 +55,7 @@ Apache Doris provides native support for several core Iceberg features:
Going forward, Apache Iceberg will serve as one of Apache Doris' native table engines, offering more complete analysis and management of lake-format data. Apache Doris will also gradually support more advanced Apache Iceberg features, including Update/Delete/Merge, sort-on-write, incremental reads, and metadata management, to jointly build a unified, high-performance, real-time lakehouse platform.
-For more details, see [Iceberg Catalog](../../lakehouse/datalake-analytics/iceberg.md)
+For more details, see [Iceberg Catalog](../../../lakehouse/datalake-analytics/iceberg.md)
## Usage Guide
@@ -81,7 +81,7 @@ Apache Doris provides native support for several core Iceberg features:
2. After startup, you can log in to the Doris CLI with the following script:
- ```
+ ```sql
-- login doris
bash ./start_doris_client.sh
```
@@ -90,7 +90,7 @@ Apache Doris provides native support for several core Iceberg features:
After logging in to the Doris CLI, you will find a Catalog named `iceberg` already created in the Doris cluster (viewable via `SHOW CATALOGS` / `SHOW CREATE CATALOG iceberg`). Its creation statement is:
-```
+```sql
-- Already created; no need to execute
CREATE CATALOG `iceberg` PROPERTIES (
"type" = "iceberg",
@@ -105,7 +105,7 @@ CREATE CATALOG `iceberg` PROPERTIES (
Create a database and an Iceberg table in the Iceberg Catalog:
-```
+```sql
mysql> SWITCH iceberg;
Query OK, 0 rows affected (0.00 sec)
@@ -133,7 +133,7 @@ Query OK, 0 rows affected (0.15 sec)
Insert data into the Iceberg table:
-```
+```sql
mysql> INSERT INTO iceberg.nyc.taxis
VALUES
(1, 1000371, 1.8, 15.32, 'N', '2024-01-01 9:15:23'),
@@ -156,7 +156,7 @@ Query OK, 6 rows affected (0.25 sec)
- Simple query
- ```
+ ```sql
mysql> SELECT * FROM iceberg.nyc.taxis;
+-----------+---------+---------------+-------------+--------------------+----------------------------+
| vendor_id | trip_id | trip_distance | fare_amount | store_and_fwd_flag | ts |
@@ -182,7 +182,7 @@ Query OK, 6 rows affected (0.25 sec)
- Partition pruning
- ```
+ ```sql
mysql> SELECT * FROM iceberg.nyc.taxis where vendor_id = 2 and ts >= '2024-01-01' and ts < '2024-01-02';
+-----------+---------+---------------+-------------+--------------------+----------------------------+
| vendor_id | trip_id | trip_distance | fare_amount | store_and_fwd_flag | ts |
@@ -219,7 +219,7 @@ Query OK, 6 rows affected (0.25 sec)
Let's first insert a few more rows of data:
-```
+```sql
INSERT INTO iceberg.nyc.taxis VALUES (1, 1000375, 8.8, 55.55, 'Y', '2024-01-01 8:10:22'), (3, 1000376, 7.4, 32.35, 'N', '2024-01-02 1:14:45');
Query OK, 2 rows affected (0.17 sec)
{'status':'COMMITTED', 'txnId':'10086'}
@@ -240,7 +240,7 @@ mysql> SELECT * FROM iceberg.nyc.taxis;
Query the table's snapshot information with the `iceberg_meta` table function:
-```
+```sql
mysql> select * from iceberg_meta("table" = "iceberg.nyc.taxis", "query_type" = "snapshots");
+---------------------+---------------------+---------------------+-----------+-----------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| committed_at | snapshot_id | parent_id | operation | manifest_list | summary |
@@ -253,7 +253,7 @@ mysql> select * from iceberg_meta("table" = "iceberg.nyc.taxis", "query_type" =
Query a specified snapshot with the `FOR VERSION AS OF` clause:
-```
+```sql
mysql> SELECT * FROM iceberg.nyc.taxis FOR VERSION AS OF 8483933166442433486;
+-----------+---------+---------------+-------------+--------------------+----------------------------+
| vendor_id | trip_id | trip_distance | fare_amount | store_and_fwd_flag | ts |
@@ -281,7 +281,7 @@ mysql> SELECT * FROM iceberg.nyc.taxis FOR VERSION AS OF 4726331391239920914;
Query a specified snapshot with the `FOR TIME AS OF` clause:
-```
+```sql
mysql> SELECT * FROM iceberg.nyc.taxis FOR TIME AS OF "2024-07-29 03:38:23";
+-----------+---------+---------------+-------------+--------------------+----------------------------+
| vendor_id | trip_id | trip_distance | fare_amount | store_and_fwd_flag | ts |
diff --git a/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-paimon.md b/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-paimon.md
index b9c2694a9e..5c7b7b98b5 100644
--- a/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-paimon.md
+++ b/common_docs_zh/gettingStarted/tutorials/building-lakehouse/doris-paimon.md
@@ -54,7 +54,7 @@ Apache Paimon is a data lake format that innovatively combines the lake format with LSM
This article explains how to quickly set up an Apache Doris + Apache Paimon test & demo environment in Docker and demonstrates how each feature is used.
-For more details, see [Paimon Catalog](../../lakehouse/datalake-analytics/paimon.md)
+For more details, see [Paimon Catalog](../../../lakehouse/datalake-analytics/paimon.md)
## Usage Guide
@@ -79,7 +79,7 @@ Apache Paimon is a data lake format that innovatively combines the lake format with LSM
2. After startup, you can log in to the Flink CLI or the Doris CLI with the following scripts:
- ```
+ ```sql
-- login flink
bash ./start_flink_client.sh
@@ -91,7 +91,7 @@ Apache Paimon is a data lake format that innovatively combines the lake format with LSM
After logging in to the Flink CLI, you can see a pre-built table that already contains some data, which we can inspect with Flink SQL.
-```
+```sql
Flink SQL> use paimon.db_paimon;
[INFO] Execute statement succeed.
@@ -158,7 +158,7 @@ Flink SQL> select * from customer order by c_custkey limit 4;
As shown below, a Catalog named `paimon` has already been created in the Doris cluster (viewable via SHOW CATALOGS). Its creation statement is:
-```
+```sql
-- Already created; no need to execute
CREATE CATALOG `paimon` PROPERTIES (
"type" = "paimon",
@@ -172,7 +172,7 @@ CREATE CATALOG `paimon` PROPERTIES (
You can log in to Doris and query the Paimon data:
-```
+```sql
mysql> use paimon.db_paimon;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
@@ -202,7 +202,7 @@ mysql> select * from customer order by c_custkey limit 4;
We can update the data in the Paimon table with Flink SQL:
-```
+```sql
Flink SQL> update customer set c_address='c_address_update' where c_nationkey = 1;
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
@@ -211,7 +211,7 @@ Job ID: ff838b7b778a94396b332b0d93c8f7ac
Once the Flink SQL job finishes, the latest data is directly visible in Doris:
-```
+```sql
mysql> select * from customer where c_nationkey=1 limit 2;
+-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
| c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment |
@@ -235,7 +235,7 @@ mysql> select * from customer where c_nationkey=1 limit 2;
For baseline data, the Primary Key Table Read Optimized feature introduced in Apache Paimon 0.6 lets query engines access the underlying Parquet/ORC files directly, greatly improving baseline read efficiency. Incremental data that has not yet been compacted (the deltas produced by INSERT, UPDATE, or DELETE) can be read via Merge-on-Read. In addition, the Deletion Vector feature introduced in Paimon 0.8 further improves the efficiency of reading incremental data.
Apache Doris can read Deletion Vectors and perform Merge-on-Read with its native Reader. We use Doris' EXPLAIN statement to show how baseline and incremental data are read within a single query.
-```
+```sql
mysql> explain verbose select * from customer where c_nationkey < 3;
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
diff --git a/common_docs_zh/gettingStarted/tutorials/log-storage-analysis.md b/common_docs_zh/gettingStarted/tutorials/log-storage-analysis.md
index 9d13efc863..8c61bf3124 100644
--- a/common_docs_zh/gettingStarted/tutorials/log-storage-analysis.md
+++ b/common_docs_zh/gettingStarted/tutorials/log-storage-analysis.md
@@ -152,7 +152,7 @@ Apache Doris provides several kinds of support for log data with flexible schema
### Step 2: Deploy the cluster
-After estimating resources, you can deploy the Apache Doris cluster; deployment on physical machines or VMs is recommended. For manual deployment, see [Manual Deployment](../install/cluster-deployment/standard-deployment.md).
+After estimating resources, you can deploy the Apache Doris cluster; deployment on physical machines or VMs is recommended. For manual deployment, see [Manual Deployment](../../install/cluster-deployment/standard-deployment).
Alternatively, the Cluster Manager tool from SelectDB Enterprise is recommended for deploying the cluster, which reduces overall deployment cost. For more about Cluster Manager, see the following documents:
@@ -177,7 +177,7 @@ Apache Doris provides several kinds of support for log data with flexible schema
| `autobucket_min_buckets = 10` | Raise the minimum number of auto buckets from 1 to 10 to avoid running out of buckets as the log volume grows. |
| `max_backend_heartbeat_failure_tolerance_count = 10` | BE servers are under heavy load in log scenarios and heartbeats may briefly time out, so raise the tolerance count from 1 to 10. |
-For more about FE configuration fields, see [FE Configuration](../admin-manual/config/fe-config.md).
+For more about FE configuration fields, see [FE Configuration](../../admin-manual/config/fe-config).
**Optimize BE configuration**
@@ -206,7 +206,7 @@ Apache Doris provides several kinds of support for log data with flexible schema
| Others | `string_type_length_soft_limit_bytes = 10485760` | Raise the length limit of String-type data to 10 MB. |
| - | `trash_file_expire_time_sec = 300` `path_gc_check_interval_second = 900` `path_scan_interval_second = 900` | Speed up the recycling of trash files. |
-For more about BE configuration fields, see [BE Configuration](../admin-manual/config/be-config.md).
+For more about BE configuration fields, see [BE Configuration](../../admin-manual/config/be-config).
### Step 4: Create tables
@@ -215,13 +215,13 @@ Apache Doris provides several kinds of support for log data with flexible schema
**Configure partitioning and bucketing parameters**
- For partitioning, configure as follows:
-- Use a [Range partition](https://doris.apache.org/zh-CN/docs/dev/table-design/data-partition/#range-%E5%88%86%E5%8C%BA) on the time field and enable [dynamic partitions](https://doris.apache.org/zh-CN/docs/dev/table-design/data-partition?_highlight=%E8%87%AA%E5%8A%A8&_highlight=%E5%88%86&_highlight=%E6%A1%B6#%E5%8A%A8%E6%80%81%E5%88%86%E5%8C%BA) to manage partitions automatically by day.
+- Use a [Range partition](../../table-design/data-partition/#range-%E5%88%86%E5%8C%BA) on the time field and enable [dynamic partitions](../../table-design/data-partition?_highlight=%E8%87%AA%E5%8A%A8&_highlight=%E5%88%86&_highlight=%E6%A1%B6#%E5%8A%A8%E6%80%81%E5%88%86%E5%8C%BA) to manage partitions automatically by day.
- Use a DATETIME-typed time field as the Key for several-fold speedups when querying the latest N log entries.
- For bucketing, configure as follows:
- Set the number of buckets to roughly 3x the total number of disks in the cluster.
- Use the Random bucketing strategy; combined with single-tablet loads on write, it improves batch write efficiency.
-For more about partitioning and bucketing, see [Data Partitioning](../table-design/data-partition.md).
+For more about partitioning and bucketing, see [Data Partitioning](../../table-design/data-partition).
**Configure compaction parameters**
@@ -370,7 +370,7 @@ output {
./bin/logstash -f logstash_demo.conf
```
-For more about configuring and using Logstash, see [Logstash Doris Output Plugin](../ecosystem/logstash.md).
+For more about configuring and using Logstash, see [Logstash Doris Output Plugin](../../ecosystem/logstash).
**Integrating Filebeat**
@@ -446,7 +446,7 @@ chmod +x filebeat-doris-1.0.0
./filebeat-doris-1.0.0 -c filebeat_demo.yml
```
-For more about configuring and using Filebeat, see [Beats Doris Output Plugin](../ecosystem/beats.md).
+For more about configuring and using Filebeat, see [Beats Doris Output Plugin](../../ecosystem/beats).
**Integrating Kafka**
@@ -482,7 +482,7 @@ FROM KAFKA (
SHOW ROUTINE LOAD;
```
-For more about configuring and using Kafka, see [Routine Load](../data-operate/import/routine-load-manual.md).
+For more about configuring and using Kafka, see [Routine Load](../../data-operate/import/routine-load-manual).
**Using custom programs to collect logs**
diff --git a/gettingStarted/tutorials/building-lakehouse/doris-hudi.md b/gettingStarted/tutorials/building-lakehouse/doris-hudi.md
index f92a3ec698..ee61536e4b 100644
--- a/gettingStarted/tutorials/building-lakehouse/doris-hudi.md
+++ b/gettingStarted/tutorials/building-lakehouse/doris-hudi.md
@@ -55,7 +55,7 @@ With Apache Doris' high-performance query execution and Apache Hudi's real-time
This article will introduce readers to how to quickly set up a test and demonstration environment for Apache Doris + Apache Hudi in a Docker environment, and demonstrate various operations to help readers get started quickly.
-For more information, please refer to [Hudi Catalog](../../lakehouse/datalake-analytics/hudi.md)
+For more information, please refer to [Hudi Catalog](../../../lakehouse/datalake-analytics/hudi)
## User Guide
diff --git a/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md b/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
index f58dead95f..3a6159407b 100644
--- a/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
+++ b/gettingStarted/tutorials/building-lakehouse/doris-iceberg.md
@@ -53,7 +53,7 @@ Users can quickly build an efficient Data Lakehouse solution based on Apache Dor
In the future, Apache Iceberg will serve as one of the native table engines for Apache Doris, providing more comprehensive analysis and management functions for lake-formatted data. Apache Doris will also gradually support more advanced features of Apache Iceberg, including Update/Delete/Merge, sorting during write-back, incremental data reading, metadata management, etc., to jointly build a unified, high-performance, real-time data lake platform.
-For more information, please refer to [Iceberg Catalog](../../lakehouse/datalake-analytics/iceberg.md)
+For more information, please refer to [Iceberg Catalog](../../../lakehouse/datalake-analytics/iceberg)
## User Guide
diff --git a/gettingStarted/tutorials/building-lakehouse/doris-paimon.md b/gettingStarted/tutorials/building-lakehouse/doris-paimon.md
index 26074f1ca3..26aba30e96 100644
--- a/gettingStarted/tutorials/building-lakehouse/doris-paimon.md
+++ b/gettingStarted/tutorials/building-lakehouse/doris-paimon.md
@@ -54,7 +54,7 @@ In the future, Apache Doris will gradually support more advanced features of Apa
This article will explain how to quickly set up an Apache Doris + Apache Paimon testing & demonstration environment in a Docker environment and demonstrate the usage of various features.
-For more information, please refer to [Paimon Catalog](../../lakehouse/datalake-analytics/paimon.md)
+For more information, please refer to [Paimon Catalog](../../../lakehouse/datalake-analytics/paimon.md)
## User Guide
diff --git a/gettingStarted/tutorials/log-storage-analysis.md b/gettingStarted/tutorials/log-storage-analysis.md
index 51ff5cead1..b44ccf39d4 100644
--- a/gettingStarted/tutorials/log-storage-analysis.md
+++ b/gettingStarted/tutorials/log-storage-analysis.md
@@ -175,7 +175,7 @@ Refer to the following table to learn about the values of indicators in the exam
### Step 2: Deploy the cluster
-After estimating the resources, you need to deploy the cluster. It is recommended to deploy in both physical and virtual environments manually. For manual deployment, refer to [Manual Deployment](../install/cluster-deployment/standard-deployment.md).
+After estimating the resources, you need to deploy the cluster. It is recommended to deploy in both physical and virtual environments manually. For manual deployment, refer to [Manual Deployment](../../install/cluster-deployment/standard-deployment).
Alternatively, it is recommended to use VeloDB Manager provided by VeloDB Enterprise to deploy the cluster, reducing overall deployment costs. For more information about the VeloDB Manager, please refer to the following documents:
@@ -201,7 +201,7 @@ You can find FE configuration fields in `fe/conf/fe.conf`. Refer to the followin
| `autobucket_min_buckets = 10` | Increase the minimum number of automatically bucketed buckets from 1 to 10 to avoid insufficient buckets when the log volume increases. |
| `max_backend_heartbeat_failure_tolerance_count = 10` | In log scenarios, the BE server may experience high pressure, leading to short-term timeouts, so increase the tolerance count from 1 to 10. |
-For more information, refer to [FE Configuration](../admin-manual/config/fe-config.md).
+For more information, refer to [FE Configuration](../../admin-manual/config/fe-config).
**Optimize BE configurations**
@@ -231,7 +231,7 @@ You can find BE configuration fields in `be/conf/be.conf`. Refer to the followin
| - | `trash_file_expire_time_sec = 300` `path_gc_check_interval_second = 900` `path_scan_interval_second = 900` | Accelerate the recycling of trash files. |
-For more information, refer to [BE Configuration](../admin-manual/config/be-config.md).
+For more information, refer to [BE Configuration](../../admin-manual/config/be-config).
### Step 4: Create tables
@@ -241,7 +241,7 @@ Due to the distinct characteristics of both writing and querying log data, it is
- For data partitioning:
- - Enable [range partitioning](https://doris.apache.org/docs/table-design/data-partition#range-partition) with [dynamic partitions](https://doris.apache.org/docs/table-design/data-partition#dynamic-partition) managed automatically by day.
+ - Enable [range partitioning](../../table-design/data-partition#range-partition) with [dynamic partitions](../../table-design/data-partition#dynamic-partition) managed automatically by day.
- Use a field in the DATETIME type as the key for accelerated retrieval of the latest N log entries.
@@ -251,7 +251,7 @@ Due to the distinct characteristics of both writing and querying log data, it is
- Use the Random strategy to optimize batch writing efficiency when paired with single tablet imports.
-For more information, refer to [Data Partitioning](../table-design/data-partition.md).
+For more information, refer to [Data Partitioning](../../table-design/data-partition).
**Configure compaction fields**
@@ -402,7 +402,7 @@ output {
./bin/logstash -f logstash_demo.conf
```
-For more information about the Logstash Doris Output plugin, see [Logstash Doris Output Plugin](../ecosystem/logstash.md).
+For more information about the Logstash Doris Output plugin, see [Logstash Doris Output Plugin](../../ecosystem/logstash).
**Integrating Filebeat**
@@ -470,7 +470,7 @@ headers:
./filebeat-doris-1.0.0 -c filebeat_demo.yml
```
-For more information about Filebeat, refer to [Beats Doris Output Plugin](../ecosystem/beats.md).
+For more information about Filebeat, refer to [Beats Doris Output Plugin](../../ecosystem/beats).
**Integrating Kafka**
@@ -478,7 +478,7 @@ Write JSON formatted logs to Kafka's message queue, create a Kafka Routine Load,
You can refer to the example below, where `property.*` represents Librdkafka client-related configurations and needs to be adjusted according to the actual Kafka cluster situation.
-```SQL
+```sql
CREATE ROUTINE LOAD load_log_kafka ON log_db.log_table
COLUMNS(ts, clientip, request, status, size)
PROPERTIES (
@@ -503,7 +503,7 @@ FROM KAFKA (
SHOW ROUTINE LOAD;
```
-For more information about Kafka, see [Routine Load](../data-operate/import/routine-load-manual.md).
+For more information about Kafka, see [Routine Load](../../data-operate/import/routine-load-manual).
**Using customized programs to collect logs**