(doris-website) branch master updated: [doc] Add Before You Start the POC guide to 3.x and 4.x (#3470)

dataroaring Tue, 17 Mar 2026 05:45:00 -0700

This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 058922f3879 [doc] Add Before You Start the POC guide to 3.x and 4.x (#3470)
058922f3879 is described below

commit 058922f3879554fe63573e16df24777daf19605b
Author: Yongqiang YANG <[email protected]>
AuthorDate: Tue Mar 17 05:44:44 2026 -0700

    [doc] Add Before You Start the POC guide to 3.x and 4.x (#3470)
    
    ## Summary
    - Add `before-you-start-the-poc.md` (EN + ZH) to versioned docs for 3.x
    and 4.x
    - Register the new page in both `version-3.x-sidebars.json` and
    `version-4.x-sidebars.json`
    - Content is identical to the `current` version from the `before_poc`
    branch
    
    ## Test plan
    - [ ] Verify the page renders correctly under 3.x docs
    - [ ] Verify the page renders correctly under 4.x docs
    - [ ] Verify sidebar shows "Before You Start the POC" in Getting Started
    for both versions
    - [ ] Verify both EN and ZH versions display properly
    
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    
    ---------
    
    Co-authored-by: Claude Opus 4.6 <[email protected]>
---
 .../gettingStarted/before-you-start-the-poc.md     | 115 +++++++++++++++++++++
 .../gettingStarted/before-you-start-the-poc.md     | 115 +++++++++++++++++++++
 .../gettingStarted/before-you-start-the-poc.md     | 115 +++++++++++++++++++++
 .../gettingStarted/before-you-start-the-poc.md     | 115 +++++++++++++++++++++
 versioned_sidebars/version-3.x-sidebars.json       |   1 +
 versioned_sidebars/version-4.x-sidebars.json       |   1 +
 6 files changed, 462 insertions(+)

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/gettingStarted/before-you-start-the-poc.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/gettingStarted/before-you-start-the-poc.md
new file mode 100644
index 00000000000..bf1a6570d66
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/gettingStarted/before-you-start-the-poc.md
@@ -0,0 +1,115 @@
+---
+{
+    "title": "POC 前必读",
+    "language": "zh-CN",
+    "description": "新用户在 Apache Doris 建表设计、数据导入和查询调优中常见的问题。",
+    "sidebar_label": "POC 前必读"
+}
+---
+
+# POC 前必读
+
+本文档汇总了新用户常见的问题，旨在加速 POC 进程。
+
+## 建表设计
+
+在 Doris 中建表涉及四个影响导入和查询性能的决策。
+
+### 数据模型
+
+| 数据特征 | 使用 | 原因 |
+|---|---|---|
+| 仅追加（日志、事件、事实表） | **Duplicate Key**（默认） | 保留所有行。查询性能最好。 |
+| 按主键更新（CDC、Upsert） | **Unique Key** | 新行按相同 Key 替换旧行。 |
+| 预聚合指标（PV、UV、汇总） | **Aggregate Key** | 写入时按 SUM/MAX/MIN 合并行。 |
+
+**Duplicate Key 适用于大多数场景。**详见[数据模型概述](../table-design/data-model/overview)。
+
+### Sort Key（排序键）
+
+将最常用于过滤的列放在最前面，定长类型（INT、BIGINT、DATE）放在 VARCHAR 之前。Doris 在排序键的前 36 字节上构建[前缀索引](../table-design/index/prefix-index)，但遇到 VARCHAR 会立即截断。其他需要快速过滤的列可添加倒排索引。
+
+### 分区
+
+如果有时间列，使用 `AUTO PARTITION BY RANGE(date_trunc(time_col, 'day'))` 启用[分区裁剪](../table-design/data-partitioning/auto-partitioning)。Doris 会自动跳过无关分区。
+
+### 分桶
+
+默认是 **Random 分桶**（推荐用于 Duplicate Key 表）。如果频繁按某列过滤或 JOIN，使用 `DISTRIBUTED BY HASH(该列)`。详见[数据分桶](../table-design/data-partitioning/data-bucketing)。
+
+**如何选择分桶数：**
+
+1. **设为 BE 数量的整数倍**，确保数据均匀分布。后续扩容 BE 时，查询通常涉及多个分区，性能不会受影响。
+2. **尽可能少**，避免小文件。
+3. **每个分桶的压缩后数据 ≤ 20 GB**（Unique Key 表 ≤ 10 GB）。可通过 `SHOW TABLETS FROM your_table` 查看。
+4. **每个分区不超过 128 个分桶。**需要更多时优先考虑分区。极端情况下上限为 1024，但生产环境中很少需要。
+
+### 建表模板
+
+#### 日志 / 事件分析
+
+```sql
+CREATE TABLE app_logs
+(
+    log_time      DATETIME    NOT NULL,
+    log_level     VARCHAR(10),
+    service_name  VARCHAR(50),
+    trace_id      VARCHAR(64),
+    message       STRING,
+    INDEX idx_message (message) USING INVERTED PROPERTIES("parser" = "unicode")
+)
+AUTO PARTITION BY RANGE(date_trunc(`log_time`, 'day'))
+()
+DISTRIBUTED BY RANDOM BUCKETS 10;
+```
+
+#### 实时看板与 Upsert（CDC）
+
+```sql
+CREATE TABLE user_profiles
+(
+    user_id       BIGINT      NOT NULL,
+    username      VARCHAR(50),
+    email         VARCHAR(100),
+    status        TINYINT,
+    updated_at    DATETIME
+)
+UNIQUE KEY(user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+```
+
+#### 指标聚合
+
+```sql
+CREATE TABLE site_metrics
+(
+    dt            DATE        NOT NULL,
+    site_id       INT         NOT NULL,
+    pv            BIGINT      SUM DEFAULT '0',
+    uv            BIGINT      MAX DEFAULT '0'
+)
+AGGREGATE KEY(dt, site_id)
+AUTO PARTITION BY RANGE(date_trunc(`dt`, 'day'))
+()
+DISTRIBUTED BY HASH(site_id) BUCKETS 10;
+```
+
+## 性能陷阱
+
+### 导入
+
+- **批量数据不要用 `INSERT INTO VALUES`。**请使用 [Stream Load](../data-operate/import/import-way/stream-load-manual) 或 [Broker Load](../data-operate/import/import-way/broker-load-manual)。详见[导入概述](../data-operate/import/load-manual)。
+- **优先在客户端合并写入。**高频小批次导入导致版本堆积。如不可行，使用 [Group Commit](../data-operate/import/group-commit-manual)。
+- **将大型导入拆分为小批次。**长时间运行的导入失败后必须从头重试。使用 INSERT INTO SELECT 配合 S3 TVF 实现增量导入。
+- **Random 分桶的 Duplicate Key 表启用 `load_to_single_tablet`**，减少写放大。
+
+详见[导入最佳实践](../data-operate/import/load-best-practices)。
+
+### 查询
+
+- **避免数据倾斜。**通过 `SHOW TABLETS` 检查 tablet 大小。差异明显时切换为 Random 分桶或选择基数更高的分桶列。
+- **不要分桶过多。**过多的小 tablet 会产生调度开销，查询性能最多可下降 50%。参见[分桶](#分桶)了解分桶数选择。
+- **不要分桶过少。**过少的 tablet 会限制 CPU 并行度。参见[分桶](#分桶)了解分桶数选择。
+- **正确设置排序键。**与 PostgreSQL 等系统不同，Doris 仅对排序键的前 36 字节建立索引，且遇到 VARCHAR 会立即截断。超出前缀范围的列无法从排序键受益，需添加倒排索引。参见 [Sort Key（排序键）](#sort-key排序键)。
+
+诊断慢查询请使用 [Query Profile](../admin-manual/open-api/fe-http/query-profile-action)。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/gettingStarted/before-you-start-the-poc.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/gettingStarted/before-you-start-the-poc.md
new file mode 100644
index 00000000000..11b5c960f97
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/gettingStarted/before-you-start-the-poc.md
@@ -0,0 +1,115 @@
+---
+{
+    "title": "POC 前必读",
+    "language": "zh-CN",
+    "description": "新用户在 Apache Doris 建表设计、数据导入和查询调优中常见的问题。",
+    "sidebar_label": "POC 前必读"
+}
+---
+
+# POC 前必读
+
+本文档汇总了新用户常见的问题，旨在加速 POC 进程。
+
+## 建表设计
+
+在 Doris 中建表涉及四个影响导入和查询性能的决策。
+
+### 数据模型
+
+| 数据特征 | 使用 | 原因 |
+|---|---|---|
+| 仅追加（日志、事件、事实表） | **Duplicate Key**（默认） | 保留所有行。查询性能最好。 |
+| 按主键更新（CDC、Upsert） | **Unique Key** | 新行按相同 Key 替换旧行。 |
+| 预聚合指标（PV、UV、汇总） | **Aggregate Key** | 写入时按 SUM/MAX/MIN 合并行。 |
+
+**Duplicate Key 适用于大多数场景。**详见[数据模型概述](../table-design/data-model/overview)。
+
+### Sort Key（排序键）
+
+将最常用于过滤的列放在最前面，定长类型（INT、BIGINT、DATE）放在 VARCHAR 之前。Doris 在排序键的前 36 字节上构建[前缀索引](../table-design/index/prefix-index)，但遇到 VARCHAR 会立即截断。其他需要快速过滤的列可添加[倒排索引](../table-design/index/inverted-index/overview)。
+
+### 分区
+
+如果有时间列，使用 `AUTO PARTITION BY RANGE(date_trunc(time_col, 'day'))` 启用[分区裁剪](../table-design/data-partitioning/auto-partitioning)。Doris 会自动跳过无关分区。
+
+### 分桶
+
+默认是 **Random 分桶**（推荐用于 Duplicate Key 表）。如果频繁按某列过滤或 JOIN，使用 `DISTRIBUTED BY HASH(该列)`。详见[数据分桶](../table-design/data-partitioning/data-bucketing)。
+
+**如何选择分桶数：**
+
+1. **设为 BE 数量的整数倍**，确保数据均匀分布。后续扩容 BE 时，查询通常涉及多个分区，性能不会受影响。
+2. **尽可能少**，避免小文件。
+3. **每个分桶的压缩后数据 ≤ 20 GB**（Unique Key 表 ≤ 10 GB）。可通过 `SHOW TABLETS FROM your_table` 查看。
+4. **每个分区不超过 128 个分桶。**需要更多时优先考虑分区。极端情况下上限为 1024，但生产环境中很少需要。
+
+### 建表模板
+
+#### 日志 / 事件分析
+
+```sql
+CREATE TABLE app_logs
+(
+    log_time      DATETIME    NOT NULL,
+    log_level     VARCHAR(10),
+    service_name  VARCHAR(50),
+    trace_id      VARCHAR(64),
+    message       STRING,
+    INDEX idx_message (message) USING INVERTED PROPERTIES("parser" = "unicode")
+)
+AUTO PARTITION BY RANGE(date_trunc(`log_time`, 'day'))
+()
+DISTRIBUTED BY RANDOM BUCKETS 10;
+```
+
+#### 实时看板与 Upsert（CDC）
+
+```sql
+CREATE TABLE user_profiles
+(
+    user_id       BIGINT      NOT NULL,
+    username      VARCHAR(50),
+    email         VARCHAR(100),
+    status        TINYINT,
+    updated_at    DATETIME
+)
+UNIQUE KEY(user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+```
+
+#### 指标聚合
+
+```sql
+CREATE TABLE site_metrics
+(
+    dt            DATE        NOT NULL,
+    site_id       INT         NOT NULL,
+    pv            BIGINT      SUM DEFAULT '0',
+    uv            BIGINT      MAX DEFAULT '0'
+)
+AGGREGATE KEY(dt, site_id)
+AUTO PARTITION BY RANGE(date_trunc(`dt`, 'day'))
+()
+DISTRIBUTED BY HASH(site_id) BUCKETS 10;
+```
+
+## 性能陷阱
+
+### 导入
+
+- **批量数据不要用 `INSERT INTO VALUES`。**请使用 [Stream Load](../data-operate/import/import-way/stream-load-manual) 或 [Broker Load](../data-operate/import/import-way/broker-load-manual)。详见[导入概述](../data-operate/import/load-manual)。
+- **优先在客户端合并写入。**高频小批次导入导致版本堆积。如不可行，使用 [Group Commit](../data-operate/import/group-commit-manual)。
+- **将大型导入拆分为小批次。**长时间运行的导入失败后必须从头重试。使用 [INSERT INTO SELECT 配合 S3 TVF](../data-operate/import/streaming-job/streaming-job-tvf) 实现增量导入。
+- **Random 分桶的 Duplicate Key 表启用 `load_to_single_tablet`**，减少写放大。
+
+详见[导入最佳实践](../data-operate/import/load-best-practices)。
+
+### 查询
+
+- **避免数据倾斜。**通过 `SHOW TABLETS` 检查 tablet 大小。差异明显时切换为 Random 分桶或选择基数更高的分桶列。
+- **不要分桶过多。**过多的小 tablet 会产生调度开销，查询性能最多可下降 50%。参见[分桶](#分桶)了解分桶数选择。
+- **不要分桶过少。**过少的 tablet 会限制 CPU 并行度。参见[分桶](#分桶)了解分桶数选择。
+- **正确设置排序键。**与 PostgreSQL 等系统不同，Doris 仅对排序键的前 36 字节建立索引，且遇到 VARCHAR 会立即截断。超出前缀范围的列无法从排序键受益，需添加[倒排索引](../table-design/index/inverted-index/overview)。参见 [Sort Key（排序键）](#sort-key排序键)。
+
+诊断慢查询请使用 [Query Profile](../query-acceleration/query-profile)。
diff --git a/versioned_docs/version-3.x/gettingStarted/before-you-start-the-poc.md b/versioned_docs/version-3.x/gettingStarted/before-you-start-the-poc.md
new file mode 100644
index 00000000000..a3c040b183c
--- /dev/null
+++ b/versioned_docs/version-3.x/gettingStarted/before-you-start-the-poc.md
@@ -0,0 +1,115 @@
+---
+{
+    "title": "Before You Start the POC",
+    "language": "en",
+    "description": "Common issues new users encounter with table design, data loading, and query tuning in Apache Doris.",
+    "sidebar_label": "Before You Start the POC"
+}
+---
+
+# Before You Start the POC
+
+This document highlights common issues that new users may encounter, with the goal of accelerating the POC process.
+
+## Table Design
+
+Creating a table in Doris involves four decisions that affect load and query performance.
+
+### Data Model
+
+| If your data is... | Use | Why |
+|---|---|---|
+| Append-only (logs, events, facts) | **Duplicate Key** (default) | Keeps all rows. Best query performance. |
+| Updated by primary key (CDC, upsert) | **Unique Key** | New rows replace old rows with the same key. |
+| Pre-aggregated metrics (PV, UV, sums) | **Aggregate Key** | Rows are merged with SUM/MAX/MIN at write time. |
+
+**Duplicate Key works for most scenarios.** See [Data Model Overview](../table-design/data-model/overview).
+
+### Sort Key
+
+Put the column you filter on most frequently first, with fixed-size types (INT, BIGINT, DATE) before VARCHAR. Doris builds a [prefix index](../table-design/index/prefix-index) on the first 36 bytes of key columns but stops at the first VARCHAR. Add [inverted indexes](../table-design/index/inverted-index) for other columns that need fast filtering.
+
+### Partitioning
+
+If you have a time column, use `AUTO PARTITION BY RANGE(date_trunc(time_col, 'day'))` to enable [partition pruning](../table-design/data-partitioning/auto-partitioning). Doris skips irrelevant partitions automatically.
+
+### Bucketing
+
+Default is **Random bucketing** (recommended for Duplicate Key tables). Use `DISTRIBUTED BY HASH(col)` if you frequently filter or join on a specific column. See [Data Bucketing](../table-design/data-partitioning/data-bucketing).
+
+**How to choose bucket count:**
+
+1. **Multiple of BE count** to ensure even data distribution. When BEs are added later, queries typically scan multiple partitions, so performance holds up.
+2. **As low as possible** to avoid small files.
+3. **Compressed data per bucket ≤ 20 GB** (≤ 10 GB for Unique Key). Check with `SHOW TABLETS FROM your_table`.
+4. **No more than 128 per partition.** Consider partitioning first if you need more. In extreme cases the upper bound is 1024, but this is rarely needed in production.
+
+### Example Templates
+
+#### Log / Event Analytics
+
+```sql
+CREATE TABLE app_logs
+(
+    log_time      DATETIME    NOT NULL,
+    log_level     VARCHAR(10),
+    service_name  VARCHAR(50),
+    trace_id      VARCHAR(64),
+    message       STRING,
+    INDEX idx_message (message) USING INVERTED PROPERTIES("parser" = "unicode")
+)
+AUTO PARTITION BY RANGE(date_trunc(`log_time`, 'day'))
+()
+DISTRIBUTED BY RANDOM BUCKETS 10;
+```
+
+#### Real-Time Dashboard with Upsert (CDC)
+
+```sql
+CREATE TABLE user_profiles
+(
+    user_id       BIGINT      NOT NULL,
+    username      VARCHAR(50),
+    email         VARCHAR(100),
+    status        TINYINT,
+    updated_at    DATETIME
+)
+UNIQUE KEY(user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+```
+
+#### Metrics Aggregation
+
+```sql
+CREATE TABLE site_metrics
+(
+    dt            DATE        NOT NULL,
+    site_id       INT         NOT NULL,
+    pv            BIGINT      SUM DEFAULT '0',
+    uv            BIGINT      MAX DEFAULT '0'
+)
+AGGREGATE KEY(dt, site_id)
+AUTO PARTITION BY RANGE(date_trunc(`dt`, 'day'))
+()
+DISTRIBUTED BY HASH(site_id) BUCKETS 10;
+```
+
+## Performance Pitfalls
+
+### Load
+
+- **Don't use `INSERT INTO VALUES` for bulk data.** Use [Stream Load](../data-operate/import/import-way/stream-load-manual) or [Broker Load](../data-operate/import/import-way/broker-load-manual) instead. See [Loading Overview](../data-operate/import/load-manual).
+- **Batch writes on the client side.** High-frequency small imports cause version accumulation. If not feasible, use [Group Commit](../data-operate/import/group-commit-manual).
+- **Break large imports into smaller batches.** A failed long-running import must restart from scratch. Use INSERT INTO SELECT with S3 TVF for incremental import.
+- **Enable `load_to_single_tablet`** for Duplicate Key tables with Random bucketing to reduce write amplification.
+
+See [Load Best Practices](../data-operate/import/load-best-practices).
+
+### Query
+
+- **Avoid data skew.** Check tablet sizes with `SHOW TABLETS`. Switch to Random bucketing or a higher-cardinality bucket column if sizes vary significantly.
+- **Don't over-bucket.** Too many small tablets create scheduling overhead and can degrade query performance by up to 50%. See [Bucketing](#bucketing) for sizing guidelines.
+- **Don't under-bucket.** Too few tablets limit CPU parallelism. See [Bucketing](#bucketing) for sizing guidelines.
+- **Put the right columns in the sort key.** Unlike systems such as PostgreSQL, Doris only indexes the first 36 bytes of key columns and stops at the first VARCHAR. Columns beyond this prefix won't benefit from the sort key. Add [inverted indexes](../table-design/index/inverted-index) for those columns. See [Sort Key](#sort-key).
+
+Use [Query Profile](../admin-manual/open-api/fe-http/query-profile-action) to diagnose slow queries.
diff --git a/versioned_docs/version-4.x/gettingStarted/before-you-start-the-poc.md b/versioned_docs/version-4.x/gettingStarted/before-you-start-the-poc.md
new file mode 100644
index 00000000000..c507a505517
--- /dev/null
+++ b/versioned_docs/version-4.x/gettingStarted/before-you-start-the-poc.md
@@ -0,0 +1,115 @@
+---
+{
+    "title": "Before You Start the POC",
+    "language": "en",
+    "description": "Common issues new users encounter with table design, data loading, and query tuning in Apache Doris.",
+    "sidebar_label": "Before You Start the POC"
+}
+---
+
+# Before You Start the POC
+
+This document highlights common issues that new users may encounter, with the goal of accelerating the POC process.
+
+## Table Design
+
+Creating a table in Doris involves four decisions that affect load and query performance.
+
+### Data Model
+
+| If your data is... | Use | Why |
+|---|---|---|
+| Append-only (logs, events, facts) | **Duplicate Key** (default) | Keeps all rows. Best query performance. |
+| Updated by primary key (CDC, upsert) | **Unique Key** | New rows replace old rows with the same key. |
+| Pre-aggregated metrics (PV, UV, sums) | **Aggregate Key** | Rows are merged with SUM/MAX/MIN at write time. |
+
+**Duplicate Key works for most scenarios.** See [Data Model Overview](../table-design/data-model/overview).
+
+### Sort Key
+
+Put the column you filter on most frequently first, with fixed-size types (INT, BIGINT, DATE) before VARCHAR. Doris builds a [prefix index](../table-design/index/prefix-index) on the first 36 bytes of key columns but stops at the first VARCHAR. Add [inverted indexes](../table-design/index/inverted-index/overview) for other columns that need fast filtering.
+
+### Partitioning
+
+If you have a time column, use `AUTO PARTITION BY RANGE(date_trunc(time_col, 'day'))` to enable [partition pruning](../table-design/data-partitioning/auto-partitioning). Doris skips irrelevant partitions automatically.
+
+### Bucketing
+
+Default is **Random bucketing** (recommended for Duplicate Key tables). Use `DISTRIBUTED BY HASH(col)` if you frequently filter or join on a specific column. See [Data Bucketing](../table-design/data-partitioning/data-bucketing).
+
+**How to choose bucket count:**
+
+1. **Multiple of BE count** to ensure even data distribution. When BEs are added later, queries typically scan multiple partitions, so performance holds up.
+2. **As low as possible** to avoid small files.
+3. **Compressed data per bucket ≤ 20 GB** (≤ 10 GB for Unique Key). Check with `SHOW TABLETS FROM your_table`.
+4. **No more than 128 per partition.** Consider partitioning first if you need more. In extreme cases the upper bound is 1024, but this is rarely needed in production.
+
+### Example Templates
+
+#### Log / Event Analytics
+
+```sql
+CREATE TABLE app_logs
+(
+    log_time      DATETIME    NOT NULL,
+    log_level     VARCHAR(10),
+    service_name  VARCHAR(50),
+    trace_id      VARCHAR(64),
+    message       STRING,
+    INDEX idx_message (message) USING INVERTED PROPERTIES("parser" = "unicode")
+)
+AUTO PARTITION BY RANGE(date_trunc(`log_time`, 'day'))
+()
+DISTRIBUTED BY RANDOM BUCKETS 10;
+```
+
+#### Real-Time Dashboard with Upsert (CDC)
+
+```sql
+CREATE TABLE user_profiles
+(
+    user_id       BIGINT      NOT NULL,
+    username      VARCHAR(50),
+    email         VARCHAR(100),
+    status        TINYINT,
+    updated_at    DATETIME
+)
+UNIQUE KEY(user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+```
+
+#### Metrics Aggregation
+
+```sql
+CREATE TABLE site_metrics
+(
+    dt            DATE        NOT NULL,
+    site_id       INT         NOT NULL,
+    pv            BIGINT      SUM DEFAULT '0',
+    uv            BIGINT      MAX DEFAULT '0'
+)
+AGGREGATE KEY(dt, site_id)
+AUTO PARTITION BY RANGE(date_trunc(`dt`, 'day'))
+()
+DISTRIBUTED BY HASH(site_id) BUCKETS 10;
+```
+
+## Performance Pitfalls
+
+### Load
+
+- **Don't use `INSERT INTO VALUES` for bulk data.** Use [Stream Load](../data-operate/import/import-way/stream-load-manual) or [Broker Load](../data-operate/import/import-way/broker-load-manual) instead. See [Loading Overview](../data-operate/import/load-manual).
+- **Batch writes on the client side.** High-frequency small imports cause version accumulation. If not feasible, use [Group Commit](../data-operate/import/group-commit-manual).
+- **Break large imports into smaller batches.** A failed long-running import must restart from scratch. Use [INSERT INTO SELECT with S3 TVF](../data-operate/import/streaming-job/streaming-job-tvf) for incremental import.
+- **Enable `load_to_single_tablet`** for Duplicate Key tables with Random bucketing to reduce write amplification.
+
+See [Load Best Practices](../data-operate/import/load-best-practices).
+
+### Query
+
+- **Avoid data skew.** Check tablet sizes with `SHOW TABLETS`. Switch to Random bucketing or a higher-cardinality bucket column if sizes vary significantly.
+- **Don't over-bucket.** Too many small tablets create scheduling overhead and can degrade query performance by up to 50%. See [Bucketing](#bucketing) for sizing guidelines.
+- **Don't under-bucket.** Too few tablets limit CPU parallelism. See [Bucketing](#bucketing) for sizing guidelines.
+- **Put the right columns in the sort key.** Unlike systems such as PostgreSQL, Doris only indexes the first 36 bytes of key columns and stops at the first VARCHAR. Columns beyond this prefix won't benefit from the sort key. Add [inverted indexes](../table-design/index/inverted-index/overview) for those columns. See [Sort Key](#sort-key).
+
+See [Query Profile](../query-acceleration/query-profile) to diagnose slow queries.
diff --git a/versioned_sidebars/version-3.x-sidebars.json b/versioned_sidebars/version-3.x-sidebars.json
index 12bd9268eed..d95c28292c3 100644
--- a/versioned_sidebars/version-3.x-sidebars.json
+++ b/versioned_sidebars/version-3.x-sidebars.json
@@ -7,6 +7,7 @@
             "items": [
                 "gettingStarted/what-is-apache-doris",
                 "gettingStarted/quick-start",
+                "gettingStarted/before-you-start-the-poc",
                 {
                     "type": "category",
                     "label": "Tech Alternatives",
diff --git a/versioned_sidebars/version-4.x-sidebars.json b/versioned_sidebars/version-4.x-sidebars.json
index cb4bd320d6c..9196edfb39b 100644
--- a/versioned_sidebars/version-4.x-sidebars.json
+++ b/versioned_sidebars/version-4.x-sidebars.json
@@ -7,6 +7,7 @@
             "items": [
                 "gettingStarted/what-is-apache-doris",
                 "gettingStarted/quick-start",
+                "gettingStarted/before-you-start-the-poc",
                 {
                     "type": "category",
                     "label": "Tech Alternatives",

