Copilot commented on code in PR #3294:
URL: https://github.com/apache/doris-website/pull/3294#discussion_r2702944881
##########
versioned_docs/version-4.x/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
Review Comment:
The figure "~50% of TPC-DS queries fail due to unsupported correlated
subqueries" is a specific statistic that should be backed by benchmarks or
citations. Without evidence, it reads as unsubstantiated. Consider adding a
reference or softening the claim with language like "many TPC-DS queries"
instead of a specific percentage.
```suggestion
<li style={{fontSize:'14px'}}>Many TPC-DS queries fail due to
unsupported correlated subqueries</li>
```
##########
docs/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
+ <li style={{fontSize:'14px'}}>No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base
tables</li>
Review Comment:
The original claim that ClickHouse "Only supports single-table materialized
views" was incorrect, and the new claim "No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base tables"
needs verification. ClickHouse does support multi-table materialized views
(introduced in version 21.8+), though its query rewriting capabilities differ
from those of Doris. Consider describing the actual limitation precisely
rather than implying that ClickHouse lacks multi-table materialized views
entirely.
```suggestion
<li style={{fontSize:'14px'}}>No general-purpose transparent
query rewriting to materialized views; queries typically must target
materialized views explicitly, unlike Doris's automatic MV-based acceleration
on base tables</li>
```
##########
docs/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
+ <li style={{fontSize:'14px'}}>No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base
tables</li>
+ <li style={{fontSize:'14px'}}>Frequent OOM errors on large
multi-table queries</li>
</ul>
</td>
</tr>
<tr>
<td><strong>Real-time Updates</strong></td>
<td><ul>
- <li style={{fontSize:'14px'}}>Features a strongly consistent
primary key storage model, supporting synchronous data updates and
deletions</li>
- <li style={{fontSize:'14px'}}>Implements Merge-On-Write for unique
keys with a delete bitmap, ensuring query performance remains unaffected.</li>
+ <li style={{fontSize:'14px'}}><strong>34x faster query
performance</strong> than ClickHouse for real-time update workloads</li>
+ <li style={{fontSize:'14px'}}>Merge-on-Write (MoW) engine with
delete bitmap ensures query performance remains constant regardless of update
frequency</li>
+ <li style={{fontSize:'14px'}}>Strongly consistent primary key
model - updates are immediately visible with no stale reads</li>
+ <li style={{fontSize:'14px'}}>Supports high-throughput UPSERT
operations without query performance degradation</li>
+ <li style={{fontSize:'14px'}}>Partial column updates minimize
write amplification</li>
</ul></td>
<td><ul>
- <li style={{fontSize:'14px'}}>Only supports asynchronous updates,
allowing old values to be read after an update.</li>
- <li style={{fontSize:'14px'}}>Synchronous updates are possible
using the <code>FINAL</code> keyword, but this significantly degrades query
performance.</li>
+ <li style={{fontSize:'14px'}}>ReplacingMergeTree only supports
eventual consistency - stale data visible until background merge</li>
+ <li style={{fontSize:'14px'}}>Using <code>FINAL</code> keyword for
consistent reads causes 2-10x query slowdown</li>
+ <li style={{fontSize:'14px'}}>High update frequency leads to
excessive merge overhead and query latency spikes</li>
+ </ul></td>
+ </tr>
+ <tr>
+ <td><strong>Query Concurrency</strong></td>
+ <td><ul>
+ <li style={{fontSize:'14px'}}><strong>10x higher
concurrency</strong> - supports thousands of concurrent queries</li>
+ <li style={{fontSize:'14px'}}>Efficient memory management prevents
OOM under high load</li>
+ <li style={{fontSize:'14px'}}>Query queue management with workload
isolation</li>
+ </ul></td>
+ <td><ul>
+ <li style={{fontSize:'14px'}}>Limited concurrent query support
(typically <100)</li>
Review Comment:
The statement "Limited concurrent query support (typically <100)" needs
substantiation. ClickHouse's concurrency limits vary significantly with
hardware configuration, query complexity, and tuning. Without context or
citations, this reads as an unsubstantiated claim. Consider describing the
conditions under which this limit applies or citing benchmarks.
```suggestion
<li style={{fontSize:'14px'}}>Limited concurrent query support
out-of-the-box; scaling to very high concurrency often requires careful tuning
and robust hardware</li>
```
##########
versioned_docs/version-4.x/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -2,80 +2,80 @@
{
"title": "Alternative to ClickHouse",
"language": "en",
- "description": "Apache Doris and ClickHouse are both leading global
real-time data warehouses, supporting columnar storage and fast queries."
+ "description": "Apache Doris offers 10x faster join queries, up to 70%
lower costs via storage-compute separation, and 34x better real-time update
performance compared to ClickHouse."
}
---
-Apache Doris and ClickHouse are both leading global real-time data warehouses,
supporting columnar storage and fast queries. Doris has advantages like higher
concurrency, more efficient joins, easier maintenance, and MySQL-like SQL
syntax, making it simpler to use and deploy.
+Apache Doris and ClickHouse are both leading real-time analytical databases
with columnar storage and fast query capabilities. Apache Doris offers
significant advantages over ClickHouse in three critical areas: **10x faster
join query performance** through its advanced MPP architecture with Cost-Based
Optimizer, **lower infrastructure costs** via compute-storage separation that
allows independent scaling of resources, and **superior real-time update
performance** with its Merge-on-Write engine that maintains query speed during
high-frequency data modifications.
Review Comment:
The introduction claims "10x faster join query performance" (as does the
front-matter description, "10x faster join queries"), which is inconsistent
with the "2-10x faster joins" claim in line 112. The introduction should use
the same range (2-10x) as the detailed comparison table for accuracy and
consistency.
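As a rough illustration (not part of the PR), a small script along these lines
could list the multiplier claims in each passage so that mismatches like this
one stand out. The helper name `multiplier_claims` and the regular expression
are assumptions made for this sketch; the sample strings are copied from the
diff.

```python
import re

# Matches multiplier claims such as "10x" or "2-10x".
# Hypothetical helper for illustration only; it is not part of the PR.
CLAIM_RE = re.compile(r"\b(\d+(?:-\d+)?)x\b")


def multiplier_claims(text: str) -> list[str]:
    """Return every 'Nx' or 'N-Mx' multiplier found in text, in order."""
    return CLAIM_RE.findall(text)


# Sample passages taken from the introduction and the comparison table.
intro = "**10x faster join query performance** through its advanced MPP architecture"
table = "<strong>2-10x faster joins</strong> with true distributed join execution across nodes"

print(multiplier_claims(intro))  # ['10']
print(multiplier_claims(table))  # ['2-10']
```

Comparing the two lists shows that the introduction states a single figure
while the table states a range.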
##########
versioned_docs/version-4.x/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -166,15 +187,24 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Storage-Compute Separation</strong></td>
+ <td><strong>Cost Efficiency (Storage-Compute Separation)</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Available as an open-source
feature since version 3.0</li>
+ <li style={{fontSize:'14px'}}><strong>Up to 70% cost
reduction</strong> by independently scaling compute and storage</li>
Review Comment:
"Up to 70% cost reduction" is a significant cost-savings claim that should be
supported by case studies or specific scenarios. Without context about
workload patterns, infrastructure setup, or the baseline being compared, it
reads as unsubstantiated. Consider referencing a case study or explaining when
this cost reduction is achievable.
##########
i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -2,81 +2,81 @@
{
"title": "Apache Doris vs ClickHouse",
"language": "zh-CN",
- "description": "Apache Doris 和 ClickHouse
都是全球领先的实时数据仓库,均支持列式存储与极速查询。此外,Doris 具备更高的并发处理能力、更高效的 Join 查询、更简便的运维,并兼容 MySQL
的语法,使其更易于使用和部署。"
+ "description": "Apache Doris 相比 ClickHouse 具备 10 倍以上的 Join 查询性能、通过存算分离降低高达
70% 的成本,以及实时更新场景下 34 倍的查询性能优势。"
}
---
-Apache Doris 和 ClickHouse 都是全球领先的实时数据仓库,均支持列式存储与极速查询。此外,Doris 具备更高的并发处理能力、更高效的
Join 查询、更简便的运维,并兼容 MySQL 的语法,使其更易于使用和部署。
+Apache Doris 和 ClickHouse 都是全球领先的实时分析型数据库,均支持列式存储与极速查询。Apache Doris
在三个关键领域具有显著优势:**Join 查询性能提升 2-10 倍**(基于先进的 MPP 架构与基于成本的查询优化器)、**通过存算分离降低高达 70%
的基础设施成本**(支持计算与存储资源独立扩缩)、以及**卓越的实时更新性能**(写时合并引擎在高频数据更新时仍能保持稳定的查询性能)。
Review Comment:
The claim of "10 倍以上的 Join 查询性能" (more than 10x join query performance) in
the front-matter description is inconsistent with the "Join 性能提升 2-10 倍"
(2-10x faster joins) claim in line 112 and with the 2-10x range ("Join 查询性能提升
2-10 倍") already used in the introduction. The description should use the same
range (2-10x) for accuracy and consistency.
##########
docs/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
+ <li style={{fontSize:'14px'}}>No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base
tables</li>
+ <li style={{fontSize:'14px'}}>Frequent OOM errors on large
multi-table queries</li>
</ul>
</td>
</tr>
<tr>
<td><strong>Real-time Updates</strong></td>
<td><ul>
- <li style={{fontSize:'14px'}}>Features a strongly consistent
primary key storage model, supporting synchronous data updates and
deletions</li>
- <li style={{fontSize:'14px'}}>Implements Merge-On-Write for unique
keys with a delete bitmap, ensuring query performance remains unaffected.</li>
+ <li style={{fontSize:'14px'}}><strong>34x faster query
performance</strong> than ClickHouse for real-time update workloads</li>
+ <li style={{fontSize:'14px'}}>Merge-on-Write (MoW) engine with
delete bitmap ensures query performance remains constant regardless of update
frequency</li>
+ <li style={{fontSize:'14px'}}>Strongly consistent primary key
model - updates are immediately visible with no stale reads</li>
+ <li style={{fontSize:'14px'}}>Supports high-throughput UPSERT
operations without query performance degradation</li>
+ <li style={{fontSize:'14px'}}>Partial column updates minimize
write amplification</li>
</ul></td>
<td><ul>
- <li style={{fontSize:'14px'}}>Only supports asynchronous updates,
allowing old values to be read after an update.</li>
- <li style={{fontSize:'14px'}}>Synchronous updates are possible
using the <code>FINAL</code> keyword, but this significantly degrades query
performance.</li>
+ <li style={{fontSize:'14px'}}>ReplacingMergeTree only supports
eventual consistency - stale data visible until background merge</li>
+ <li style={{fontSize:'14px'}}>Using <code>FINAL</code> keyword for
consistent reads causes 2-10x query slowdown</li>
+ <li style={{fontSize:'14px'}}>High update frequency leads to
excessive merge overhead and query latency spikes</li>
+ </ul></td>
+ </tr>
+ <tr>
+ <td><strong>Query Concurrency</strong></td>
+ <td><ul>
+ <li style={{fontSize:'14px'}}><strong>10x higher
concurrency</strong> - supports thousands of concurrent queries</li>
Review Comment:
The "10x higher concurrency" claim in line 149 is easy to confuse with the
earlier "10x faster join query performance" claim in line 9, because both use
the same multiplier for different things. Either state in the introduction
that the 10x improvement also applies to concurrency, not just join
performance, or adjust the wording so the two claims are clearly distinct.
##########
versioned_docs/version-4.x/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
+ <li style={{fontSize:'14px'}}>No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base
tables</li>
+ <li style={{fontSize:'14px'}}>Frequent OOM errors on large
multi-table queries</li>
</ul>
</td>
</tr>
<tr>
<td><strong>Real-time Updates</strong></td>
<td><ul>
- <li style={{fontSize:'14px'}}>Features a strongly consistent
primary key storage model, supporting synchronous data updates and
deletions</li>
- <li style={{fontSize:'14px'}}>Implements Merge-On-Write for unique
keys with a delete bitmap, ensuring query performance remains unaffected.</li>
+ <li style={{fontSize:'14px'}}><strong>34x faster query
performance</strong> than ClickHouse for real-time update workloads</li>
+ <li style={{fontSize:'14px'}}>Merge-on-Write (MoW) engine with
delete bitmap ensures query performance remains constant regardless of update
frequency</li>
+ <li style={{fontSize:'14px'}}>Strongly consistent primary key
model - updates are immediately visible with no stale reads</li>
+ <li style={{fontSize:'14px'}}>Supports high-throughput UPSERT
operations without query performance degradation</li>
+ <li style={{fontSize:'14px'}}>Partial column updates minimize
write amplification</li>
+ </ul></td>
+ <td><ul>
+ <li style={{fontSize:'14px'}}>ReplacingMergeTree only supports
eventual consistency - stale data visible until background merge</li>
+ <li style={{fontSize:'14px'}}>Using <code>FINAL</code> keyword for
consistent reads causes 2-10x query slowdown</li>
+ <li style={{fontSize:'14px'}}>High update frequency leads to
excessive merge overhead and query latency spikes</li>
+ </ul></td>
+ </tr>
+ <tr>
+ <td><strong>Query Concurrency</strong></td>
+ <td><ul>
+ <li style={{fontSize:'14px'}}><strong>10x higher
concurrency</strong> - supports thousands of concurrent queries</li>
Review Comment:
The "10x higher concurrency" claim in line 149 is easy to confuse with the
earlier "10x faster join query performance" claim in line 9, because both use
the same multiplier for different things. Either state in the introduction
that the 10x improvement also applies to concurrency, not just join
performance, or adjust the wording so the two claims are clearly distinct.
##########
versioned_docs/version-4.x/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
+ <li style={{fontSize:'14px'}}>No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base
tables</li>
Review Comment:
The original claim that ClickHouse "Only supports single-table materialized
views" was incorrect, and the new claim "No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base tables"
needs verification. ClickHouse does support multi-table materialized views
(introduced in version 21.8+), though its query rewriting capabilities differ
from those of Doris. Consider describing the actual limitation precisely
rather than implying that ClickHouse lacks multi-table materialized views
entirely.
```suggestion
<li style={{fontSize:'14px'}}>Limited automatic query rewriting
compared to Doris – transparent acceleration is restricted (especially for
multi-table/join materialized views) and often requires explicitly using
materialized views or projections</li>
```
##########
versioned_docs/version-4.x/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
+ <li style={{fontSize:'14px'}}>No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base
tables</li>
+ <li style={{fontSize:'14px'}}>Frequent OOM errors on large
multi-table queries</li>
</ul>
</td>
</tr>
<tr>
<td><strong>Real-time Updates</strong></td>
<td><ul>
- <li style={{fontSize:'14px'}}>Features a strongly consistent
primary key storage model, supporting synchronous data updates and
deletions</li>
- <li style={{fontSize:'14px'}}>Implements Merge-On-Write for unique
keys with a delete bitmap, ensuring query performance remains unaffected.</li>
+ <li style={{fontSize:'14px'}}><strong>34x faster query
performance</strong> than ClickHouse for real-time update workloads</li>
+ <li style={{fontSize:'14px'}}>Merge-on-Write (MoW) engine with
delete bitmap ensures query performance remains constant regardless of update
frequency</li>
+ <li style={{fontSize:'14px'}}>Strongly consistent primary key
model - updates are immediately visible with no stale reads</li>
+ <li style={{fontSize:'14px'}}>Supports high-throughput UPSERT
operations without query performance degradation</li>
+ <li style={{fontSize:'14px'}}>Partial column updates minimize
write amplification</li>
+ </ul></td>
+ <td><ul>
+ <li style={{fontSize:'14px'}}>ReplacingMergeTree only supports
eventual consistency - stale data visible until background merge</li>
+ <li style={{fontSize:'14px'}}>Using <code>FINAL</code> keyword for
consistent reads causes 2-10x query slowdown</li>
+ <li style={{fontSize:'14px'}}>High update frequency leads to
excessive merge overhead and query latency spikes</li>
+ </ul></td>
+ </tr>
+ <tr>
+ <td><strong>Query Concurrency</strong></td>
+ <td><ul>
+ <li style={{fontSize:'14px'}}><strong>10x higher
concurrency</strong> - supports thousands of concurrent queries</li>
+ <li style={{fontSize:'14px'}}>Efficient memory management prevents
OOM under high load</li>
+ <li style={{fontSize:'14px'}}>Query queue management with workload
isolation</li>
</ul></td>
<td><ul>
- <li style={{fontSize:'14px'}}>Only supports asynchronous updates,
allowing old values to be read after an update.</li>
- <li style={{fontSize:'14px'}}>Synchronous updates are possible
using the <code>FINAL</code> keyword, but this significantly degrades query
performance.</li>
+ <li style={{fontSize:'14px'}}>Limited concurrent query support
(typically <100)</li>
Review Comment:
The statement "Limited concurrent query support (typically <100)" needs
substantiation. ClickHouse's concurrency limits vary significantly with
hardware configuration, query complexity, and tuning. Without context or
citations, this reads as an unsubstantiated claim. Consider describing the
conditions under which this limit applies or citing benchmarks.
```suggestion
<li style={{fontSize:'14px'}}>Can require careful tuning to
maintain stability at very high concurrent query counts</li>
```
##########
docs/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
+ <li style={{fontSize:'14px'}}>No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base
tables</li>
+ <li style={{fontSize:'14px'}}>Frequent OOM errors on large
multi-table queries</li>
</ul>
</td>
</tr>
<tr>
<td><strong>Real-time Updates</strong></td>
<td><ul>
- <li style={{fontSize:'14px'}}>Features a strongly consistent
primary key storage model, supporting synchronous data updates and
deletions</li>
- <li style={{fontSize:'14px'}}>Implements Merge-On-Write for unique
keys with a delete bitmap, ensuring query performance remains unaffected.</li>
+ <li style={{fontSize:'14px'}}><strong>34x faster query
performance</strong> than ClickHouse for real-time update workloads</li>
Review Comment:
The claim of "34x faster query performance" for real-time update workloads
is a very specific performance metric that should be supported by benchmark
data or citations. Without context about the test conditions, dataset size,
update patterns, and benchmark methodology, this appears as an unsubstantiated
claim. Consider adding a reference to the benchmark or providing more context.
##########
docs/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
Review Comment:
The claim "~50% of TPC-DS queries fail due to unsupported correlated
subqueries" is a specific statistical claim that should be supported by
benchmarks or citations. Without evidence, this could be considered
unsubstantiated. Consider adding a reference or softening the claim with
language like "many TPC-DS queries" instead of a specific percentage.
##########
docs/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -166,15 +187,24 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Storage-Compute Separation</strong></td>
+ <td><strong>Cost Efficiency (Storage-Compute Separation)</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Available as an open-source
feature since version 3.0</li>
+ <li style={{fontSize:'14px'}}><strong>Up to 70% cost
reduction</strong> by independently scaling compute and storage</li>
Review Comment:
The claim "Up to 70% cost reduction" is a significant cost savings claim
that should be supported by case studies or specific scenarios. Without context
about the workload patterns, infrastructure setup, or baseline comparison, this
appears as an unsubstantiated claim. Consider adding a reference to a case
study or providing more context about when this cost reduction is achievable.
##########
docs/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -2,11 +2,11 @@
{
"title": "Alternative to ClickHouse",
"language": "en",
- "description": "Apache Doris and ClickHouse are both leading global
real-time data warehouses, supporting columnar storage and fast queries."
+ "description": "Apache Doris offers 10x faster join queries, up to 70%
lower costs via storage-compute separation, and 34x better real-time update
performance compared to ClickHouse."
}
---
-Apache Doris and ClickHouse are both leading global real-time data warehouses,
supporting columnar storage and fast queries. Doris has advantages like higher
concurrency, more efficient joins, easier maintenance, and MySQL-like SQL
syntax, making it simpler to use and deploy.
+Apache Doris and ClickHouse are both leading real-time analytical databases
with columnar storage and fast query capabilities. Apache Doris offers
significant advantages over ClickHouse in three critical areas: **10x faster
join query performance** through its advanced MPP architecture with Cost-Based
Optimizer, **lower infrastructure costs** via compute-storage separation that
allows independent scaling of resources, and **superior real-time update
performance** with its Merge-on-Write engine that maintains query speed during
high-frequency data modifications.
Review Comment:
The claim of "10x faster join query performance" in the introduction appears
inconsistent with the "2-10x faster joins" claim in line 112. The introduction
should use the same range (2-10x) as stated in the detailed comparison table
for accuracy and consistency.
##########
versioned_docs/version-4.x/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -106,33 +106,54 @@ Apache Doris and ClickHouse are both leading global
real-time data warehouses, s
</td>
</tr>
<tr>
- <td><strong>Queries</strong></td>
+ <td><strong>Join Query Performance</strong></td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Distributed joins</li>
- <li style={{fontSize:'14px'}}>Cost-Based Optimization (CBO)</li>
- <li style={{fontSize:'14px'}}>Query rewrites and multi-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Higher concurrent performance</li>
+ <li style={{fontSize:'14px'}}><strong>2-10x faster joins</strong>
with true distributed join execution across nodes</li>
+ <li style={{fontSize:'14px'}}>Advanced Cost-Based Optimizer (CBO)
automatically selects optimal join strategies (broadcast, shuffle,
colocate)</li>
+ <li style={{fontSize:'14px'}}>Colocate Join eliminates network
shuffle for pre-partitioned tables</li>
+ <li style={{fontSize:'14px'}}>Runtime Filter pushdown reduces data
scanning by up to 90%</li>
+ <li style={{fontSize:'14px'}}>Transparent query acceleration -
queries on base tables are automatically rewritten to use materialized
views</li>
+ <li style={{fontSize:'14px'}}>Handles complex TPC-DS queries that
cause OOM in ClickHouse</li>
</ul>
</td>
<td>
<ul>
- <li style={{fontSize:'14px'}}>Poor join implementation</li>
- <li style={{fontSize:'14px'}}>Lacks a Cost-Based Optimizer
(CBO)</li>
- <li style={{fontSize:'14px'}}>Only supports single-table
materialized views</li>
- <li style={{fontSize:'14px'}}>Lower concurrency performance</li>
+ <li style={{fontSize:'14px'}}>Limited join capability - relies on
subqueries and denormalization</li>
+ <li style={{fontSize:'14px'}}>No Cost-Based Optimizer; requires
manual query tuning</li>
+ <li style={{fontSize:'14px'}}>Scatter-Gather architecture not
designed for distributed joins</li>
+ <li style={{fontSize:'14px'}}>~50% of TPC-DS queries fail due to
unsupported correlated subqueries</li>
+ <li style={{fontSize:'14px'}}>No automatic query rewriting - must
explicitly query materialized views; cannot accelerate queries on base
tables</li>
+ <li style={{fontSize:'14px'}}>Frequent OOM errors on large
multi-table queries</li>
</ul>
</td>
</tr>
<tr>
<td><strong>Real-time Updates</strong></td>
<td><ul>
- <li style={{fontSize:'14px'}}>Features a strongly consistent
primary key storage model, supporting synchronous data updates and
deletions</li>
- <li style={{fontSize:'14px'}}>Implements Merge-On-Write for unique
keys with a delete bitmap, ensuring query performance remains unaffected.</li>
+ <li style={{fontSize:'14px'}}><strong>34x faster query
performance</strong> than ClickHouse for real-time update workloads</li>
Review Comment:
The claim of "34x faster query performance" for real-time update workloads
is a very specific performance metric that should be supported by benchmark
data or citations. Without context about the test conditions, dataset size,
update patterns, and benchmark methodology, this appears as an unsubstantiated
claim. Consider adding a reference to the benchmark or providing more context.
##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/alternatives/alternative-to-clickhouse.mdx:
##########
@@ -2,81 +2,81 @@
{
"title": "Apache Doris vs ClickHouse",
"language": "zh-CN",
- "description": "Apache Doris 和 ClickHouse
都是全球领先的实时数据仓库,均支持列式存储与极速查询。此外,Doris 具备更高的并发处理能力、更高效的 Join 查询、更简便的运维,并兼容 MySQL
的语法,使其更易于使用和部署。"
+ "description": "Apache Doris 相比 ClickHouse 具备 10 倍以上的 Join 查询性能、通过存算分离降低高达
70% 的成本,以及实时更新场景下 34 倍的查询性能优势。"
}
---
-Apache Doris 和 ClickHouse 都是全球领先的实时数据仓库,均支持列式存储与极速查询。此外,Doris 具备更高的并发处理能力、更高效的
Join 查询、更简便的运维,并兼容 MySQL 的语法,使其更易于使用和部署。
+Apache Doris 和 ClickHouse 都是全球领先的实时分析型数据库,均支持列式存储与极速查询。Apache Doris
在三个关键领域具有显著优势:**Join 查询性能提升 2-10 倍**(基于先进的 MPP 架构与基于成本的查询优化器)、**通过存算分离降低高达 70%
的基础设施成本**(支持计算与存储资源独立扩缩)、以及**卓越的实时更新性能**(写时合并引擎在高频数据更新时仍能保持稳定的查询性能)。
Review Comment:
The claim of "10 倍以上的 Join 查询性能" (more than 10x faster join query performance) in the
front-matter description appears inconsistent with the "Join 性能提升 2-10 倍" (2-10x faster
joins) claim in line 112. The description should use the same range (2-10x) as
stated in the introduction and the detailed comparison table for accuracy and consistency.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]