Re: [PR] doc: add PAX related docs [cloudberry-site]

via GitHub Fri, 23 May 2025 03:23:26 -0700


TomShawn commented on code in PR #275:
URL: https://github.com/apache/cloudberry-site/pull/275#discussion_r2104294466



##########
i18n/zh/docusaurus-plugin-content-docs/current/heap-and-ao-table-formats.md:
##########
@@ -0,0 +1,80 @@
+---
+title: Heap Storage and AO Storage Models
+---
+
+# Heap Storage and AO Storage Models
+
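+HashData Lightning supports two storage models: Heap storage and Append-Optimized (AO) storage. Choose a storage model based on your data types and query types. This document introduces both storage models and offers guidance on choosing the optimal one.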

Review Comment:
   Done



##########
docs/pax-table-format.md:
##########
@@ -0,0 +1,487 @@
+---
+title: PAX Storage Format
+---
+
+# PAX Storage Format
+
+Apache Cloudberry supports the PAX (Partition Attributes Across) storage format.
+
+PAX is a database storage format that combines the benefits of row-based storage (NSM, N-ary Storage Model) and column-based storage (DSM, Decomposition Storage Model). It is designed to improve query performance, particularly by improving cache efficiency. In OLAP scenarios, PAX offers batch write performance similar to row-based storage and read performance similar to column-based storage. PAX works both in cloud environments built on object storage and in traditional on-premises environments that store data in physical files.
+
+Compared to traditional storage formats, PAX has the following features:
+
+- Data updates and deletions: PAX handles updates and deletions with a mark-and-delete approach, so changes to physical files are managed without immediately rewriting the entire data file.
+- Concurrency control and read-write isolation: PAX uses Multi-Version Concurrency Control (MVCC) to provide efficient concurrency control and read-write isolation. Control is applied at the granularity of individual data files, which improves operation safety and efficiency.
+- Index support: PAX supports B-tree indexes, which speed up queries. This is particularly useful for faster data retrieval over large amounts of data.
+- Data encoding and compression: PAX offers multiple data encoding methods (such as run-length encoding and delta encoding) and compression algorithms (such as zstd and zlib) with configurable compression levels. These features reduce storage space requirements while optimizing read performance.
+- Statistics: Data files contain detailed statistics that are used for quick filtering and query optimization, reducing unnecessary data scanning and speeding up query processing.
+
+## Applicable scenarios
+
+The hybrid storage capability of PAX makes it suitable for complex OLAP applications that handle large volumes of data writes and frequent queries. Whether you need a high-performance data analysis solution on cloud infrastructure or are processing large datasets in a traditional data center, PAX is a strong fit.
+

Review Comment:
   Added in https://github.com/apache/cloudberry-site/pull/275/commits/822996a81944a89639f6903dd3ff2e58d0ae1505



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

