TomShawn commented on code in PR #275:
URL: https://github.com/apache/cloudberry-site/pull/275#discussion_r2104294466
########## i18n/zh/docusaurus-plugin-content-docs/current/heap-and-ao-table-formats.md: ##########
@@ -0,0 +1,80 @@
+---
+title: Heap Storage and AO Storage Models
+---
+
+# Heap Storage and AO Storage Models
+
+HashData Lightning supports two storage models: Heap storage and Append-Optimized (AO) storage. Choose a storage model based on your data type and query type. This document introduces the two storage models and provides recommendations for choosing the optimal one.

Review Comment:
Done

########## docs/pax-table-format.md: ##########
@@ -0,0 +1,487 @@
+---
+title: PAX Storage Format
+---
+
+# PAX Storage Format
+
+Apache Cloudberry supports the PAX (Partition Attributes Across) storage format.
+
+PAX is a database storage format that combines the benefits of row-based storage (NSM, N-ary Storage Model) and column-based storage (DSM, Decomposition Storage Model). It is designed to improve query performance, particularly in terms of cache efficiency. In OLAP scenarios, PAX offers batch write performance similar to row-based storage and read performance like column-based storage. PAX can adapt to both cloud environments with object storage models and traditional offline physical file-based storage methods.
+
+Compared to traditional storage formats, PAX has the following features:
+
+- Data updates and deletions: PAX uses a mark-and-delete approach for data updates and deletions. This effectively manages changes in physical files without immediately rewriting the entire data file.
+- Concurrency control and read-write isolation: PAX uses Multi-Version Concurrency Control (MVCC) to achieve efficient concurrency control and read-write isolation. The control granularity reaches the level of individual data files, enhancing operation safety and efficiency.
+- Index support: PAX supports B-tree indexes, which help speed up query operations. This is particularly useful for improving data retrieval speed when dealing with large amounts of data.
+- Data encoding and compression: PAX offers multiple data encoding methods (such as run-length encoding and delta encoding) and compression options (such as zstd and zlib), with various compression levels. These features help reduce storage space requirements while optimizing read performance.
+- Statistics: Data files contain detailed statistics that are used for quick filtering and query optimization, reducing unnecessary data scanning and speeding up query processing.
+
+## Applicable scenarios
+
+The hybrid storage capability of PAX makes it suitable for complex OLAP applications that need to handle large amounts of data writes and frequent queries. Whether you are looking for a high-performance data analysis solution in a cloud infrastructure or dealing with large datasets in a traditional data center environment, PAX can provide strong support.
+

Review Comment:
Added in https://github.com/apache/cloudberry-site/pull/275/commits/822996a81944a89639f6903dd3ff2e58d0ae1505

--
This is an automated message from the Apache Git Service.
