tuhaihe commented on code in PR #275:
URL: https://github.com/apache/cloudberry-site/pull/275#discussion_r2128126229


##########
docs/pax-table-format.md:
##########
@@ -0,0 +1,529 @@
+---
+title: PAX Storage Format
+---
+
+# PAX Storage Format
+
+Apache Cloudberry supports the PAX (Partition Attributes Across) storage 
format.
+
+PAX is a database storage format that combines the benefits of row-based 
storage (NSM, N-ary Storage Model) and column-based storage (DSM, Decomposition 
Storage Model). It is designed to improve query performance, particularly in 
terms of cache efficiency. In OLAP scenarios, PAX offers batch write 
performance similar to row-based storage and read performance like column-based 
storage. PAX can adapt to both cloud environments with object storage models 
and traditional offline physical file-based storage methods.
+
+Compared to traditional storage formats, PAX has the following features:
+
+- Data updates and deletions: PAX uses a mark-and-delete approach for data 
updates and deletions. This effectively manages changes in physical files 
without immediately rewriting the entire data file.
+- Concurrency control and read-write isolation: PAX uses Multi-Version 
Concurrency Control (MVCC) to achieve efficient concurrency control and 
read-write isolation. The control granularity reaches the level of individual 
data files, enhancing operation safety and efficiency.
+- Index support: PAX supports B-tree indexes, which help speed up query 
operations. This is particularly useful for improving data retrieval speed when 
dealing with large amounts of data.
+- Data encoding and compression: PAX offers multiple data encoding methods 
(such as run-length encoding and delta encoding) and compression options (such 
as zstd and zlib), with various compression levels. These features help reduce 
storage space requirements while optimizing read performance.
+- Statistics: Data files contain detailed statistics that are used for quick 
filtering and query optimization, reducing unnecessary data scanning and 
speeding up query processing.
+
+## Applicable scenarios
+
+The hybrid storage capability of PAX makes it suitable for complex OLAP 
applications that need to handle large amounts of data writes and frequent 
queries. Whether you are looking for a high-performance data analysis solution 
in a cloud infrastructure or dealing with large datasets in a traditional data 
center environment, PAX can provide strong support.
+
+## Enable PAX when building Cloudberry from source code
+
+To enable PAX when building Apache Cloudberry from source code, you need to:
+
+1. Make sure that these dependency requirements are met:
+
+    - C/C++ Compiler: GCC/GCC-C++ 11 or later
+    - CMake: 3.11 or later
+    - Protobuf: 3.5.0 or later
+    - ZSTD (libzstd): 1.4.0 or later
+
+2. Run the following command at the top level of the Cloudberry source code 
directory to download the submodules:
+
+   ```bash
+   git submodule update --init --recursive
+   ```
+
+   The following submodules will be downloaded for building and testing PAX:
+
+    - yyjson (`dependency/yyjson`)
+    - cpp-stub (`contrib/pax_storage/src/cpp/contrib`)
+    - googlebench (`contrib/pax_storage/src/cpp/contrib`)
+    - googletest (`contrib/pax_storage/src/cpp/contrib`)
+    - tabulate (`contrib/pax_storage/src/cpp/contrib`)
+

Review Comment:
   Consider adding a note here:
   
   ```suggestion
   :::note
   The submodules are already included in the latest release source code 
archive, so you don't need to download the submodules manually after extracting 
the archive.
   :::
   ```
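
The distinction in the suggested note could be sketched as a small shell helper that fetches submodules only when the source tree is a git checkout. This is an illustration under stated assumptions, not part of the Cloudberry build: `needs_submodule_fetch` and `fetch_submodules_if_needed` are hypothetical names, and the check assumes an extracted release archive contains no `.git` entry.

```bash
# Hypothetical helper: succeeds when the source tree looks like a git checkout.
# Assumption: an extracted release archive has no .git entry and, per the note,
# already bundles the submodules.
needs_submodule_fetch() {
    [ -e "$1/.git" ]
}

# Hypothetical wrapper: runs the documented submodule command only when needed.
fetch_submodules_if_needed() {
    src_dir="${1:-.}"
    if needs_submodule_fetch "$src_dir"; then
        git -C "$src_dir" submodule update --init --recursive
    else
        echo "No .git entry in $src_dir; assuming bundled submodules."
    fi
}
```

Calling `fetch_submodules_if_needed .` from the top level of the source directory would then either run the documented `git submodule update` command or skip it for an extracted archive.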



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
