Re: [PR] doc: add PAX related docs [cloudberry-site]

via GitHub Fri, 23 May 2025 04:13:48 -0700


tuhaihe commented on code in PR #275:
URL: https://github.com/apache/cloudberry-site/pull/275#discussion_r2104247222



##########
docs/pax-table-format.md:
##########
@@ -0,0 +1,487 @@
+---
+title: PAX Storage Format
+---
+
+# PAX Storage Format
+
+Apache Cloudberry supports the PAX (Partition Attributes Across) storage 
format.
+
+PAX is a database storage format that combines the benefits of row-based 
storage (NSM, N-ary Storage Model) and column-based storage (DSM, Decomposition 
Storage Model). It is designed to improve query performance, particularly in 
terms of cache efficiency. In OLAP scenarios, PAX offers batch write 
performance similar to row-based storage and read performance like column-based 
storage. PAX can adapt to both cloud environments with object storage models 
and traditional offline physical file-based storage methods.
+
+Compared to traditional storage formats, PAX has the following features:
+
+- Data updates and deletions: PAX uses a mark-and-delete approach for data 
updates and deletions. This effectively manages changes in physical files 
without immediately rewriting the entire data file.
+- Concurrency control and read-write isolation: PAX uses Multi-Version 
Concurrency Control (MVCC) to achieve efficient concurrency control and 
read-write isolation. The control granularity reaches the level of individual 
data files, enhancing operation safety and efficiency.
+- Index support: PAX supports B-tree indexes, which help speed up query 
operations. This is particularly useful for improving data retrieval speed when 
dealing with large amounts of data.
+- Data encoding and compression: PAX offers multiple data encoding methods 
(such as run-length encoding and delta encoding) and compression options (such 
as zstd and zlib), with various compression levels. These features help reduce 
storage space requirements while optimizing read performance.
+- Statistics: Data files contain detailed statistics that are used for quick 
filtering and query optimization, reducing unnecessary data scanning and 
speeding up query processing.
+
+## Applicable scenarios
+
+The hybrid storage capability of PAX makes it suitable for complex OLAP 
applications that need to handle large amounts of data writes and frequent 
queries. Whether you are looking for a high-performance data analysis solution 
in a cloud infrastructure or dealing with large datasets in a traditional data 
center environment, PAX can provide strong support.
+
+## Usage
+
+### Create a PAX table
+
+To create a table in PAX format, you need to set the table access method to 
PAX. You can do this in one of the following ways:
+
+- Use the `USING PAX` clause explicitly when creating the table, for example:
+
+    ```sql
+    CREATE TABLE t1(a int, b int, c text) USING PAX;
+    -- t1 is a PAX table and can be used like a normal heap table.
+    ```
+
+- Set the default table access method to PAX and then create the table:
+
+    ```sql
+    -- Set the default table access method. From now on, newly created tables 
will use PAX format.
+    SET default_table_access_method = pax;
+
+    -- Implicitly use the default access method, which is PAX.
+    CREATE TABLE t1(a int, b int, c text);
+    ```
+
+When creating a table, you can also specify minimum and maximum value 
information for certain columns to speed up queries:
+
+```sql
+-- Use WITH(minmax_columns='b,c') to specify that columns b and c
+-- should record min and max statistics.
+-- This helps optimize queries involving these two columns,
+-- because the system can quickly determine which data blocks might contain 
matching data.
+CREATE TABLE p2(a INT, b INT, c INT) USING pax WITH(minmax_columns='b,c');
+
+INSERT INTO p2 SELECT i, i * 2, i * 3 FROM generate_series(1,10)i;
+
+-- because column b has minmax statistics,
+-- the system can quickly locate the data blocks that might contain the value, 
speeding up the query.
+SELECT * FROM p2 WHERE b = 4;
+
+-- Similarly, because of the minmax information on column b, the system can 
quickly determine that no data blocks can meet this condition
+-- (if all generated values are positive), possibly returning no data 
immediately and avoiding unnecessary data scans.
+SELECT * FROM p2 WHERE b < 0;
+
+-- Modify the minmax statistics settings for table p2 to apply only to column 
b. For data inserted later,
+-- only column b will maintain this statistical information, and it won't 
affect existing data or trigger any rewrites or adjustments.
+ALTER TABLE p2 SET(minmax_columns='b');
+```
+
+### View the format of an existing table
+
+To check whether a table is in PAX format, you can use one of the following 
methods:
+
+- Use the `psql` command `\d+`:
+
+    ```sql
+    gpadmin=# \d+ t1
+                                                Table "public.t1"
+    Column |  Type   | Collation | Nullable | Default | Storage  | Compression 
| Stats target | Description
+    
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
+    a      | integer |           |          |         | plain    |             
|              |
+    b      | integer |           |          |         | plain    |             
|              |
+    c      | text    |           |          |         | extended |             
|              |
+    Distributed by: (a)
+    Access method: pax
+    ```
+
+- Query the system catalog tables `pg_class` and `pg_am`:
+
+    ```sql
+    SELECT relname, amname FROM pg_class, pg_am WHERE relam = pg_am.oid AND 
relname = 't1';
+
+    relname | amname
+    ---------+--------
+    t1      | pax
+    (1 row)
+    ```
+
+## Support for TOAST
+
+If some columns in a PAX table contain large values, you can enable TOAST 
storage to store these large values in a separate TOAST file. This helps to 
make the data in the main data file more compact, allowing you to scan more 
tuples within the same data size.
+
+By default, the TOAST storage is enabled for PAX tables. Unlike PostgreSQL, 
the TOAST storage supported by PAX does not rely on Page management, which 
allows PAX to store data larger than 2 MiB.
+
+You can configure TOAST-related thresholds using the following parameters. For 
more details, refer to [PAX-related system 
parameters](#pax-related-system-parameters). 
+
+- `pax_enable_toast`
+- `pax_min_size_of_compress_toast`
+- `pax_min_size_of_external_toast`
+
+## Support for clustering
+
+Apache Cloudberry supports z-ordering for PAX tables, which greatly improves 
performance when managing and querying multi-dimensional data. Using the 
clustering feature, you can organize data in PAX tables using two different 
sorting strategies: index-based clustering and `reloptions`-based clustering. 
Note that the two strategies are mutually exclusive, and you can only choose 
one.
+
+### Clustering strategies
+
+You can use the clustering feature of Apache Cloudberry to physically sort and 
optimize table data. The goal is to improve query performance by reorganizing 
the physical storage order of the data. In PAX tables, the clustering feature 
provides two different strategies for sorting data:
+
+- Index-based clustering. This strategy is based on B-tree indexes and works 
similarly to the native clustering method in PostgreSQL. It is suitable when an 
index has already been created for the table.
+
+    Example:
+
+    ```sql
+    CREATE TABLE t2(c1 int, c2 int) USING PAX;
+    CREATE INDEX c1_idx ON t2(c1);
+    CLUSTER t2 USING c1_idx;
+    DROP TABLE t2;
+    ```
+
+- `reloptions`-based clustering. PAX tables support clustering operations 
based on `reloptions` where `cluster_columns` are specified, and it provides 
the following sorting methods:
+
+    - Z-order clustering: This method encodes the values of multiple columns 
into a byte array and sorts by this byte array. It is ideal for cases where 
multiple columns are used as query conditions (with no fixed order for the 
columns in the query). This method greatly enhances performance for 
multi-dimensional data queries. However, it is not suitable for string columns 
with the same prefix. When `cluster_type` is not specified, PAX uses z-order 
sorting as the default clustering strategy:
+
+        ```sql
+        CREATE TABLE t2(c1 int, c2 float, c3 text) USING PAX WITH 
(cluster_columns='c1,c2');
+        INSERT INTO t2 SELECT i, i, i::text FROM generate_series(1, 100000) i;
+        CLUSTER t2;
+        DROP TABLE t2;
+        ```
+
+        In this example, the data in table `t2` is sorted using z-order by the 
`c1` and `c2` columns. This sorting method provides great performance 
improvement in multi-dimensional query scenarios.
+
+    - Lexical clustering: This method sorts by the values and order of the 
columns, and it is primarily used for sorting and optimizing queries on string 
columns with the same prefix. For example:
+
+        ```sql
+        CREATE TABLE t2(c1 int, c2 float, c3 text) USING PAX WITH 
(cluster_columns='c1,c2', cluster_type='lexical');
+        INSERT INTO t2 SELECT i, i, i::text FROM generate_series(1, 100000) i;
+        CLUSTER t2;
+        DROP TABLE t2;
+        ```
+
+        In this example, the data in table `t2` is sorted lexically by the 
`c1` and `c2` columns. This sorting method is well-suited for columns with the 
same prefix.
+
+### Conflicts between clustering types
+
+Note that index-based clustering and `cluster_columns`-based clustering cannot 
be used at the same time. If a table has already been clustered based on an 
index (or `cluster_columns`), you cannot specify `cluster_columns` (or an 
index) for clustering.
+
+An example of handling cluster conflicts:
+
+```sql
+-- Creates a table and uses index-based clustering
+CREATE TABLE t2(c1 int, c2 float, c3 text) USING PAX;
+CREATE INDEX t2_idx ON t2(c1);
+CLUSTER t2 USING t2_idx;
+
+-- Tries to use cluster columns (will fail)
+ALTER TABLE t2 SET (cluster_columns='c1,c2', cluster_type='zorder');
+-- Tries to use lexical clustering (will fail)
+ALTER TABLE t2 SET (cluster_columns='c1,c2', cluster_type='lexical');
+
+-- Drops the index and successfully use lexical clustering
+DROP INDEX t2_idx;
+ALTER TABLE t2 SET (cluster_columns='c1,c2', cluster_type='lexical');
+```
+
+In the above example, after table `t2` is clustered using an index, attempts 
to set z-order or lexical clustering using `cluster_columns` will fail, until 
the index is dropped.
+
+## Bloom filter support
+
+Apache Cloudberry supports bloom filters, allowing you to generate and 
maintain bloom filter information at the column level. This feature helps 
quickly filter data blocks and improves query performance, especially when 
using `IN` conditions with multiple values, greatly reducing unnecessary data 
scans.
+
+### Bloom filter options
+
+You can specify the columns for which you want to record bloom filter 
information by setting the `bloomfilter_columns` option. For example:
+
+```sql
+CREATE TABLE p1 (
+    a int,
+    b int,
+    c text
+) USING PAX WITH (bloomfilter_columns='b,c,a');
+```
+
+In this example, bloom filters will be generated for columns `b`, `c`, and `a` 
in the table `p1`.
+
+The size of the bloom filter is controlled by the following GUC (global user 
configuration) parameters:
+
+- `pax_max_tuples_per_file`: Controls the maximum number of tuples stored in 
each data file.
+- `pax_bloom_filter_work_memory_bytes`: Controls the maximum memory allowed 
for the bloom filter.
+
+For example:
+
+```sql
+-- Sets the maximum number of tuples per file.
+SET pax_max_tuples_per_file = 131073;
+
+-- Sets the maximum bloom filter memory.
+SET pax_bloom_filter_work_memory_bytes = 1048576;  -- 1MB
+
+-- Creates a table and specifies bloom filter options for columns.
+CREATE TABLE p1 (
+    a int,
+    b int,
+    c text
+) USING PAX WITH (bloomfilter_columns='b,c,a');
+```
+
+The size of the generated bloom filter is calculated as follows:
+
+```sql
+ceil(min(pax_max_tuples_per_file * 2, pax_bloom_filter_work_memory_bytes))
+```
+
+`ceil` is a rounding function, which ensures that the size of the bloom filter 
is always a power of `2`.
+
+## Sparse filtering
+
+Sparse filtering is a data scanning optimization provided by the PAX storage 
format. It improves query performance by skipping file scans that do not meet 
conditions and reducing the amount of data block to be scanned within files.
+
+To use sparse filtering, make sure that the following conditions are
+met:
+
+- The system parameter `pax_enable_sparse_filter` is set to `ON` (default 
value).
+- Statistics have been configured for the relevant columns (`min`/`max` or 
bloom filter).
+- The query contains filter conditions (`WHERE` clause) for these columns
+
+### Expression support examples
+
+PAX supports basic conditional expressions. For example:
+
+```sql
+-- Creates test table
+CREATE TABLE a(v1 int, v2 int, v3 int) USING PAX WITH(minmax_columns='v1, v2, 
v3');
+
+-- Basic conditions
+SELECT * FROM a WHERE v1 <= 3;                                    -- Full 
condition support. root is OpExpr.
+SELECT * FROM a WHERE v1 != NULL;                                -- Full 
condition support. root is NULLTest.
+
+-- Multiple conditions
+SELECT * FROM a WHERE v1 <= 3 AND v2 >= 3;                       -- Full 
condition support. Tree structure has only one level.
+SELECT * FROM a WHERE v1 <= 3 AND v2 >= 3 AND v3 >= 3;          -- Full 
condition support.
+SELECT * FROM a WHERE v1 <= 3 AND v2 >= 3 AND v3 != NULL;       -- Full 
condition support.
+```
+
+PAX also supports nested expression structures and certain operators. For 
example:
+
+```sql
+-- Creates test table.
+CREATE TABLE a(v1 int, v2 int, v3 int) USING PAX WITH(minmax_columns='v1, v2, 
v3');
+
+-- Nested expressions.
+SELECT * FROM a WHERE (v1 <= 3 OR v2 > 3) AND v3 >= 10;         -- Full 
support for all conditions.
+SELECT * FROM a WHERE (v1 <= 3 AND v2 > 3) OR v3 >= 10;         -- Supports 
complete nested expressions.
+
+-- Operators.
+SELECT * FROM a WHERE v1 + v2 <= 3;
+SELECT * FROM a WHERE v1 + 10 <= 3;
+```
+
+PAX can handle complete expression trees, including nested `AND`/`OR` 
conditions. This means that all the above queries can be optimized, and Apache 
Cloudberry will use all available filter conditions to improve query efficiency.
+
+### Supported expression types
+
+PAX sparse filtering supports the following expression types:
+
+- Arithmetic operators (`OpExpr`)
+
+    - Supports addition (`+`), subtraction (`-`), and multiplication (`*`)
+    - Division operations are not supported (due to difficulty in estimating 
negative number ranges)
+    - Examples: `a + 1`, `1 - a`, `a * b`
+
+- Comparison operators (`OpExpr`)
+
+    - Supports `<`, `<=`, `=`, `>=`, `>`
+    - Examples: `a < 1`, `a + 1 < 10`, `a = b`
+
+- Logical operators (`BoolExpr`)
+
+    - Supports `AND`, `OR`, `NOT`
+    - Example: `a < 10 AND b > 10`
+
+- NULL value tests (`NullTest`)
+
+    - Supports `IS NULL`, `IS NOT NULL`
+    - Examples: `a IS NULL`, `a IS NOT NULL`
+
+- Type casting (`FuncExpr`)
+
+    - Only supports basic type casting
+    - Example: `a::float8 < 1.1`
+    - Custom functions are not supported, such as `func(a) < 10`
+
+- `IN` operator (`ScalarArrayOpExpr`)
+
+    - Supports `IN` expressions
+    - Example: `a IN (1, 2, 3)`
+
+### Partial condition support
+
+When a query contains unsupported expressions (such as custom functions), PAX 
adopts a partial condition support strategy:
+
+- Identifies and extracts supported conditions.
+- Uses supported conditions for sparse filtering.
+
+For example:
+
+```sql
+-- Creates custom function v2
+CREATE OR REPLACE FUNCTION func(v2 double precision)
+RETURNS double precision AS $$
+BEGIN
+  RETURN v2 * 2;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Uses custom function in query
+SELECT * FROM a WHERE v1 < 3 AND func(v2) < 10;  -- PAX will use v1 < 3 for 
filtering
+```
+
+In this example, although `func(v2) < 10` cannot be used for sparse filtering, 
Apache Cloudberry still uses `v1 < 3` to optimize query performance. This 
approach ensures that partial performance optimization is achieved even in 
complex queries.
+
+:::note
+- The effectiveness of sparse filtering depends on data distribution and query 
conditions.
+- It is recommended to enable statistics on columns that are frequently used 
as filter conditions.
+- Certain conditions (such as custom functions) are ignored.
+:::
+
+## View data change records on PAX tables in WAL logs
+
+Data operations related to PAX tables are recorded in the Write-Ahead Logging 
(WAL) system. This ensures reliable backups between primary and mirror nodes, 
which assists in failover processes.
+
+You can use the `pg_waldump` tool to view the WAL logs for PAX tables. The 
logs are stored in the `$COORDINATOR_DATA_DIRECTORY/pg_wal` directory or the 
`pg_wal` directory within the Segment node's data directory.
+
+The PAX table WAL logs can be used for these purposes:
+
+- **Data recovery**: Helps restore data during system failures or data 
inconsistencies.
+- **Failover support**: Ensures data synchronization between primary and 
mirror nodes.
+- **Debugging and analysis**: Identifies and analyzes specific operations on 
PAX tables.
+
+### Example: operate PAX tables and view WAL logs
+
+1. Create a PAX table.
+
+    ```sql
+    CREATE TABLE a (
+       v1 int,
+       v2 int,
+       v3 int
+    ) USING PAX;
+    ```
+
+    This creates a PAX table `a` with columns `v1`, `v2`, and `v3`.
+
+2. Insert data into the table:
+
+    ```sql
+    INSERT INTO a VALUES (1, 2, 3);
+    ```
+
+3. Locate the node where the PAX table resides:
+
+    ```sql
+    -- Views the IDs and data directory locations of all nodes in the cluster
+    SELECT * FROM gp_segment_configuration;
+
+    -- Identifies the Segment ID of table a based on the results of the above 
query
+    -- This helps determine the segment and data directory location of table a
+    SELECT gp_segment_id FROM gp_distribution_policy WHERE localoid = 
'a'::regclass;
+    ```
+
+4. Find the corresponding WAL log in the `pg_wal` directory of the identified 
node. For example:
+
+    ``` bash
+    ls $COORDINATOR_DATA_DIRECTORY/pg_wal
+    ```
+
+    Locate the most recent WAL file to analyze the relevant operations. In 
this example, assume the WAL log file is named `00000001000000000000000A`.
+
+5.  Query the `relfilenode` of the PAX table to associate the WAL log with the 
PAX table:
+
+    ```sql
+    SELECT relfilenode FROM pg_class WHERE relname = 'a';
+    ```
+
+    The `relfilenode` is the unique identifier of the table in the WAL log.
+
+6.  Use the `pg_waldump` tool to parse the WAL log and identify operations 
related to the PAX table:
+
+    ```bash
+    pg_waldump -f $COORDINATOR_DATA_DIRECTORY/pg_wal/00000001000000000000000A
+    ```
+
+    In the output, search for the previously obtained `relfilenode` to find 
operations associated with the PAX table.
+
+    Example output of `pg_waldump`:
+
+    ```text
+    rmgr: PAX       len (rec/tot):     68/   104, tx:    593, lsn: 0/016E1D98, 
prev 0/016E1D60, desc: INSERT off 3, blkref #0: rel 1663/16384/19780 blk 0
+    ```
+
+    In this example:
+
+    - `rmgr: PAX` indicates the record type.
+    - `rel 1663/16384/19780` corresponds to the `relfilenode` of the PAX table.
+    - `INSERT` indicates that an insert operation was recorded.
+
+By following the steps above, you can track and analyze WAL logs related to 
PAX tables in Apache Cloudberry for debugging, recovery, or performance 
optimization purposes.
+
+## Limitations for PAX tables
+
+- For index support, the PAX storage format currently only supports B-tree 
(`btree`) indexes. Bugs might occur when using GiST, SP-GiST (`gist/spgist`) or 
Brin indexes. Supports for other index types are still experimental and might 
be unavailable.
+- Currently, if a field is too long, it will be stored in a TOAST file. This 
TOAST is different from PostgreSQL's TOAST tables, and they only share the same 
name.
+- Unlike traditional heap tables, PAX format does not support `TOAST` fields. 
Currently, all column data is stored in the same data file.
+- The PAX format does not support data backup and restore using `pg_dump` or 
`pg_restore`. PAX tables are ignored during these operations.
+- PAX format does not support Write-Ahead Logging (WAL), so there is no data 
backup between the primary and mirror servers.
+
+## PAX-related SQL options
+
+PAX supports SQL options to control its behavior. You can use these options in 
the `WITH()` clause, for example, `WITH(minmax_columns='b,c', 
storage_format=porc)`.
+
+| Name                   | Type    | Valid values                              
           | Description                                                        
                                                                                
                                                                                
                                                                                
                                          |
+|------------------------|---------|------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `storage_format`       | `string`| •  `porc`<br />•  `porc_vec`              
               | Controls the internal storage format. `porc` is the default 
value, which stores data in a regular format and does not preserve null values. 
`porc_vec` stores data in a vector format and always preserves null values for 
fixed-length fields, regardless of whether the column value is null or not.     
                                              |
+| `compresstype`         | `string`| •  `none`<br />•  `rle`<br />•  
`delta`<br />•  `zstd`<br />•  `zlib`<br />•  `dict` | Specifies the 
compression method for column values. You can only choose one method. The 
default value is `none`. <br /><br />•  `none` means no compression is 
applied.<br />•  `rle` uses run-length encoding to compress repeated 
consecutive data.<br />•  `delta` is used for numeric columns, storing the 
difference between adjacent values.<br />•  `zstd` uses the zstd algorithm, 
which offers high compression ratio and fast compression speed.<br />•  `zlib` 
is a general-purpose compression algorithm, suitable for compressing general 
data.<br />•  `dict` uses dictionary encoding to speed up the processing of 
many repeated strings. Currently, it is an experimental feature and is not 
recommended for production use. |
+| `compresslevel`        | `int`   | Range `[0, 19]`                           
           | Indicates the compression level. The default value is `0`. A 
smaller value means faster compression, while a larger value provides higher 
compression. This option is only effective when used with `compresstype`.       
                                                                                
                                                     |
+| `partition_by`         | `string`| Valid column names in the table           
          | When writing batch data, this option attempts to partition the data 
by the specified column and store it in the same data file, improving data 
locality. Only integer types are supported. This partition key is independent 
of table partitioning and serves as a suggestion for organizing data internally 
in PAX.                              |
+| `partition_ranges`     | `string`| `FROM(XX) TO(YY) [every(DD)]`             
          | This option must be used with `partition_by` to set partition 
ranges. You can define a single range or divide a large range into multiple 
non-contiguous smaller ranges. It attempts to store data within each range in 
the same data file. Data outside the range will be stored in the default data 
file.                                                 |
+| `minmax_columns`       | `string`| Valid column names in the table, 
separated by commas| Records `minmax` statistics for the defined columns to 
speed up data queries. After renaming a column, the statistics will no longer 
be recorded for that column. If you modify `minmax_columns` using `ALTER 
TABLE`, the change only applies to data files written afterward and does not 
affect existing data files.                                    |
+| `cluster_columns`      | `string`| A comma-separated list of valid column 
names in the table. | Indicates that the PAX table stores the internal data in 
a clustered way. When using the `cluster` command, the data can be sorted by 
these columns, and the sorting method is controlled by the `cluster_type` 
parameter.                                                                      
                                                  |
+| `bloomfilter_columns`  | `string`| A comma-separated list of valid column 
names in the table. | Calculates bloom filters for the data in the specified 
columns to help data filtering.                                                 
                                                                                
                                                                                
                                                       |
+| `parallel_workers`     | `int`   | `[0,64]`                                  
           | A PostgreSQL option that sets the number of parallel processes for 
concurrent scanning.                                                            
                                                                                
                                                                                
                                           |
+| `cluster_type`         | `string`| `lexical` and `zorder`                    
          | Specifies the cluster type when clustering is based on custom 
columns defined in `reloptions` rather than an index. Valid values are 
`lexical` and `zorder`. `lexical` sorts by the values and order of the columns, 
while `zorder` encodes the values of multiple columns into a byte array and 
sorts by that array.                                    |
+
+
+The values of these options only affect newly inserted and updated data and do 
not change the existing data.
+
+## PAX-related system parameters
+
+The following system parameters (GUC) are used to set the behavior of PAX 
tables in the current session. Use the command `SET <parameter>=<value>` to 
configure them, for example, `SET pax_enable_sparse_filter=on`.
+
+| Parameter name                       | Value type | Valid values             
                                                                | Description   
                                                                                
                                                                                
                                                  |
+|--------------------------------------|------------|------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `pax_enable_sparse_filter`           | `bool`     | `on` and `off`           
                                                                | Specifies 
whether to enable sparse filtering based on statistics. The default value is 
`on`, which means that sparse filtering is enabled by default.                  
                                                         |
+| `pax_enable_row_filter`              | `bool`     | `on` and `off`           
                                                                | Specifies 
whether to enable row filtering. The default value is `off`, which means that 
row filtering is disabled. It is not recommended to enable this parameter.      
                                                        |
+| `pax_scan_reuse_buffer_size`         | `int`      | Range `[1048576, 
33554432]` (1MiB to 32MiB)                                              | The 
buffer block size used during scanning. The default value is `8388608` (8MiB).  
                                                                                
                                                            |
+| `pax_max_tuples_per_group`           | `int`      | Range `[5, 524288]`      
                                                                | Specifies the 
maximum number of tuples allowed in each group. The default value is `131072`.  
                                                                                
                                                  |
+| `pax_max_tuples_per_file`            | `int`      | Range `[131072, 
8388608]`                                                                | 
Specifies the maximum number of tuples allowed in each data file. The maximum 
value is `8388608`. The default value is `1310720`.                             
                                                                  |
+| `pax_max_size_per_file`              | `int`      | Range `[8388608, 
335544320]` (8MiB to 320MiB)                                            | The 
maximum physical size allowed for each data file. The default value is 
`67108864` (64MiB). The actual file size might be slightly larger than the set 
size. Very large or small values might negatively impact performance. |
+| `pax_enable_toast`                   | `bool`     | `on` and `off`. The 
default value is `on`.                                               | 
Specifies whether to enable TOAST support.                                      
                                                                                
                                                                |
+| `pax_min_size_of_compress_toast`     | `int`      | Range `[524288, 
1073741824]` (512KiB to 1GiB). The default value is `524288` (512KiB).   | 
Specifies the threshold for creating compressed TOAST tables. If the character 
length exceeds this threshold, Apache Cloudberry creates compressed TOAST 
tables for storage.                                                    |
+| `pax_min_size_of_external_toast`     | `int`      | Range `[10485760, 
2147483647]` (10MiB to 2GiB). The default value is `10485760` (10MiB). | 
Specifies the threshold for creating external TOAST tables. If the character 
length exceeds this threshold, Apache Cloudberry creates external TOAST tables 
for storage.                                                        |
+| `pax_default_storage_format`         | `string`   | `porc` (the default 
value)                                                               | Controls 
the default storage format.                                                     
                                                                                
                                                       |
+| `pax_bloom_filter_work_memory_bytes` | `int`      | Range `[1024, 
2147483647]` (1KiB to 2GiB). The default value is `10240` (10KiB).         | 
Controls the maximum memory allowed for bloom filter usage.                     
                                                                                
                                                                |
+
+## Best practices
+
+- Use partitioning options:
+
+    - It is recommended to use partitioning options when data needs to be 
imported based on an integer column and meets the following conditions:
+
+        - The data is evenly distributed across this column, with a wide range 
and no extreme concentration.
+        - This column is often used as a filter condition in queries or as a 
key for joins.
+
+    - Keep in mind that the PAX partition key is only effective during a 
single batch data import. It cannot be adjusted between multiple data writes. 
The partition key settings only apply to future inserts or updates, so after 
changing the partition key, newly imported data will follow the new partition 
key.
+
+- Use `minmax` statistics:
+
+    - For columns with a wide data range that are often used in query filters, 
setting `minmax` values for these columns can greatly speed up the query 
process.
+    - By using `minmax` statistics, if a column in a data file does not match 
the `minmax` values or null tests, the entire file can be skipped quickly, 
avoiding unnecessary data scans.
+    - Important note: The effectiveness of `minmax` depends on how data is 
inserted. If the data in a PAX table is inserted in batches (such as with 
`batch insert` or `copy`) and the data range within each batch is continuous, 
then `minmax` will be very effective. However, if the data is inserted 
randomly, the effectiveness of `minmax` filtering may be reduced.
+
+- If a PAX table has multiple columns used in query filters, you can use 
`with(cluster_columns='b,c,d', cluster_type='zorder')`. After sorting the data 
with `zorder` encoding, any of the `cluster_columns` can benefit from 
filtering. In comparison, for single-column-based cluster sorting, filtering is 
only effective on the sorted key, with minimal impact on other columns. 
Clustering sorting influences `minmax`-based filtering but does not affect 
`bloom filter`-based filtering.
+
+- If your query uses filters like `column1 in (12, 11, 13)` or `column1 = 
'X'`, you might consider using a `bloom filter 
(with(bloomfilter_columns='b,c'))`. In internal implementation, PAX calculates 
bloom filters for certain data. If the bloom filter can confirm that the data 
is not in the block, PAX can skip the current block. Note that the 
effectiveness of bloom filter depends on several factors:
+
+    - The amount of data within a `group`.
+    - The space used by the `bloom filter`.

Review Comment:
   Also, there are some dev docs on PAX in the main repo. Will we add them to 
the `Developer` module? If not, can add their links here for users and 
developers' references. FYI.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] doc: add PAX related docs [cloudberry-site]

Reply via email to