gongxun0928 opened a new pull request, #1334: URL: https://github.com/apache/cloudberry/pull/1334
Optimize the storage of variable-length column offsets by switching from Zstd compression to delta encoding. Delta encoding compresses incremental integer sequences such as offsets better than Zstd does. Compared with the Zstd-based layout, it reduces on-disk size by roughly 35% on the largest fact tables and by more than half on several smaller tables, while maintaining performance.

The following table compares file sizes for the different encoding methods on a 20 GB TPC-DS data set. The last column is the PAX (Delta) size as a percentage of the AOCS size:

```
Name                    PAX(ZSTD)    AOCS_SIZE    PAX(Delta)    PAX(Delta) / AOCS * 100%
call_center             12 kB        231 kB       10185 bytes   4.31%
catalog_page            499 kB       653 kB       393 kB        60.18%
catalog_returns         240 MB       171 MB       178 MB        104.09%
catalog_sales           3033 MB      1837 MB      1977 MB       107.63%
customer                16 MB        12 MB        12 MB         100.00%
customer_address        7008 kB      3161 kB      3115 kB       98.54%
customer_demographics   28 MB        8164 kB      9292 kB       113.82%
date_dim                3193 kB      1406 kB      1249 kB       88.85%
household_demographics  42 kB        248 kB       28 kB         11.29%
income_band             1239 bytes   225 kB       1239 bytes    0.54%
inventory               36 MB        71 MB        36 MB         50.70%
item                    3084 kB      2479 kB      2227 kB       89.84%
promotion               27 kB        239 kB       18 kB         7.53%
reason                  2730 bytes   226 kB       2280 bytes    0.99%
ship_mode               3894 bytes   227 kB       3315 bytes    1.43%
store                   23 kB        239 kB       18 kB         7.53%
store_returns           400 MB       265 MB       277 MB        104.53%
store_sales             4173 MB      2384 MB      2554 MB       107.12%
time_dim                1702 kB      819 kB       627 kB        76.56%
warehouse               5394 bytes   227 kB       4698 bytes    2.02%
web_page                21 kB        236 kB       14 kB         5.93%
web_returns             116 MB       83 MB        85 MB         102.41%
web_sales               1513 MB      908 MB       982 MB        108.15%
```

<!-- Thank you for your contribution to Apache Cloudberry (Incubating)! -->

Fixes #ISSUE_Number

### What does this PR do?
<!-- Brief overview of the changes, including any major features or fixes -->

### Type of Change
- [ ] Bug fix (non-breaking change)
- [ ] New feature (non-breaking change)
- [ ] Breaking change (fix or feature with breaking changes)
- [ ] Documentation update

### Breaking Changes
<!-- Remove if not applicable. If yes, explain impact and migration path -->

### Test Plan
<!-- How did you test these changes?
-->
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Passed `make installcheck`
- [ ] Passed `make -C src/test installcheck-cbdb-parallel`

### Impact
<!-- Remove sections that don't apply -->

**Performance:**
<!-- Any performance implications? -->

**User-facing changes:**
<!-- Any changes visible to users? -->

**Dependencies:**
<!-- New dependencies or version changes? -->

### Checklist
- [ ] Followed [contribution guide](https://cloudberry.apache.org/contribute/code)
- [ ] Added/updated documentation
- [ ] Reviewed code for security implications
- [ ] Requested review from [cloudberry committers](https://github.com/orgs/apache/teams/cloudberry-committers)

### Additional Context
<!-- Any other information that would help reviewers? Remove if none -->

### CI Skip Instructions
<!--
To skip CI builds, add the appropriate CI skip identifier to your PR title.
The identifier must:
- Be in square brackets []
- Include the word "ci" and either "skip" or "no"
- Only use for documentation-only changes or when absolutely necessary
-->

---

<!--
Join our community:
- Mailing list: [d...@cloudberry.apache.org](https://lists.apache.org/list.html?d...@cloudberry.apache.org) (subscribe: dev-subscr...@cloudberry.apache.org)
- Discussions: https://github.com/apache/cloudberry/discussions
-->

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cloudberry.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
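
The description above says that delta encoding suits the offsets of a variable-length column because they form an incremental integer sequence. The idea can be illustrated with a minimal Python sketch. This is not the PAX implementation, which this message does not show: the function names `build_offsets`, `delta_encode`, and `delta_decode` and the sample values are hypothetical, and the sketch assumes offsets are stored as one start position per value plus a final end position.

```python
from itertools import accumulate


def build_offsets(values):
    """Return the start offset of each value, plus one final end offset."""
    return [0] + list(accumulate(len(v) for v in values))


def delta_encode(offsets):
    """Keep the first offset, then store each difference from its predecessor."""
    if not offsets:
        return []
    return [offsets[0]] + [b - a for a, b in zip(offsets, offsets[1:])]


def delta_decode(deltas):
    """Rebuild the original offsets with a running sum over the deltas."""
    return list(accumulate(deltas))


# Hypothetical variable-length column values (assumption, for illustration only).
values = [b"alpha", b"be", b"gamma", b"d"]
offsets = build_offsets(values)    # [0, 5, 7, 12, 13]
deltas = delta_encode(offsets)     # [0, 5, 2, 5, 1]
restored = delta_decode(deltas)    # identical to offsets
```

The offsets grow with the total data size, while each delta is bounded by the length of a single value, so the deltas usually fit in far fewer bits. A real encoder would typically bit-pack the deltas afterwards; that step is omitted here.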