[ https://issues.apache.org/jira/browse/ORC-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930595#comment-17930595 ]
Taiyang Li commented on ORC-1264:
---------------------------------

I wonder how a reader can tell, while reading an ORC file, whether its row group boundaries are aligned with compression blocks?

> [C++] Add a writer option to align compression block with row group boundary
> ----------------------------------------------------------------------------
>
>                 Key: ORC-1264
>                 URL: https://issues.apache.org/jira/browse/ORC-1264
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Gang Wu
>            Assignee: Hao Zou
>            Priority: Major
>             Fix For: 2.1.0
>
>
> To reduce unnecessary I/O and decompression when predicate pushdown (PPD) is
> in effect, we can enforce that each compression block is aligned with a row
> group boundary. Without alignment, a surviving row group may share a
> compression block with filtered row groups that precede it, so the reader
> must read and decompress their data before reaching the rows it needs.
> This implementation does not break the format spec and should be transparent
> to any downstream implementation. The caveat is a possibly larger file size,
> depending on the data distribution and the compression algorithm applied.
> Therefore we should make it optional and let the user enable it.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
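
The saving described in the issue can be illustrated with a small model. This is only a sketch under simplifying assumptions, not ORC library code: the function name bytesToDecompress and its parameters are hypothetical, a stream is treated as a flat run of uncompressed bytes cut into fixed-size compression blocks, and a reader is assumed to decompress every block that overlaps the row group it wants.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy model (hypothetical helper, not part of the ORC C++ API).
// rowGroupBytes[i] is the uncompressed size of row group i in one stream.
// Returns how many uncompressed bytes a reader has to decompress in order
// to read row group `target`, assuming whole blocks are decompressed.
std::size_t bytesToDecompress(const std::vector<std::size_t>& rowGroupBytes,
                              std::size_t blockSize,
                              bool aligned,
                              std::size_t target) {
  const std::size_t wanted = rowGroupBytes.at(target);
  if (wanted == 0 || blockSize == 0) {
    return 0;
  }
  if (aligned) {
    // A new block starts at every row group boundary, so the blocks that
    // hold this row group contain nothing but its own data.
    return wanted;
  }
  // Unaligned: blocks are cut every blockSize bytes regardless of where
  // row groups begin, so neighbouring row groups can share a block.
  const auto targetIt =
      rowGroupBytes.begin() + static_cast<std::ptrdiff_t>(target);
  const std::size_t start =
      std::accumulate(rowGroupBytes.begin(), targetIt, std::size_t{0});
  const std::size_t total =
      std::accumulate(rowGroupBytes.begin(), rowGroupBytes.end(),
                      std::size_t{0});
  const std::size_t firstBlock = start / blockSize;
  const std::size_t lastBlock = (start + wanted - 1) / blockSize;
  // The last block of the stream may be shorter than blockSize.
  const std::size_t spanEnd = std::min((lastBlock + 1) * blockSize, total);
  return spanEnd - firstBlock * blockSize;
}
```

In this model, with four row groups of 1000 bytes each and a 4096-byte block, reading only the third row group costs 4000 decompressed bytes without alignment and 1000 with it. The model ignores the file-size caveat from the issue: flushing a block at every row group boundary produces more and smaller blocks, which may compress less well.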