[ https://issues.apache.org/jira/browse/ORC-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948063#comment-17948063 ]
hezuojiao commented on ORC-1264:
--------------------------------

Hi [~luffyZ]

!image-2025-04-29-16-14-57-406.png|width=550,height=479!

I saw in the ORC format description that the stream each position corresponds to is {*}fixed{*}. As I understand it, the position returned by this interface can be used to find the exact starting position of the data stream, but doing so is currently difficult and error-prone.

> [C++] Add a writer option to align compression block with row group boundary
> ----------------------------------------------------------------------------
>
>                 Key: ORC-1264
>                 URL: https://issues.apache.org/jira/browse/ORC-1264
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Gang Wu
>            Assignee: Hao Zou
>            Priority: Major
>             Fix For: 2.1.0
>
>
> To reduce unnecessary I/O and decompression when predicate pushdown (PPD) is
> in effect, we can force compression blocks to be aligned with row group
> boundaries. This avoids reading and decompressing the filtered-out row groups
> that precede a surviving row group within the same compression block. This
> implementation does not break the format specs and should be transparent to
> any downstream implementation. The caveat is a potentially larger file size,
> depending on the data distribution and the compression algorithm applied.
> Therefore we should make it optional and let users enable it by choice.
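
To make the trade-off in the description concrete, here is a minimal C++ sketch under a simplified model. It is not the ORC writer's implementation: `RowGroupStart`, `unalignedStarts`, and `alignedStarts` are hypothetical names, and the model identifies a block by its index, whereas a real row index records a byte offset into the compressed stream.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified description of where a row group begins in a stream.
struct RowGroupStart {
  uint64_t blockIndex;   // which compression block the row group begins in
  uint64_t bytesToSkip;  // uncompressed bytes to decompress and discard first
};

// Unaligned layout: row groups are packed back to back and the stream is cut
// into blocks of blockSize uncompressed bytes, so a row group may begin in
// the middle of a block.
std::vector<RowGroupStart> unalignedStarts(
    const std::vector<uint64_t>& rowGroupBytes, uint64_t blockSize) {
  assert(blockSize > 0);
  std::vector<RowGroupStart> starts;
  uint64_t offset = 0;  // running uncompressed offset in the stream
  for (uint64_t size : rowGroupBytes) {
    starts.push_back({offset / blockSize, offset % blockSize});
    offset += size;
  }
  return starts;
}

// Aligned layout: the current block is closed at every row group boundary,
// so each row group begins at the start of a fresh block.
std::vector<RowGroupStart> alignedStarts(
    const std::vector<uint64_t>& rowGroupBytes, uint64_t blockSize) {
  assert(blockSize > 0);
  std::vector<RowGroupStart> starts;
  uint64_t block = 0;
  for (uint64_t size : rowGroupBytes) {
    starts.push_back({block, 0});
    block += (size + blockSize - 1) / blockSize;  // blocks used by this row group
  }
  return starts;
}
```

With three 100-byte row groups and a 256-byte block size, the unaligned layout places all three row groups in block 0 with 0, 100, and 200 bytes to skip, so reading only the third row group still decompresses the first two. The aligned layout places them in blocks 0, 1, and 2 with nothing to skip, at the cost of more and smaller blocks, which is where the file size caveat comes from.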