[ https://issues.apache.org/jira/browse/ORC-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948028#comment-17948028 ]
hezuojiao commented on ORC-1264:
--------------------------------

[~wgtmac] Hi, I noticed that the MR added an interface, `Reader::getRowGroupIndex`, to get the position information of row groups. How can I use this interface to obtain the specific position of each data stream? I want to use it for row-group-level `perbuffer` in an execution engine, but I found that it must be used in conjunction with encoded streams. Thanks.

> [C++] Add a writer option to align compression block with row group boundary
> ----------------------------------------------------------------------------
>
>                 Key: ORC-1264
>                 URL: https://issues.apache.org/jira/browse/ORC-1264
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Gang Wu
>            Assignee: Hao Zou
>            Priority: Major
>             Fix For: 2.1.0
>
>
> To reduce unnecessary I/O and decompression when PPD is in effect, we can
> enforce that each compression block is aligned with the row group boundary.
> This avoids unnecessary I/O and decompression of the filtered row groups
> that precede a surviving row group within the same compression block. This
> implementation does not break the format specs and should be transparent to
> any downstream implementation. The caveat is a potentially larger file size,
> depending on the data distribution and the compression algorithm applied.
> Therefore we should make it optional and enable it per the user's choice.
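
To make the benefit described in the issue concrete, here is a minimal toy model in C++. It does not use the ORC library or any of its APIs; `wastedBytes` and its parameters are hypothetical names introduced only for illustration. It assumes a stream is a sequence of compression blocks identified by their uncompressed start offsets, and that reading a row group requires decompressing from the start of the block that contains the row group's first byte.

```cpp
#include <cstddef>
#include <vector>

// Toy model, not ORC code: returns how many uncompressed bytes must be
// decompressed and thrown away before reaching a row group that begins at
// rowGroupStart. blockStarts holds the uncompressed start offset of every
// compression block in the stream, in ascending order.
std::size_t wastedBytes(const std::vector<std::size_t>& blockStarts,
                        std::size_t rowGroupStart) {
  std::size_t containing = 0;  // start of the block holding rowGroupStart
  for (std::size_t start : blockStarts) {
    if (start <= rowGroupStart) {
      containing = start;
    } else {
      break;  // blocks are sorted, so later blocks start past the row group
    }
  }
  return rowGroupStart - containing;
}
```

In this model, with blocks starting at 0, 1000 and 2000 and a surviving row group starting at 1500, `wastedBytes` returns 500: those bytes belong to filtered row groups but still have to be decompressed. If a block boundary is forced at 1500, the result is 0, at the cost of more (and smaller) compression blocks, which is where the file size caveat comes from.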