[ 
https://issues.apache.org/jira/browse/ORC-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948063#comment-17948063
 ] 

hezuojiao commented on ORC-1264:
--------------------------------

Hi [~luffyZ]

!image-2025-04-29-16-14-57-406.png|width=550,height=479!

I saw in the ORC format description that the stream corresponding to the 
position is {*}fixed{*}. I understand that the position returned by this 
interface can be used to find the exact starting position of the data stream, 
but doing so is currently difficult and error-prone.

> [C++] Add a writer option to align compression block with row group boundary
> ----------------------------------------------------------------------------
>
>                 Key: ORC-1264
>                 URL: https://issues.apache.org/jira/browse/ORC-1264
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Gang Wu
>            Assignee: Hao Zou
>            Priority: Major
>             Fix For: 2.1.0
>
>
> To reduce unnecessary I/O and decompression when predicate pushdown (PPD) is 
> in effect, we can force compression blocks to align with row group 
> boundaries. This avoids reading and decompressing filtered-out row groups 
> that precede a surviving row group within the same compression block. The 
> implementation does not break the format spec and should be transparent to 
> any downstream implementation. The caveat is a potentially larger file size, 
> depending on the data distribution and the compression algorithm used. 
> Therefore we should make it optional and let the user enable it.
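
The trade-off in the description above can be illustrated with a toy model. This is only a sketch and not ORC code: the names blockStarts and wastedBytes are hypothetical, the stream is modeled purely by uncompressed byte offsets, and chunk headers and the real compressed sizes are ignored. It shows that closing a block at every row group boundary removes the bytes a reader must decompress and discard before a surviving row group, at the cost of more (and smaller) blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model, not ORC code: a stream is cut into compression blocks that each
// hold at most blockSize uncompressed bytes. rowGroupStarts holds the
// uncompressed offset at which each row group begins, in ascending order.

// Returns the uncompressed offsets at which compression blocks begin.
// When alignToRowGroups is true, the current block is also closed at every
// row group boundary, so each row group begins a fresh block.
std::vector<uint64_t> blockStarts(uint64_t streamLength, uint64_t blockSize,
                                  const std::vector<uint64_t>& rowGroupStarts,
                                  bool alignToRowGroups) {
  assert(blockSize > 0);
  std::vector<uint64_t> starts;
  std::size_t nextGroup = 0;
  uint64_t pos = 0;
  while (pos < streamLength) {
    starts.push_back(pos);
    uint64_t end = std::min(pos + blockSize, streamLength);
    if (alignToRowGroups) {
      // Skip row group boundaries at or before the current block start.
      while (nextGroup < rowGroupStarts.size() &&
             rowGroupStarts[nextGroup] <= pos) {
        ++nextGroup;
      }
      // Cut the block short if a row group begins inside it.
      if (nextGroup < rowGroupStarts.size() &&
          rowGroupStarts[nextGroup] < end) {
        end = rowGroupStarts[nextGroup];
      }
    }
    pos = end;
  }
  return starts;
}

// Uncompressed bytes that must be decompressed and thrown away before the
// first byte of the row group starting at rowGroupStart becomes available.
uint64_t wastedBytes(const std::vector<uint64_t>& starts,
                     uint64_t rowGroupStart) {
  // Find the last block that starts at or before the row group.
  auto it = std::upper_bound(starts.begin(), starts.end(), rowGroupStart);
  assert(it != starts.begin());
  return rowGroupStart - *(it - 1);
}
```

For example, with a 1000-byte stream, 256-byte blocks, and row groups starting at offsets 0, 300, 600 and 900, the unaligned layout has 4 blocks and a reader seeking to the row group at 600 discards 88 bytes. The aligned layout discards nothing but needs 7 blocks, which is where the file-size caveat comes from.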



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
