[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740052#comment-13740052
 ] 

Owen O'Malley commented on HIVE-5091:
-------------------------------------

This patch:
* Adds a new table property orc.block.padding, which defaults to true.
* Pads with zeros up to the start of the next block whenever a stripe that is 
smaller than a block would otherwise straddle a block boundary.
* Sets the max block size to 1.5GB, since 2GB - 1 created issues with block 
sizes needing to be divisible by the checksum length (512).
* Cleans up the interface to OrcFile.createWriter so that the user can set 
parameters by name.
* Cleans up the ability to write the 0.11 version of ORC files that was added 
in HIVE-4123. Ensures that the direct string encoding isn't used for 0.11 ORC 
files.
* Updates most of the tests to use the new createWriter API.
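
To make the padding rule and the block size constraint above concrete, here is 
a minimal sketch. It is not the code from the patch: the class name 
StripePaddingSketch and the methods paddingBefore and isValidBlockSize are 
hypothetical, and it assumes the writer knows the stripe size before deciding 
whether to pad.

```java
/** Hypothetical illustration of the padding rule; not the code from the patch. */
final class StripePaddingSketch {
    /** Checksum length mentioned in the comment (512 bytes). */
    static final long BYTES_PER_CHECKSUM = 512L;

    /**
     * Returns how many zero bytes to write at `offset` so that a stripe of
     * `stripeSize` bytes does not straddle a block boundary.
     */
    static long paddingBefore(long offset, long stripeSize, long blockSize) {
        // Bytes left in the block that contains `offset`.
        long remaining = blockSize - (offset % blockSize);
        // Pad only when the stripe is smaller than a block but does not fit
        // in the space that is left; larger stripes must straddle anyway.
        if (stripeSize < blockSize && stripeSize > remaining) {
            return remaining;
        }
        return 0L;
    }

    /** A block size is usable only if it is a multiple of the checksum length. */
    static boolean isValidBlockSize(long blockSize) {
        return blockSize > 0 && blockSize % BYTES_PER_CHECKSUM == 0;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 256 * mb;
        long stripe = 200 * mb;

        // First stripe starts at offset 0 and fits in the block: no padding.
        System.out.println(paddingBefore(0L, stripe, block));
        // Second stripe would start at 200MB with only 56MB left: pad 56MB.
        System.out.println(paddingBefore(stripe, stripe, block));

        // 2GB - 1 is not a multiple of 512, but 1.5GB is.
        System.out.println(isValidBlockSize(2L * 1024 * mb - 1)); // false
        System.out.println(isValidBlockSize(1536 * mb));          // true
    }
}
```

With a 256MB block and 200MB stripes, the second stripe is preceded by 56MB of 
zeros, so the sketch trades some wasted space for stripes that each sit in a 
single block.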

                
> ORC files should have an option to pad stripes to the HDFS block boundaries
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-5091
>                 URL: https://issues.apache.org/jira/browse/HIVE-5091
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-5091.D12249.1.patch
>
>
> Because ORC stripes are large, a stripe that straddles an HDFS block 
> boundary gets suboptimal read locality. It would be good to add padding to 
> ensure that stripes don't straddle HDFS blocks.
