[ https://issues.apache.org/jira/browse/PARQUET-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Blue resolved PARQUET-306.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.8.0
Merged #211. Thanks for reviewing, Alex!
> Improve alignment between row groups and HDFS blocks
> ----------------------------------------------------
>
> Key: PARQUET-306
> URL: https://issues.apache.org/jira/browse/PARQUET-306
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Fix For: 1.8.0
>
>
> To avoid remote reads, row groups should not span HDFS blocks. There are 3
> things we can use to prevent spanning:
> 1. Set the next row group's size to the remaining bytes in the current HDFS
> block
> 2. Use HDFS-3689, variable-length HDFS blocks, when available
> 3. Pad after row groups close to the block boundary to start the next row
> group at the start of the next block
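
For illustration, options 1 and 3 above come down to simple offset arithmetic.
The following is a minimal sketch only, not the code merged in #211: the class
name RowGroupAlignment, its method names, and the maxPadding threshold are
hypothetical and chosen for this example. It assumes a fixed HDFS block size
and that the writer knows its current byte offset in the file. Option 2
(variable-length blocks via HDFS-3689) depends on HDFS support and is not
covered here.

```java
// Illustrative sketch; names are hypothetical and not the parquet-mr API.
final class RowGroupAlignment {
    private RowGroupAlignment() {}

    /** Bytes left before the next HDFS block boundary at file offset pos. */
    static long remainingInBlock(long pos, long blockSize) {
        return blockSize - (pos % blockSize);
    }

    /** Option 1: cap the next row group at the bytes left in the current block. */
    static long nextRowGroupSize(long pos, long blockSize, long targetRowGroupSize) {
        return Math.min(targetRowGroupSize, remainingInBlock(pos, blockSize));
    }

    /**
     * Option 3: number of padding bytes to write so the next row group starts
     * at the next block boundary. Returns 0 when the offset is already at a
     * block start, or when the gap is larger than maxPadding (an assumed
     * threshold that limits wasted space).
     */
    static long paddingNeeded(long pos, long blockSize, long maxPadding) {
        long remaining = remainingInBlock(pos, blockSize);
        boolean atBlockStart = remaining == blockSize;
        return (!atBlockStart && remaining <= maxPadding) ? remaining : 0L;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long blockSize = 128 * mib; // assumed HDFS block size
        long pos = 120 * mib;       // assumed current write offset

        // Size to target for the next row group (option 1).
        System.out.println(nextRowGroupSize(pos, blockSize, 128 * mib));
        // Padding to write before the next row group (option 3).
        System.out.println(paddingNeeded(pos, blockSize, 8 * mib));
    }
}
```

The two options complement each other in this sketch: shrinking the next row
group fills most of the current block, and padding covers the small gap left
when a row group ends just short of the boundary.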
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)