[
https://issues.apache.org/jira/browse/PARQUET-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312708#comment-14312708
]
Daniel Weeks commented on PARQUET-177:
--------------------------------------
Based on review comments, the limit is based not on row group size but on an
estimated minimum column chunk size. This helps ensure that a "reasonable"
amount of data is written per column (the default is the page size).
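
For illustration only, a minimal sketch of the kind of check described above.
This is not the patch itself: the class name, the method name rescale, and its
parameters are all hypothetical, and it assumes the per-column chunk size is
estimated by dividing the scaled row group size by the column count.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the parquet-mr MemoryManager implementation.
class MinChunkSizeSketch {

    /**
     * Scales each writer's requested row group size down so that the total
     * fits in poolBytes, and fails if the estimated column chunk size
     * (scaled row group size / columnCount) drops below minChunkBytes.
     * A minChunkBytes of 0 disables the check.
     */
    static Map<String, Long> rescale(Map<String, Long> requestedRowGroupBytes,
                                     int columnCount,
                                     long poolBytes,
                                     long minChunkBytes) {
        long total = 0;
        for (long size : requestedRowGroupBytes.values()) {
            total += size;
        }
        // Only shrink when the writers together ask for more than the pool holds.
        double scale = total <= poolBytes ? 1.0 : (double) poolBytes / total;

        Map<String, Long> scaled = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : requestedRowGroupBytes.entrySet()) {
            long newSize = (long) Math.floor(e.getValue() * scale);
            long estimatedChunk = newSize / columnCount;
            if (scale < 1.0 && minChunkBytes > 0 && estimatedChunk < minChunkBytes) {
                throw new IllegalStateException(String.format(
                    "Estimated column chunk size of %d bytes is below the minimum of %d bytes",
                    estimatedChunk, minChunkBytes));
            }
            scaled.put(e.getKey(), newSize);
        }
        return scaled;
    }

    public static void main(String[] args) {
        Map<String, Long> requested = new LinkedHashMap<>();
        requested.put("writer-1", 128L * 1024 * 1024);
        requested.put("writer-2", 128L * 1024 * 1024);
        // Two writers share a 128 MB pool; each row group is halved to 64 MB.
        System.out.println(rescale(requested, 10, 128L * 1024 * 1024, 1024L * 1024));
    }
}
```

Basing the check on the per-column estimate rather than on the row group size
means a wide schema trips the limit sooner than a narrow one at the same row
group size.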
> MemoryManager ensure minimum Column Chunk size
> ----------------------------------------------
>
> Key: PARQUET-177
> URL: https://issues.apache.org/jira/browse/PARQUET-177
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.6.0rc2
> Reporter: Daniel Weeks
> Assignee: Daniel Weeks
> Priority: Minor
> Fix For: parquet-mr_1.6.0
>
>
> The memory manager currently has no limit on how small it will make row
> groups. This is problematic because jobs with a large number of writers
> can end up with tiny row groups that hurt performance.
> The following patch adds a configurable minimum size; the job is killed if
> row groups would fall below it. The default is currently no limit.