[
https://issues.apache.org/jira/browse/PIG-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674133#comment-13674133
]
Aniket Mokashi commented on PIG-3346:
-------------------------------------
I haven't taken a close look yet, but I think this patch will ensure
numPigSplits < pig.maxCombinedSplitNum. However, it doesn't ensure that the
number of files processed by one mapper < pig.maxCombinedSplitNum. (What if the
underlying InputFormat already combines several files into one input split?)
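
To make the concern concrete, here is a minimal Python sketch. It is not Pig
code: InputSplit, combine_splits, and files_per_mapper are hypothetical names,
the file names are made up, and the cap is treated as "at most N splits per
combined split" purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputSplit:
    """Hypothetical model of one split returned by the underlying InputFormat."""
    files: List[str]


def combine_splits(splits: List[InputSplit], max_combined_split_num: int) -> List[List[InputSplit]]:
    """Group splits so that no group holds more than max_combined_split_num splits.

    Assumption: this only mimics the kind of cap discussed here, not the patch itself.
    """
    if max_combined_split_num < 1:
        raise ValueError("max_combined_split_num must be >= 1")
    return [
        splits[i:i + max_combined_split_num]
        for i in range(0, len(splits), max_combined_split_num)
    ]


def files_per_mapper(group: List[InputSplit]) -> int:
    """Count the files one mapper would read for a combined group of splits."""
    return sum(len(split.files) for split in group)


# Four splits, each of which already bundles three files (made-up names).
splits = [InputSplit([f"part-{i}-{j}" for j in range(3)]) for i in range(4)]
groups = combine_splits(splits, max_combined_split_num=2)

for group in groups:
    print(len(group), "splits,", files_per_mapper(group), "files")
```

In this sketch every group respects the cap on the number of splits, yet each
mapper still reads more files than the cap, because the underlying splits
already bundle several files.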
> New property that controls the number of combined splits
> --------------------------------------------------------
>
> Key: PIG-3346
> URL: https://issues.apache.org/jira/browse/PIG-3346
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3346.patch
>
>
> Currently, the size of combined splits can be configured by the
> {{pig.maxCombinedSplitSize}} property.
> Although this works fine most of the time, it can lead to an undesired
> situation where a single mapper ends up loading a lot of combined splits.
> This is particularly bad if Pig uploads them from S3.
> So it would be useful if the max number of combined splits could be configured
> via a property such as {{pig.maxCombinedSplitNum}}.