[ https://issues.apache.org/jira/browse/PIG-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677298#comment-13677298 ]
Cheolsoo Park commented on PIG-3346:
------------------------------------
[~aniket486], you're absolutely right. As the name of the property indicates,
it concerns the number of splits rather than the number of files.
Controlling the number of files per task is beyond the scope of this jira. To
do that, I think we might have to re-implement the current split combination
logic using CombinedInputFormat.
I think having this property for the short term is still useful. But please
feel free to disagree with me.
Thanks!
> New property that controls the number of combined splits
> --------------------------------------------------------
>
> Key: PIG-3346
> URL: https://issues.apache.org/jira/browse/PIG-3346
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3346.patch
>
>
> Currently, the size of combined splits can be configured by the
> {{pig.maxCombinedSplitSize}} property.
> Although this works fine most of the time, it can lead to an undesired
> situation where a single mapper ends up loading a lot of combined splits.
> This is particularly bad if Pig uploads them from S3.
> So it will be useful if the max number of combined splits can be configured
> via a property something like {{pig.maxCombinedSplitNum}}.
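To illustrate the situation described above, here is a minimal sketch of a greedy split combiner with a size limit and an optional count limit. It is not Pig's actual implementation: the function name, its parameters, and the reading of {{pig.maxCombinedSplitNum}} as "at most N original splits per combined split" are assumptions made only for this example.

```python
from typing import List, Optional


def combine_splits(
    split_sizes: List[int],
    max_combined_size: int,
    max_splits_per_combined: Optional[int] = None,
) -> List[List[int]]:
    """Greedily pack consecutive splits into combined splits.

    max_combined_size stands in for pig.maxCombinedSplitSize.
    max_splits_per_combined is a hypothetical stand-in for the proposed
    pig.maxCombinedSplitNum; None means no count limit.
    """
    combined: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in split_sizes:
        # Close the current group if adding this split would exceed the size limit.
        size_full = bool(current) and current_size + size > max_combined_size
        # Close the current group if it already holds the maximum number of splits.
        count_full = (
            max_splits_per_combined is not None
            and len(current) >= max_splits_per_combined
        )
        if size_full or count_full:
            combined.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        combined.append(current)
    return combined


# Ten small 1 MB splits with a 128 MB size limit.
small_splits = [1] * 10
size_only = combine_splits(small_splits, max_combined_size=128)
with_cap = combine_splits(small_splits, max_combined_size=128,
                          max_splits_per_combined=4)
print(len(size_only), [len(group) for group in size_only])
print(len(with_cap), [len(group) for group in with_cap])
```

With only the size limit, all ten small splits land in one group, so a single mapper reads all of them. With the count limit, the same input is spread over several groups.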