[
https://issues.apache.org/jira/browse/PIG-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091353#comment-15091353
]
Daniel Dai commented on PIG-4775:
---------------------------------
I mean the default 128MB. Anyway, this is not related to the patch.
> Better default values for shuffle bytes per reducer
> ---------------------------------------------------
>
> Key: PIG-4775
> URL: https://issues.apache.org/jira/browse/PIG-4775
> Project: Pig
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4775-1.patch, PIG-4775-2.patch
>
>
> Currently the code does not set
> TEZ_SHUFFLE_VERTEX_MANAGER_DESIRED_TASK_INPUT_SIZE if BYTES_PER_REDUCER_PARAM
> is unset or equal to DEFAULT_BYTES_PER_REDUCER (1G). The value then falls back
> to TEZ_SHUFFLE_VERTEX_MANAGER_DESIRED_TASK_INPUT_SIZE_DEFAULT =
> 1024*1024*100L (100MB), which is low and can produce more output files than
> usual. Removing that check and defaulting to 1G would hurt performance: in
> MapReduce the value is applied to map input size, but in Tez it is applied to
> map output size. This patch therefore sets 384MB as the default for group by,
> which usually reduces the size of the output data, and keeps 256MB for joins,
> which increase the size of the output data.
> Order by and skewed join are not touched, as DEFAULT_BYTES_PER_REDUCER of 1G
> is already honored there. Using 1G for them is similar to MapReduce, as map
> input and output size are the same in those cases.
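
The selection rule described in the issue can be sketched roughly as follows. This is an illustration only, not the code from the attached patches: the class ShuffleDefaults, the enum Op, and the method desiredTaskInputSize are hypothetical names, and 1G is assumed here to mean 10^9 bytes. Only the sizes (100MB, 256MB, 384MB, 1G) come from the description above.

```java
public class ShuffleDefaults {
    static final long MB = 1024L * 1024L;

    // Sizes quoted in the issue description.
    static final long TEZ_DEFAULT = 100L * MB;            // Tez fallback when nothing is set
    static final long GROUP_BY_DEFAULT = 384L * MB;       // group by usually shrinks its output
    static final long JOIN_DEFAULT = 256L * MB;           // joins usually grow their output
    // Assumption: "1G" is taken as 10^9 bytes in this sketch.
    static final long DEFAULT_BYTES_PER_REDUCER = 1000L * 1000L * 1000L;

    // Hypothetical operator kinds, for illustration only.
    enum Op { GROUP_BY, JOIN, ORDER_BY, SKEWED_JOIN }

    /**
     * Returns the desired per-task shuffle input size in bytes.
     * configuredBytesPerReducer is null when the user did not set a value.
     */
    static long desiredTaskInputSize(Op op, Long configuredBytesPerReducer) {
        // An explicit, non-default user setting is used as-is.
        if (configuredBytesPerReducer != null
                && configuredBytesPerReducer.longValue() != DEFAULT_BYTES_PER_REDUCER) {
            return configuredBytesPerReducer.longValue();
        }
        switch (op) {
            case GROUP_BY:
                return GROUP_BY_DEFAULT;
            case JOIN:
                return JOIN_DEFAULT;
            default:
                // Order by and skewed join keep the 1G default.
                return DEFAULT_BYTES_PER_REDUCER;
        }
    }

    public static void main(String[] args) {
        for (Op op : Op.values()) {
            System.out.println(op + " -> " + desiredTaskInputSize(op, null) + " bytes");
        }
    }
}
```

In this sketch the Tez fallback of 100MB is never returned, because every operator kind gets an explicit value; TEZ_DEFAULT is kept only to show the size the issue considers too low.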