[
https://issues.apache.org/jira/browse/PIG-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213708#comment-15213708
]
Rohini Palaniswamy commented on PIG-4844:
-----------------------------------------
Changes done:
- pig.pigContext takes up a lot of space in the payload because it contains
the entire config. Only ship what is necessary (local mode and log4j
properties).
- Automatically increase AM memory if the number of vertices is > 30 or the
number of outputs per vertex is > 10.
- Force inputs to be fetched before outputs are started, so that we can choose
to allocate more space for buffers by setting
tez.task.scale.memory.input-output-concurrent=false, which is a new option in
Tez.
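
For illustration, the AM memory rule and the Tez setting from the list above could be sketched roughly as follows. This is not code from the patch: the class, method, and constant names are hypothetical, a plain java.util.Properties object stands in for the real job configuration, and only the thresholds (30 vertices, 10 outputs per vertex) and the option name come from the comment itself.

```java
import java.util.Properties;

// Illustrative sketch only. The names below are hypothetical and are not
// taken from the Pig patch; only the thresholds and the Tez option name
// come from the comment.
class AmMemoryHeuristic {
    // Thresholds as stated in the list of changes.
    static final int MAX_VERTICES = 30;
    static final int MAX_OUTPUTS_PER_VERTEX = 10;

    // Option name as given in the comment.
    static final String CONCURRENT_IO_KEY =
            "tez.task.scale.memory.input-output-concurrent";

    /** Returns true when either threshold is strictly exceeded. */
    static boolean shouldIncreaseAmMemory(int vertexCount, int maxOutputsPerVertex) {
        return vertexCount > MAX_VERTICES
                || maxOutputsPerVertex > MAX_OUTPUTS_PER_VERTEX;
    }

    /** Stores the option as "false" in a Properties object standing in for the job config. */
    static Properties disableConcurrentInputOutput(Properties conf) {
        conf.setProperty(CONCURRENT_IO_KEY, "false");
        return conf;
    }

    public static void main(String[] args) {
        // A plan with 12 vertices where one vertex has 14 outputs exceeds
        // the per-vertex output threshold.
        System.out.println(shouldIncreaseAmMemory(12, 14));

        Properties conf = disableConcurrentInputOutput(new Properties());
        System.out.println(conf.getProperty(CONCURRENT_IO_KEY));
    }
}
```

Both comparisons are strict, matching the "> 30" and "> 10" wording, so a plan with exactly 30 vertices and 10 outputs per vertex would not trigger the increase in this sketch. How much memory is added is not stated in the comment, so it is left out here.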
> Tez AM runs out of memory when vertex has high number of outputs
> ----------------------------------------------------------------
>
> Key: PIG-4844
> URL: https://issues.apache.org/jira/browse/PIG-4844
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Fix For: 0.16.0
>
>
> The AM runs out of memory when trying to respond to getTask() calls from
> containers for a vertex with a large number of outputs (usually the case with
> multi-query when you group by on multiple dimensions). The problem is the
> size of the payload config associated with PigProcessor, Input and Output.
> When there are more than 10 outputs, the size of the payload increases
> considerably, causing memory pressure.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)