[ https://issues.apache.org/jira/browse/PIG-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865666#comment-13865666 ]
Rohini Palaniswamy commented on PIG-3659:
-----------------------------------------
Current code just defaults to 1G for each vertex to get things to work.
We need to:
1) Classify whether a vertex is a map or a reduce vertex and set java.opts
(mapreduce.map.java.opts or mapreduce.reduce.java.opts), memory.mb
(mapreduce.map.memory.mb or mapreduce.reduce.memory.mb) and env
(mapreduce.map.env or mapreduce.reduce.env) on the vertex accordingly. A simple
approach would be to treat all root vertices as map vertices and all intermediate
or leaf vertices as reduce vertices.
2) Even a map vertex needs more memory when it has multiple outputs, because
combine and sort happen on each output. Similarly, a reduce vertex with multiple
inputs shuffles and sorts each input, so it needs more memory than a traditional
map or reduce task. That is, the sort buffer (io.sort.mb) and the buffer that
holds each record before it is serialized or deserialized take up memory for
every output or input. For example, with 3 inputs or outputs, three times the
buffer memory is allocated, which can lead to an OOM.
Increasing the memory of a vertex based on its number of inputs or outputs might
not fully solve the problem. We will have to talk to the Tez team to see how
this can be solved effectively.
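The two points above can be sketched roughly as follows. This is a hypothetical Python model, not Pig's actual Tez code: the Vertex class and the helpers classify, memory_keys, vertex_settings and sort_buffer_mb are illustrative names, "inputs"/"outputs" mean edges to other vertices, and the sort-buffer estimate simply multiplies io.sort.mb by the number of sorted edges, ignoring the per-record serialization buffers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical, simplified model of a Tez vertex; Pig's real plan classes differ.
@dataclass
class Vertex:
    name: str
    inputs: List[str] = field(default_factory=list)   # upstream vertices (edges in)
    outputs: List[str] = field(default_factory=list)  # downstream vertices (edges out)


def classify(vertex: Vertex) -> str:
    """Treat root vertices (no upstream edges) as map, everything else as reduce."""
    return "map" if not vertex.inputs else "reduce"


def memory_keys(kind: str) -> Dict[str, str]:
    """MapReduce settings to read for this vertex kind ("map" or "reduce")."""
    return {
        "java.opts": f"mapreduce.{kind}.java.opts",
        "memory.mb": f"mapreduce.{kind}.memory.mb",
        "env": f"mapreduce.{kind}.env",
    }


def vertex_settings(vertex: Vertex, conf: Dict[str, str]) -> Dict[str, str]:
    """Copy the map- or reduce-side values present in conf onto generic vertex keys."""
    keys = memory_keys(classify(vertex))
    return {generic: conf[mr_key] for generic, mr_key in keys.items() if mr_key in conf}


def sort_buffer_mb(vertex: Vertex, io_sort_mb: int) -> int:
    """Rough estimate (assumption): one io.sort.mb buffer per sorted edge.

    A map vertex sorts once per output; a reduce vertex once per input.
    """
    edges = vertex.outputs if classify(vertex) == "map" else vertex.inputs
    return io_sort_mb * len(edges)


if __name__ == "__main__":
    # Illustrative values only.
    conf = {
        "mapreduce.map.java.opts": "-Xmx512m",
        "mapreduce.map.memory.mb": "768",
        "mapreduce.reduce.java.opts": "-Xmx1024m",
        "mapreduce.reduce.memory.mb": "1536",
        "io.sort.mb": "200",
    }
    load = Vertex("load", outputs=["g1", "g2", "g3"])  # root vertex with 3 outputs
    print(classify(load), vertex_settings(load, conf))
    # map {'java.opts': '-Xmx512m', 'memory.mb': '768'}
    print(sort_buffer_mb(load, int(conf["io.sort.mb"])))  # 600
```

Here 600 MB of sort buffers would not fit in the 512 MB map heap, which is the three-outputs OOM case described above. Scaling memory.mb by the edge count is the obvious workaround, but as noted it may not be a complete fix.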
> Memory management for each vertex
> ---------------------------------
>
> Key: PIG-3659
> URL: https://issues.apache.org/jira/browse/PIG-3659
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Reporter: Rohini Palaniswamy
> Fix For: tez-branch
>
>
> We need to configure appropriate memory options for each vertex.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)