Rohini Palaniswamy created PIG-4853:
---------------------------------------
Summary: Fetch inputs before starting outputs
Key: PIG-4853
URL: https://issues.apache.org/jira/browse/PIG-4853
Project: Pig
Issue Type: Improvement
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
Fix For: 0.16.0
Force inputs to be fetched before outputs are started, so that we can choose
to allocate more space for buffers by setting
tez.task.scale.memory.input-output-concurrent=false, which is a new option in
Tez. With the default value of true, WeightedScalingMemoryDistributor in Tez,
for a TezConfiguration.TEZ_TASK_SCALE_MEMORY_RESERVE_FRACTION of 0.5 and 1G of
memory, will split the 512MB between inputs and outputs. If set to false, it
will allocate the full 512MB to inputs and the full 512MB to outputs, since the
two are not held at the same time. For example, for two join inputs and one
group by output:
tez.task.scale.memory.input-output-concurrent=true
{code}
2016-03-28 01:15:58,842 [INFO] [TezChild] |resources.MemoryDistributor|:
Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:83684722],
[scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239],
[scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239]
{code}
tez.task.scale.memory.input-output-concurrent=false
{code}
2016-03-28 01:25:36,665 [INFO] [TezChild] |resources.MemoryDistributor|:
Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:268435456],
[scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600],
[scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600]
{code}
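The allocations in the two logs above are consistent with simple proportional scaling, and a rough model of that arithmetic may help when reading them. The sketch below is not Tez code: the class and method names are hypothetical, equal weights are assumed for all requests, and the available pool of 470661200 bytes is simply the sum of the logged allocations rather than a value taken from any configuration.

```java
public class MemoryScalingSketch {

    // Hypothetical helper: grant each request in full if everything fits,
    // otherwise scale every request by available / totalRequested.
    static long[] scale(long available, long[] requests) {
        long total = 0;
        for (long r : requests) {
            total += r;
        }
        long[] granted = new long[requests.length];
        for (int i = 0; i < requests.length; i++) {
            granted[i] = (total <= available)
                    ? requests[i]
                    : Math.round((double) requests[i] * available / total);
        }
        return granted;
    }

    public static void main(String[] args) {
        long available = 470661200L;   // assumption: sum of the logged allocations
        long outputReq = 268435456L;   // OrderedPartitionedKVOutput request from the log
        long inputReq = 620652160L;    // OrderedGroupedKVInput request from the log

        // concurrent=true: inputs and outputs compete for the same pool.
        long[] shared = scale(available, new long[] {outputReq, inputReq, inputReq});
        System.out.println("concurrent=true  output=" + shared[0]
                + " input=" + shared[1] + " input=" + shared[2]);

        // concurrent=false: inputs and outputs are each scaled against the whole pool.
        long[] inputs = scale(available, new long[] {inputReq, inputReq});
        long[] outputs = scale(available, new long[] {outputReq});
        System.out.println("concurrent=false output=" + outputs[0]
                + " input=" + inputs[0] + " input=" + inputs[1]);
    }
}
```

Under these assumptions the model lands on or within a byte of the logged figures. With concurrent=false the output request fits in the pool and is granted in full, while each input gets half of the pool instead of roughly 41% of it.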
To ensure we don't hit OOM, we need to finish fetching the inputs by calling
reader.next() before calling output.start(). That will make sure the input
buffers are released before output buffers are allocated.
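
The intended ordering can be sketched roughly as follows. This is only an illustration: SketchInput and SketchOutput are hypothetical stand-ins, not the Tez or Pig classes, and the assumption that the first next() call blocks until the fetch has completed is modelled with a simple flag.

```java
import java.util.ArrayList;
import java.util.List;

public class FetchBeforeStartSketch {

    // Hypothetical input: the first next() call stands in for a blocking fetch.
    static final class SketchInput {
        private final String name;
        private final List<String> events;
        private boolean fetched;

        SketchInput(String name, List<String> events) {
            this.name = name;
            this.events = events;
        }

        boolean next() {
            if (!fetched) {
                fetched = true;
                events.add("fetched " + name);
            }
            return true;
        }
    }

    // Hypothetical output: start() stands in for allocating the output buffer.
    static final class SketchOutput {
        private final String name;
        private final List<String> events;

        SketchOutput(String name, List<String> events) {
            this.name = name;
            this.events = events;
        }

        void start() {
            events.add("started " + name);
        }
    }

    // Drain the fetch phase of every input before starting any output.
    static List<String> initialize(List<String> inputNames, List<String> outputNames) {
        List<String> events = new ArrayList<>();
        for (String name : inputNames) {
            new SketchInput(name, events).next();
        }
        for (String name : outputNames) {
            new SketchOutput(name, events).start();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(initialize(List.of("scope-29", "scope-30"), List.of("scope-32")));
    }
}
```

A real implementation would also have to keep the record consumed by that first next() call so it is still processed later; the sketch leaves that out.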
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)