[jira] [Created] (PIG-4853) Fetch inputs before starting outputs

Rohini Palaniswamy (JIRA) Mon, 28 Mar 2016 11:47:38 -0700

Rohini Palaniswamy created PIG-4853:
---------------------------------------


             Summary: Fetch inputs before starting outputs
                 Key: PIG-4853
                 URL: https://issues.apache.org/jira/browse/PIG-4853
             Project: Pig
          Issue Type: Improvement
            Reporter: Rohini Palaniswamy
            Assignee: Rohini Palaniswamy
             Fix For: 0.16.0


    Force-fetch inputs before starting outputs so that we can choose to 
allocate more space for buffers by setting 
tez.task.scale.memory.input-output-concurrent=false, which is a new option in 
Tez. With the default value of true, WeightedScalingMemoryDistributor in Tez, 
for a TezConfiguration.TEZ_TASK_SCALE_MEMORY_RESERVE_FRACTION of 0.5 and 1G 
memory, will split the 512MB between inputs and outputs. If set to false, it 
will allocate 512MB to inputs and 512MB to outputs, since the two are then 
assumed not to hold their buffers at the same time. For example, for two join 
inputs and one group by output:

tez.task.scale.memory.input-output-concurrent=true
{code}
2016-03-28 01:15:58,842 [INFO] [TezChild] |resources.MemoryDistributor|: 
Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:83684722],
[scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239],
[scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239]
{code}

tez.task.scale.memory.input-output-concurrent=false
{code}
2016-03-28 01:25:36,665 [INFO] [TezChild] |resources.MemoryDistributor|: 
Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:268435456],
[scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600],
[scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600]
{code}
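
The arithmetic behind the two logs can be approximated with plain 
proportional scaling. This is only a sketch, not the actual 
WeightedScalingMemoryDistributor code: it ignores per-component weights, the 
class and method names are made up for illustration, and the pool size of 
470661200 bytes is inferred from the logs (it is the sum of the three 
allocations in the first log) rather than taken from any configuration.

```java
import java.util.Arrays;

public class MemorySplitSketch {

    /**
     * Scales the requests down proportionally so that their sum fits into
     * the available pool. Requests that already fit are granted in full.
     * Simplified stand-in: the real distributor also applies weights.
     */
    public static long[] scale(long available, long[] requests) {
        long total = Arrays.stream(requests).sum();
        if (total <= available) {
            return requests.clone();
        }
        long[] granted = new long[requests.length];
        for (int i = 0; i < requests.length; i++) {
            // Integer division truncates; the product stays well inside
            // the range of long for the sizes used in this example.
            granted[i] = requests[i] * available / total;
        }
        return granted;
    }

    public static void main(String[] args) {
        long available = 470_661_200L;                 // assumption: inferred from the logs
        long[] inputs = {620_652_160L, 620_652_160L};  // requested by scope-29 and scope-30
        long[] outputs = {268_435_456L};               // requested by scope-32

        // concurrent=true: inputs and outputs compete for one pool.
        long[] all = {inputs[0], inputs[1], outputs[0]};
        System.out.println("concurrent=true:  " + Arrays.toString(scale(available, all)));

        // concurrent=false: inputs and outputs are each scaled against the full pool.
        System.out.println("concurrent=false: inputs=" + Arrays.toString(scale(available, inputs))
                + " outputs=" + Arrays.toString(scale(available, outputs)));
    }
}
```

Under these assumptions the shared-pool case lands close to the logged 
allocations, while the separate-pool case gives each input half of the pool 
and grants the output its full request, which is the pattern in the second 
log.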

To ensure we don't hit OOM when the option is set to false, we need to finish 
fetching the inputs by calling reader.next() before calling output.start(). 
That will make sure the input buffers are released before output buffers are 
allocated.
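
The intended ordering can be sketched as below. The Reader and Output 
interfaces here are hypothetical stand-ins that model only the next() and 
start() calls named above; they are not the Tez classes, and the sketch 
leaves out how the record consumed by that first next() call is later handed 
to the processing pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FetchBeforeStartSketch {

    /** Hypothetical stand-in for an input reader; only next() is modelled. */
    public interface Reader {
        boolean next();
    }

    /** Hypothetical stand-in for an output; only start() is modelled. */
    public interface Output {
        void start();
    }

    /** Drains the fetch phase of every input before any output is started. */
    public static void initialize(List<Reader> readers, List<Output> outputs) {
        for (Reader reader : readers) {
            reader.next();   // assumed to return only once the input is fetched
        }
        for (Output output : outputs) {
            output.start();  // output buffers are allocated from here on
        }
    }

    /** Records the call order for two inputs and one output. */
    public static List<String> demo() {
        List<String> events = new ArrayList<>();
        List<Reader> readers = Arrays.asList(
                () -> { events.add("next scope-29"); return true; },
                () -> { events.add("next scope-30"); return true; });
        List<Output> outputs = Arrays.asList(
                () -> { events.add("start scope-32"); });
        initialize(readers, outputs);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the ordering is that no output.start() call is reached until 
every input has returned from its first next().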



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
