[
https://issues.apache.org/jira/browse/PIG-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy updated PIG-4853:
------------------------------------
Status: Patch Available (was: Open)
> Fetch inputs before starting outputs
> ------------------------------------
>
> Key: PIG-4853
> URL: https://issues.apache.org/jira/browse/PIG-4853
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4853-1.patch
>
>
> Force fetch inputs before starting outputs so that we can choose to
> allocate more space for buffers by setting
> tez.task.scale.memory.input-output-concurrent=false which is a new option in
> Tez. With the default value of true, WeightedScalingMemoryDistributor in Tez
> for a TezConfiguration.TEZ_TASK_SCALE_MEMORY_RESERVE_FRACTION of 0.5 and 1G
> memory, will split the 512MB between inputs and outputs. If set to false, it
> will allocate 512MB to inputs and 512MB to outputs. For eg: For two join
> inputs and one group by output
> tez.task.scale.memory.input-output-concurrent=true
> {code}
> 2016-03-28 01:15:58,842 [INFO] [TezChild] |resources.MemoryDistributor|:
> Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:83684722],
>
> [scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239],
>
> [scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239]
> {code}
> tez.task.scale.memory.input-output-concurrent=false
> {code}
> 2016-03-28 01:25:36,665 [INFO] [TezChild] |resources.MemoryDistributor|:
> Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:268435456],
>
> [scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600],
>
> [scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600]
> {code}
> To ensure we don't hit OOM, we need to finish fetching the inputs by calling
> reader.next() before calling output.start(). That will make sure the input
> buffers are released before output buffers are allocated.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)