[ 
https://issues.apache.org/jira/browse/TEZ-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978791#comment-14978791
 ] 

Gopal V commented on TEZ-2244:
------------------------------

No, it doesn't. 

The existing behaviour is as follows: the system will not allocate a single 
block larger than 2000 MB, and it allocates only up to the total 
{{tez.runtime.io.sort.mb}} specified, but it allocates all of that memory 
eagerly to reserve it for the sort buffer.

In the traditional MR implementation that makes sense as there's no container 
reuse and allocation will always happen without any GC triggered by the 
reservation of memory.

The patch moves to a lazier approach, allocating sort-buffer memory 
progressively instead of reserving it all up front and eagerly triggering GCs.
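
To make the difference concrete, here is a rough sketch of eager versus 
progressive allocation under a total budget and a per-block cap. It is not 
the PipelinedSorter code or anything from the patch: the class 
ProgressiveBufferSketch and its methods are hypothetical names, and the byte 
sizes are toy values standing in for {{tez.runtime.io.sort.mb}} and the 
per-block limit.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the actual Tez implementation. */
public class ProgressiveBufferSketch {
    private final long totalBytes;    // overall budget (stand-in for tez.runtime.io.sort.mb)
    private final int maxBlockBytes;  // cap on the size of any single block
    private final List<ByteBuffer> blocks = new ArrayList<>();
    private long allocatedBytes = 0;

    public ProgressiveBufferSketch(long totalBytes, int maxBlockBytes) {
        if (totalBytes <= 0 || maxBlockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.totalBytes = totalBytes;
        this.maxBlockBytes = maxBlockBytes;
    }

    /** Eager style: reserve the whole budget immediately, block by block. */
    public void allocateAll() {
        while (allocateNextBlock() != null) {
            // keep allocating until the budget is exhausted
        }
    }

    /**
     * Progressive style: allocate one more block only when the caller needs it.
     * Returns null once the total budget has been used up.
     */
    public ByteBuffer allocateNextBlock() {
        long remaining = totalBytes - allocatedBytes;
        if (remaining <= 0) {
            return null;
        }
        int size = (int) Math.min(remaining, maxBlockBytes);
        ByteBuffer block = ByteBuffer.allocate(size);
        blocks.add(block);
        allocatedBytes += size;
        return block;
    }

    public long allocatedBytes() {
        return allocatedBytes;
    }

    public int blockCount() {
        return blocks.size();
    }

    public static void main(String[] args) {
        // Eager: the full budget is held before any record is sorted.
        ProgressiveBufferSketch eager = new ProgressiveBufferSketch(100, 32);
        eager.allocateAll();
        System.out.println("eager bytes held: " + eager.allocatedBytes());

        // Progressive: only the first block is held at startup.
        ProgressiveBufferSketch lazy = new ProgressiveBufferSketch(100, 32);
        lazy.allocateNextBlock();
        System.out.println("lazy bytes held at startup: " + lazy.allocatedBytes());
    }
}
```

A task that never fills its first block holds only that block, while a task 
that does need the full budget pays for the later allocations (and any GC 
they trigger) at the point of use rather than at startup.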

> PipelinedSorter: Progressive allocation for sort-buffers
> --------------------------------------------------------
>
>                 Key: TEZ-2244
>                 URL: https://issues.apache.org/jira/browse/TEZ-2244
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Gopal V
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2244.1.patch, TEZ-2244.2.patch, TEZ-2244.3.patch, 
> TEZ-2244.4.patch, TEZ-2244.5.patch, TEZ-2244.6.patch, TEZ-2244.7.patch, 
> TEZ-2244.WIP.patch
>
>
> Currently, the sort buffers are allocated pessimistically for all tasks so 
> that the largest task's spill stays within memory.
> After the chained buffer implementation inside PipelinedSorter, it brings up 
> the possibility of only allocating the first chunk of the sort buffer when 
> the sorter starts up.
> This allows for the tasks which do not heavily use the sort buffer (like a 
> grouping aggregation) to use the sort-space only when the map-aggregation 
> turns itself off.
> Not reserving memory on startup hurts the worst-case scenario for the 
> pipelined sorter, but improves the average case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
