[jira] [Commented] (TEZ-2244) PipelinedSorter: Progressive allocation for sort-buffers

Rajesh Balamohan (JIRA) Wed, 28 Oct 2015 10:24:12 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978808#comment-14978808
 ]


Rajesh Balamohan commented on TEZ-2244:
---------------------------------------

[~hitesh] - It depends on the configured sort buffer size. If more than 2GB 
is set, it would try to create multiple 2000MB chunks; if a smaller sort 
buffer is configured, it would end up using a single block (within the 
configured sort buffer limit). This is to support the existing default model.
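
As a rough illustration of the sizing rule described above (not the code from the patch), the split could be sketched as follows. The class name, method name, and the handling of the final remainder block are assumptions made for the example; only the 2000MB cap comes from the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tez: plans sort-buffer block sizes in MB.
final class SortBufferPlan {
    // Per-block cap mentioned in the comment above.
    static final int MAX_BLOCK_MB = 2000;

    // Splits the configured sort buffer into blocks of at most MAX_BLOCK_MB.
    // A buffer at or below the cap yields a single block of exactly that size.
    static List<Integer> planBlocksMb(long sortBufferMb) {
        if (sortBufferMb <= 0) {
            throw new IllegalArgumentException("sort buffer must be positive");
        }
        List<Integer> blocks = new ArrayList<>();
        long remaining = sortBufferMb;
        while (remaining > 0) {
            int block = (int) Math.min(remaining, MAX_BLOCK_MB);
            blocks.add(block);
            remaining -= block;
        }
        return blocks;
    }
}
```

Under these assumptions, a 1024MB setting gives one 1024MB block, and a 4500MB setting gives blocks of 2000MB, 2000MB and 500MB.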

> PipelinedSorter: Progressive allocation for sort-buffers
> --------------------------------------------------------
>
>                 Key: TEZ-2244
>                 URL: https://issues.apache.org/jira/browse/TEZ-2244
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Gopal V
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2244.1.patch, TEZ-2244.2.patch, TEZ-2244.3.patch, 
> TEZ-2244.4.patch, TEZ-2244.5.patch, TEZ-2244.6.patch, TEZ-2244.7.patch, 
> TEZ-2244.WIP.patch
>
>
> Currently, the sort buffers are allocated pessimistically for all tasks so 
> that the largest task's spill stays within memory.
> The chained buffer implementation inside PipelinedSorter brings up the 
> possibility of allocating only the first chunk of the sort buffer when 
> the sorter starts up.
> This allows for the tasks which do not heavily use the sort buffer (like a 
> grouping aggregation) to use the sort-space only when the map-aggregation 
> turns itself off.
> Not reserving memory on startup hurts the worst-case scenario for the 
> pipelined sorter, but improves the average case.
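
The idea in the description can be sketched roughly as below: only the first planned block is allocated at startup, and later blocks are allocated on demand. This is a minimal sketch, not the PipelinedSorter implementation; the class and method names are hypothetical, and sizes are in bytes to keep the example small.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of progressive allocation; not Tez code.
final class ProgressiveSortBuffers {
    private final int[] plannedBlockBytes;
    private final List<ByteBuffer> allocated = new ArrayList<>();

    ProgressiveSortBuffers(int... plannedBlockBytes) {
        if (plannedBlockBytes.length == 0) {
            throw new IllegalArgumentException("need at least one block");
        }
        this.plannedBlockBytes = plannedBlockBytes.clone();
        allocateNext(); // startup cost is the first block only
    }

    // Allocates the next planned block; returns false once the plan is exhausted,
    // at which point a real sorter would have to spill.
    boolean allocateNext() {
        if (allocated.size() == plannedBlockBytes.length) {
            return false;
        }
        allocated.add(ByteBuffer.allocate(plannedBlockBytes[allocated.size()]));
        return true;
    }

    int allocatedBlocks() {
        return allocated.size();
    }

    long allocatedBytes() {
        long total = 0;
        for (ByteBuffer b : allocated) {
            total += b.capacity();
        }
        return total;
    }
}
```

A task that never fills its first block never pays for the rest, which is the average-case gain; a task that needs every block pays the allocation cost mid-run, which is the worst-case penalty the description mentions.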



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
