[ 
https://issues.apache.org/jira/browse/TEZ-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2244:
----------------------------------
    Attachment: TEZ-2244.2.patch


>> If I'm not mistaken - all buffers will not be of equal size. e.g. start with 
>> 2GB. First will be 25% of 2G = 500M. 2nd will be 25% of 1.5G = 375M, and so 
>> on. Is that how it's meant to work ? May be simpler to take the entire 
>> memory - and divide it equally based on the min-size.
- Each subsequent block would also be 25% of the max allocated memory, not of 
the memory that remains. If the specified MIN_BLOCK_SIZE is greater than this 
limit, MIN_BLOCK_SIZE would be used instead.
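
To make the difference between the two readings concrete, here is a rough 
sketch, not the code in the patch. The class and method names are made up for 
illustration, sizes are in MB, and 2GB is taken as 2000 as in the question.

```java
final class BlockSizeRuleSketch {
    // Reading in the reply: 25% of the max memory, but never below the
    // configured minimum block size. Hypothetical helper, not from the patch.
    static long blockSizeFromMax(long maxMemoryMb, long minBlockMb) {
        return Math.max(maxMemoryMb / 4, minBlockMb);
    }

    // Reading in the question: 25% of whatever is still unallocated,
    // which makes every block smaller than the one before it.
    static long blockSizeFromRemaining(long remainingMb) {
        return remainingMb / 4;
    }

    public static void main(String[] args) {
        long max = 2000;
        long min = 100;
        long remaining = max;
        for (int i = 1; i <= 3; i++) {
            long shrinking = blockSizeFromRemaining(remaining);
            System.out.println("block " + i
                + ": from max = " + blockSizeFromMax(max, min)
                + ", from remaining = " + shrinking);
            remaining -= shrinking;
        }
    }
}
```

Under the first rule the size stays at 500 for every block, while under the 
second it drops to 375 on the second block.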


>> What's the expected behaviour when available=300, and min=200. Should this 
>> create 200 + 100 or 2 * 150 or 1*300 ?
- It would create a single block of 300. Initially it tries to allocate 200, 
but since the remaining 100 is less than a block, it is better to add it to 
the previous chunk.
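
The remainder handling could be sketched roughly as below. This is only an 
illustration under assumptions: planBlocks is a hypothetical helper, "less than 
a block" is taken to mean less than the min block size, and the block size is 
assumed to be the larger of 25% of the available memory and the min block size.

```java
import java.util.ArrayList;
import java.util.List;

final class RemainderFoldSketch {
    // Plans block sizes for the given memory. A trailing remainder smaller
    // than minBlock is folded into the block being planned instead of
    // becoming a tiny block of its own. Hypothetical, not from the patch.
    static List<Long> planBlocks(long available, long minBlock) {
        if (available <= 0 || minBlock <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long block = Math.max(available / 4, minBlock);
        List<Long> sizes = new ArrayList<>();
        long remaining = available;
        while (remaining > 0) {
            long size = Math.min(block, remaining);
            long leftover = remaining - size;
            if (leftover < minBlock) {
                size = remaining; // absorb the undersized remainder
            }
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(planBlocks(300, 200));
        System.out.println(planBlocks(1000, 300));
    }
}
```

With available=300 and min=200 this plans one block of 300, matching the 
answer above. With available=1000 and min=300 the last 100 is folded into the 
third block.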


>> The LOG line in allocate may misrepresent which buffer is being allocated, 
>> since bufferIndex can be reset?
- By the time bufferIndex is reset, currentAvailableMemory would be zero. 
It would not allocate the space again; instead, it reuses the existing buffer.
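
A simplified sketch of that behaviour follows. It is not the PipelinedSorter 
code: only the names bufferIndex and currentAvailableMemory come from the 
discussion, and the class, the next() method and the wrap-around order are 
assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class BufferReuseSketch {
    private final List<ByteBuffer> buffers = new ArrayList<>();
    private final int blockSize;
    private long currentAvailableMemory;
    private int bufferIndex = -1;
    int allocations = 0; // counts real allocations only

    BufferReuseSketch(long totalMemory, int blockSize) {
        if (totalMemory <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.currentAvailableMemory = totalMemory;
        this.blockSize = blockSize;
    }

    // Returns the next buffer to fill. New space is allocated only while
    // memory is left; after that bufferIndex wraps and old buffers are reused.
    ByteBuffer next() {
        if (currentAvailableMemory > 0) {
            int size = (int) Math.min(blockSize, currentAvailableMemory);
            buffers.add(ByteBuffer.allocate(size));
            currentAvailableMemory -= size;
            allocations++;
            bufferIndex = buffers.size() - 1;
        } else {
            // The index is reset only on this path, so no allocation
            // (and no allocation log line) happens here.
            bufferIndex = (bufferIndex + 1) % buffers.size();
            buffers.get(bufferIndex).clear();
        }
        return buffers.get(bufferIndex);
    }

    public static void main(String[] args) {
        BufferReuseSketch sorter = new BufferReuseSketch(300, 100);
        for (int i = 0; i < 5; i++) {
            sorter.next();
        }
        System.out.println("allocations = " + sorter.allocations);
    }
}
```

In this sketch the allocation branch and the reset branch are mutually 
exclusive, which is why a log line placed in the allocation branch would always 
refer to a newly created buffer.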

>> It'll be good to have a way to control the 25% factor - to potentially 
>> reduce the allocations if required.
- That could be achieved with the block size. E.g. in normal scenarios the min 
block size would be 2000, in which case the sorter behaves like the old code. 
When we need to control allocations, we can set the min block size to 500 or 
less to reduce the number of allocations. (Or would you like to control the 
25% itself? In that case users could accidentally misconfigure it, e.g. to 1%.)

Uploading a rebased patch for master with minor modifications and more test 
cases.

> PipelinedSorter: Progressive allocation for sort-buffers
> --------------------------------------------------------
>
>                 Key: TEZ-2244
>                 URL: https://issues.apache.org/jira/browse/TEZ-2244
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Gopal V
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2244.1.patch, TEZ-2244.2.patch, TEZ-2244.WIP.patch
>
>
> Currently, the sort buffers are allocated pessimistically for all tasks so 
> that the largest task's spill stays within memory.
> The chained buffer implementation inside PipelinedSorter brings up the 
> possibility of allocating only the first chunk of the sort buffer when the 
> sorter starts up.
> This allows tasks which do not heavily use the sort buffer (like a 
> grouping aggregation) to use the sort-space only when the map-aggregation 
> turns itself off.
> Not reserving memory on startup hurts the worst-case scenario for the 
> pipelined sorter, but improves the average case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
