[ 
https://issues.apache.org/jira/browse/TEZ-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2244:
----------------------------------
    Attachment: TEZ-2244.3.patch

Thanks [~gopalv]. Addressing review comments.
- Added the "tez.runtime.pipelined.pre-allocate.memory" flag, enabled by 
default, which grabs all the memory allocated to the sorter up front.
- Removed the commented-out code block.
- Added a test case in TestOrderedPartitionedKVEdgeConfig.
- Added an additional test case in TestPipelinedSorter.
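
To illustrate how such a flag might be consumed, here is a minimal sketch. It 
does not use the Tez API: the class and helper names are hypothetical, the 
configuration is modelled as a plain key/value map, and the default of true 
is taken only from the "enabled by default" wording in this comment. The 
actual patch may read the setting differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not code from the TEZ-2244 patch.
public class PreAllocateFlagSketch {
    // Key name as quoted in the comment on this issue.
    public static final String PRE_ALLOCATE_KEY =
        "tez.runtime.pipelined.pre-allocate.memory";
    // Assumed default, based on "enabled by default" in the comment.
    public static final boolean PRE_ALLOCATE_DEFAULT = true;

    // Returns the flag value, falling back to the assumed default when unset.
    public static boolean shouldPreAllocate(Map<String, String> conf) {
        String value = conf.get(PRE_ALLOCATE_KEY);
        return value == null
            ? PRE_ALLOCATE_DEFAULT
            : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Unset: the sorter would grab its whole allocation at startup.
        System.out.println(shouldPreAllocate(conf));
        // Explicitly disabled: the sorter would allocate progressively.
        conf.put(PRE_ALLOCATE_KEY, "false");
        System.out.println(shouldPreAllocate(conf));
    }
}
```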

Tested with a couple of TPC-DS queries and query2b from the AMPLab benchmark 
Hive queries, using different settings on a multi-node cluster; works as 
expected.

> PipelinedSorter: Progressive allocation for sort-buffers
> --------------------------------------------------------
>
>                 Key: TEZ-2244
>                 URL: https://issues.apache.org/jira/browse/TEZ-2244
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Gopal V
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2244.1.patch, TEZ-2244.2.patch, TEZ-2244.3.patch, 
> TEZ-2244.WIP.patch
>
>
> Currently, the sort buffers are allocated pessimistically for all tasks so 
> that the largest task's spill stays within memory.
> The chained buffer implementation inside PipelinedSorter brings up the 
> possibility of allocating only the first chunk of the sort buffer when the 
> sorter starts up.
> This allows tasks which do not heavily use the sort buffer (like a grouping 
> aggregation) to use the sort-space only when the map-aggregation turns 
> itself off.
> Not reserving memory on startup hurts the worst-case scenario for the 
> pipelined sorter, but improves the average case.
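
The trade-off described above can be sketched as follows. This is a 
simplified model, not the PipelinedSorter implementation: the class name, 
method names, and chunk sizing are all hypothetical, and on-heap ByteBuffers 
stand in for whatever buffers the sorter actually chains together. With 
pre-allocation the whole budget is reserved at startup; without it, only the 
first chunk is allocated and further chunks are added on demand.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a chained sort buffer; not the Tez implementation.
public class ProgressiveSortBufferSketch {
    private final long totalBudget;   // total bytes the sorter may use
    private final int chunkSize;      // size of each chained chunk
    private final List<ByteBuffer> chunks = new ArrayList<>();
    private long allocated = 0;

    public ProgressiveSortBufferSketch(long totalBudgetBytes, int chunkSizeBytes,
                                       boolean preAllocate) {
        if (chunkSizeBytes <= 0 || totalBudgetBytes < chunkSizeBytes) {
            throw new IllegalArgumentException("budget must hold at least one chunk");
        }
        this.totalBudget = totalBudgetBytes;
        this.chunkSize = chunkSizeBytes;
        allocateNextChunk();                 // the first chunk is always allocated
        if (preAllocate) {
            while (allocateNextChunk()) {    // reserve the full budget up front
                // keep allocating until the budget is exhausted
            }
        }
    }

    // Adds one more chunk if the budget allows; returns false when exhausted.
    public boolean allocateNextChunk() {
        long remaining = totalBudget - allocated;
        if (remaining <= 0) {
            return false;
        }
        int size = (int) Math.min(chunkSize, remaining);
        chunks.add(ByteBuffer.allocate(size));
        allocated += size;
        return true;
    }

    public long allocatedBytes() { return allocated; }

    public int chunkCount() { return chunks.size(); }

    public static void main(String[] args) {
        ProgressiveSortBufferSketch lazy =
            new ProgressiveSortBufferSketch(4096, 1024, false);
        ProgressiveSortBufferSketch eager =
            new ProgressiveSortBufferSketch(4096, 1024, true);
        System.out.println("lazy:  " + lazy.allocatedBytes() + " bytes");
        System.out.println("eager: " + eager.allocatedBytes() + " bytes");
    }
}
```

In this model a task that never outgrows its first chunk holds only chunkSize 
bytes, whereas a task that fills the budget pays the allocation cost 
mid-sort rather than at startup, which is the worst-case penalty the 
description mentions.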



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
