[ 
https://issues.apache.org/jira/browse/TEZ-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2085:
----------------------------------
    Attachment: TEZ-2085.4.patch

Addressing review comments from gopalv:
- Added the final keyword to capacity and blockSize.
- Renamed the variable to bufferOverflowRecursion and updated the comments.

Addressing review comments from Hitesh:
- Block size is capped at a maximum of 2 GB per block; the sorter can have
more blocks for a container larger than 2 GB.
- The precondition check with range bounds was already there at the start.
Moved the block size computation to a separate method for testing.
- Removed the additional config parameter that was introduced in the earlier
patch. For a sort buffer larger than 2 GB, we can compute the block size to be
2 GB. This need not be visible to the user. If anyone tries to add a KV larger
than 2 GB, it would end up throwing a buffer overflow exception. The same
applies to DefaultSorter.
- Added a couple more tests for the above changes.
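
To illustrate the block size cap described above, here is a minimal sketch. It
is not the code from TEZ-2085.4.patch: the class BlockSizeSketch, the methods
computeBlockSize and numberOfBlocks, and the use of Integer.MAX_VALUE as the
"2 GB" limit are all assumptions made for illustration.

```java
// Hypothetical sketch; names and limits are assumptions, not the patch itself.
final class BlockSizeSketch {
    // Assumed cap: the largest size an int-indexed buffer can address (just under 2 GB).
    static final int MAX_BLOCK_SIZE = Integer.MAX_VALUE;

    /** Returns the per-block size in bytes for a sort buffer of the given capacity. */
    static int computeBlockSize(long capacityBytes) {
        // Range check up front, so invalid capacities fail fast.
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException(
                "Sort buffer capacity must be positive: " + capacityBytes);
        }
        // Small buffers use a single block of their own size; larger ones are capped.
        return (int) Math.min(capacityBytes, (long) MAX_BLOCK_SIZE);
    }

    /** Returns how many blocks are needed to cover the capacity (ceiling division). */
    static int numberOfBlocks(long capacityBytes) {
        int blockSize = computeBlockSize(capacityBytes);
        return (int) ((capacityBytes + blockSize - 1) / blockSize);
    }
}
```

Keeping the computation in its own side-effect-free method is what makes it
easy to unit test without allocating multi-gigabyte buffers.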

> PipelinedSorter should bail out (on BufferOverflowException) instead of 
> retrying continuously
> ---------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2085
>                 URL: https://issues.apache.org/jira/browse/TEZ-2085
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2085.1.patch, TEZ-2085.2.patch, TEZ-2085.3.patch, 
> TEZ-2085.4.patch
>
>
> If we try to fit in a key/value pair that is greater than the size that a
> sort span can accommodate, PipelinedSorter would try to sort/spill
> indefinitely. This is more of a corner case. It should bail out gracefully
> and throw an IOException instead.
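
The bail-out behaviour requested in the description can be sketched roughly as
follows. This is a simplified illustration, not PipelinedSorter itself: the
class BailOutSketch, its fields, and the retry-once policy are assumptions.
Only java.nio.ByteBuffer and its BufferOverflowException are real APIs here.

```java
import java.io.IOException;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch of "spill once, then give up" instead of retrying forever.
final class BailOutSketch {
    private final int spanCapacity;
    private ByteBuffer span;
    int spills = 0; // counts how many times the span was spilled and replaced

    BailOutSketch(int spanCapacity) {
        this.spanCapacity = spanCapacity;
        this.span = ByteBuffer.allocate(spanCapacity);
    }

    void write(byte[] record) throws IOException {
        write(record, false);
    }

    private void write(byte[] record, boolean retriedAfterSpill) throws IOException {
        try {
            // put() transfers nothing and throws if the record does not fit.
            span.put(record);
        } catch (BufferOverflowException e) {
            if (retriedAfterSpill) {
                // The record did not fit even in an empty span: retrying cannot help.
                throw new IOException("Record of " + record.length
                    + " bytes does not fit in a span of " + spanCapacity + " bytes", e);
            }
            spill();
            write(record, true);
        }
    }

    // Stand-in for sort/spill: discard the full span and start an empty one.
    private void spill() {
        spills++;
        span = ByteBuffer.allocate(spanCapacity);
    }
}
```

The flag passed on the retry bounds the recursion to a single level, so an
oversized record surfaces as an IOException rather than an endless sort/spill
loop.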



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
