[ https://issues.apache.org/jira/browse/TEZ-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-2085:
----------------------------------
Attachment: TEZ-2085.4.patch
Addressing review comments from gopalv:
- Added the final keyword to capacity and blockSize
- Renamed the variable to bufferOverflowRecursion and updated the comments
Addressing review comments from Hitesh:
- Block size is capped at a maximum of 2 GB per block; the sorter can use
multiple blocks when the container is larger than 2 GB.
- The precondition check with range-bound checks was already present at the
start. Moved the block size computation into a separate method for testing.
- Removed the additional config parameter that was introduced in the earlier
patch. For a sort buffer larger than 2 GB, the block size can simply be
computed as 2 GB, so this need not be visible to the user. Adding a KV pair
larger than 2 GB would end up throwing a buffer overflow exception, which is
also the case for DefaultSorter.
- Added a couple more tests for the above changes.
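The block-size cap described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class and method names (BlockSizeSketch, computeBlockSize, computeBlocks) are hypothetical, the positive-size check only stands in for the patch's precondition checks, and the cap is assumed to be Integer.MAX_VALUE (just under 2 GB), the largest size a single Java array or ByteBuffer can hold.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSizeSketch {
    // Assumed cap: the largest capacity a single int-indexed ByteBuffer can have (~2 GB).
    static final long MAX_BLOCK_SIZE = Integer.MAX_VALUE;

    // Hypothetical helper: per-block size for a sort buffer of totalBytes, capped at ~2 GB.
    static int computeBlockSize(long totalBytes) {
        // Range check standing in for the precondition checks mentioned in the patch notes.
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("sort buffer size must be positive: " + totalBytes);
        }
        return (int) Math.min(totalBytes, MAX_BLOCK_SIZE);
    }

    // Splits the total buffer into blocks no larger than the block size;
    // the last block holds whatever remains.
    static List<Integer> computeBlocks(long totalBytes) {
        int blockSize = computeBlockSize(totalBytes);
        List<Integer> blocks = new ArrayList<>();
        long remaining = totalBytes;
        while (remaining > 0) {
            int size = (int) Math.min(remaining, blockSize);
            blocks.add(size);
            remaining -= size;
        }
        return blocks;
    }

    public static void main(String[] args) {
        long threeGb = 3L << 30;
        System.out.println(computeBlocks(threeGb).size());   // 2
        System.out.println(computeBlockSize(512L << 20));    // 536870912
    }
}
```

Keeping the computation in its own static method, as the patch notes describe, makes it easy to unit-test the > 2 GB case without actually allocating that much memory.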
> PipelinedSorter should bail out (on BufferOverflowException) instead of
> retrying continuously
> ---------------------------------------------------------------------------------------------
>
> Key: TEZ-2085
> URL: https://issues.apache.org/jira/browse/TEZ-2085
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2085.1.patch, TEZ-2085.2.patch, TEZ-2085.3.patch,
> TEZ-2085.4.patch
>
>
> If we try to fit in a key/value pair that is greater than the size the sort
> span can accommodate, PipelinedSorter would try to sort/spill indefinitely.
> This is more of a corner case. It should bail out gracefully and throw an
> IOException instead.
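The bail-out behavior requested in the description can be illustrated with a small, self-contained sketch. It does not reproduce PipelinedSorter: the class BailOutSketch, its spill() stand-in (which merely clears the span), and the single-retry policy are assumptions for illustration. Only the name bufferOverflowRecursion comes from the patch notes, and its exact role in the patch may differ. On the first BufferOverflowException the writer spills and retries once into an empty span; if the record still does not fit, it throws an IOException instead of spilling forever.

```java
import java.io.IOException;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class BailOutSketch {
    private final ByteBuffer span;
    private int spills = 0;
    // Hypothetical guard: true while retrying a write after a spill.
    private boolean bufferOverflowRecursion = false;

    BailOutSketch(int capacity) {
        this.span = ByteBuffer.allocate(capacity);
    }

    // Hypothetical stand-in for sort-and-spill: just empties the span.
    private void spill() {
        spills++;
        span.clear();
    }

    void write(byte[] key, byte[] value) throws IOException {
        int mark = span.position();
        try {
            span.put(key);
            span.put(value);
        } catch (BufferOverflowException e) {
            span.position(mark); // discard the partially written record
            if (bufferOverflowRecursion) {
                // Already spilled and retried into an empty span: the record can never fit.
                throw new IOException("Record of " + (key.length + value.length)
                        + " bytes does not fit in a " + span.capacity() + "-byte span", e);
            }
            spill();
            bufferOverflowRecursion = true;
            try {
                write(key, value); // retry exactly once
            } finally {
                bufferOverflowRecursion = false;
            }
        }
    }

    int spills() {
        return spills;
    }

    public static void main(String[] args) throws IOException {
        BailOutSketch sorter = new BailOutSketch(16);
        sorter.write(new byte[4], new byte[4]); // fits
        sorter.write(new byte[4], new byte[6]); // overflows, spills once, then fits
        try {
            sorter.write(new byte[10], new byte[10]); // 20 bytes can never fit in 16
        } catch (IOException e) {
            System.out.println("bailed out: " + e.getMessage());
        }
        System.out.println("spills = " + sorter.spills());
    }
}
```

A single retry is enough in this sketch because a spill leaves the span empty: if a record overflows an empty span, no amount of further spilling will make room for it, so looping again would reproduce the indefinite sort/spill cycle described above.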
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)