[
https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510581#comment-14510581
]
Rajesh Balamohan edited comment on TEZ-2256 at 4/29/15 11:23 PM:
-----------------------------------------------------------------
[~cchepelov] Thanks for rebasing the patch to master. +1.
DefaultSorter and (especially) PipelinedSorter have different codepaths.
DefaultSorter is more or less borrowed from the MR world, and PipelinedSorter is
an improvement over DefaultSorter. On master, PipelinedSorter has changed
further: memory optimizations have gone in to reduce key comparison costs, along
with support for larger memory and initial work on pipelined shuffle.
For checking in, we could commit the current patch to master and branch-0.6,
and handle the ordered-case fixes in a follow-up JIRA to this one.
> Avoid use of BufferTooSmallException to signal end of buffer in
> UnorderedPartitionedKVWriter
> --------------------------------------------------------------------------------------------
>
> Key: TEZ-2256
> URL: https://issues.apache.org/jira/browse/TEZ-2256
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Cyrille Chépélov
> Assignee: Cyrille Chépélov
> Priority: Critical
> Labels: patch
> Attachments: remove-btse-1-MASTER.patch, remove-btse-1-rfc.patch
>
> Original Estimate: 6h
> Remaining Estimate: 6h
>
> UnorderedPartitionedKVWriter delegates serialization to the application,
> passing it a private ByteArrayOutputStream. When the buffer is exhausted,
> ByteArrayOutputStream signals this with a private BufferTooSmallException,
> which the application can see but cannot handle. As [~cwensel] pointed out,
> when the application is itself a complex framework, it has no way to
> distinguish this exception from a real failure, and so must log the full
> stack trace even for routine events such as "buffer complete".
> Suggested approach: set a "complete" flag in ByteArrayOutputStream that
> disables any further output, and replace BufferTooSmallException (BTSE)
> handling with a check of that flag.
> [~sseth] suggested looking at SortedOutput as well, as the mechanisms there
> should be similar.
> I'll give this a go this week.
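The suggested "complete" flag approach can be sketched roughly as follows, in Java since that is what Tez is written in. This is a minimal illustration under stated assumptions, not the actual TEZ-2256 patch: `FlaggedBufferStream`, `writeRecord`, and the use of `DataOutputStream.writeUTF` as the "application serializer" are hypothetical stand-ins for Tez's private ByteArrayOutputStream and whatever the framework serializes. The stream sets `complete` on the first write that would overflow and silently drops all later writes; the writer checks the flag after the serializer returns, so any exception that does propagate can be treated as a genuine failure.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class CompleteFlagDemo {

    /** Hypothetical fixed-capacity stream: sets a flag instead of throwing when full. */
    static final class FlaggedBufferStream extends OutputStream {
        private final byte[] buf;
        private int count;
        private boolean complete; // once true, all further output is dropped

        FlaggedBufferStream(int capacity) {
            buf = new byte[capacity];
        }

        @Override
        public void write(int b) {
            if (complete) return;
            if (count == buf.length) { // no room left: flag it, don't throw
                complete = true;
                return;
            }
            buf[count++] = (byte) b;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            if (complete) return;
            if (len > buf.length - count) { // chunk would overflow: flag it, don't throw
                complete = true;
                return;
            }
            System.arraycopy(b, off, buf, count, len);
            count += len;
        }

        boolean isComplete() { return complete; }

        int size() { return count; }
    }

    /**
     * Serializes one key/value record (writeUTF stands in for the application's
     * serializer). Returns false if the buffer filled up during this record, in
     * which case the bytes past the caller's mark are a partial record.
     */
    static boolean writeRecord(FlaggedBufferStream out, String key, String value) {
        try {
            DataOutputStream data = new DataOutputStream(out);
            data.writeUTF(key);
            data.writeUTF(value);
        } catch (IOException e) {
            // "Buffer full" no longer uses an exception, so anything caught here is a real failure.
            throw new UncheckedIOException(e);
        }
        return !out.isComplete();
    }

    public static void main(String[] args) {
        FlaggedBufferStream out = new FlaggedBufferStream(16);
        String[][] records = {{"k1", "v1"}, {"k2", "v2"}, {"k3", "v3"}};
        int written = 0;
        for (String[] r : records) {
            int mark = out.size(); // start of the record about to be written
            if (!writeRecord(out, r[0], r[1])) {
                System.out.println("buffer full; valid bytes = " + mark);
                break;
            }
            written++;
        }
        System.out.println("records written = " + written);
    }
}
```

Because the flag can be set partway through a record, the caller remembers the buffer position (`mark`) before serializing and treats bytes past it as a partial record to discard, for example by spilling `[0, mark)` and retrying the record in a fresh buffer; the real writer's recovery logic may differ. Here each record encodes to 8 bytes (a 2-byte length prefix plus 2 bytes, for both key and value), so with a 16-byte buffer the third record sets the flag and the sketch reports 16 valid bytes and 2 records written.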