[ https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485286#comment-14485286 ]

Cyrille Chépélov commented on TEZ-2256:
---------------------------------------

[~rbalamohan] I think the removal is complete as far as 
UnorderedPartitionedKVWriter is concerned. 

However, there is still the sorted case to consider. Looking at the code in both 
branch-0.6 and master:
* the actual buffering happens in PipelinedSorter and DefaultSorter in the 
OrderedPartitionedKVOutput case, and in UnorderedPartitionedKVWriter in the 
UnorderedPartitionedKVOutput case.
* UnorderedPartitionedKVWriter writes into a ByteArrayOutputStream <: 
OutputStream, which works on a WrappedBuffer _and, before these patches, used 
to throw BufferTooSmallException_
* DefaultSorter writes into a private Buffer <: OutputStream, which 
occasionally throws MapBufferTooSmallException when a spill is required
* PipelinedSorter writes into a java.nio.ByteBuffer, which occasionally throws 
java.nio.BufferOverflowException through the key/value serializers (the details 
seem to have changed between 0.6 and master, but this still holds)
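To make the exception-based signaling in the last bullet concrete, here is a minimal, hypothetical Java sketch. It is not Tez code: `ByteBufferOutputStream`, `serialize` and `fillUntilOverflow` are names invented for illustration. A serializer that only sees an `OutputStream` writes into a fixed-size `java.nio.ByteBuffer`; when a record does not fit, `ByteBuffer.put` throws the unchecked `BufferOverflowException`, which travels through the serializer to the caller, which treats it as "spill needed" and rewinds past the partial record.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class ExceptionSignalSketch {

    // Hypothetical adapter exposing a fixed-size ByteBuffer as an OutputStream.
    // Writing past the end makes ByteBuffer.put throw the unchecked
    // BufferOverflowException, which propagates through any serializer.
    static final class ByteBufferOutputStream extends OutputStream {
        private final ByteBuffer buf;

        ByteBufferOutputStream(ByteBuffer buf) {
            this.buf = buf;
        }

        @Override
        public void write(int b) {
            buf.put((byte) b); // throws BufferOverflowException when no space is left
        }

        @Override
        public void write(byte[] b, int off, int len) {
            buf.put(b, off, len); // throws BufferOverflowException if len > remaining
        }
    }

    // Hypothetical application serializer: it only sees an OutputStream and
    // cannot tell "buffer full" apart from a real failure.
    static void serialize(DataOutputStream out, String key, long value) throws IOException {
        out.writeUTF(key);    // 2-byte length prefix + UTF-8 bytes
        out.writeLong(value); // 8 bytes
    }

    /** Serializes records until the buffer overflows; returns how many fit completely. */
    public static int fillUntilOverflow(int capacity, int maxRecords) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        DataOutputStream out = new DataOutputStream(new ByteBufferOutputStream(buf));
        int written = 0;
        try {
            for (int i = 0; i < maxRecords; i++) {
                int mark = buf.position();
                try {
                    serialize(out, "key" + i, i);
                    written++;
                } catch (BufferOverflowException e) {
                    // Exception used as control flow ("spill needed"):
                    // drop the partially written record and stop.
                    buf.position(mark);
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(fillUntilOverflow(64, 100)); // prints 4
    }
}
```

With 14-byte records (a 4-character key written by `writeUTF` plus an 8-byte long), a 64-byte buffer holds four complete records. Nothing in the exception itself tells the serializer, or a framework wrapping it, that this is a routine "buffer full" event rather than a failure.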

Is there a reason why these three code paths use what look like very different 
buffering mechanisms?


> Avoid use of BufferTooSmallException to signal end of buffer in 
> UnorderedPartitionedKVWriter
> --------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2256
>                 URL: https://issues.apache.org/jira/browse/TEZ-2256
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Cyrille Chépélov
>            Assignee: Cyrille Chépélov
>            Priority: Minor
>              Labels: patch
>         Attachments: remove-btse-1-MASTER.patch, remove-btse-1-rfc.patch
>
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> UnorderedPartitionedKVWriter delegates serialization to the application, 
> passing it a private ByteArrayOutputStream. In case the buffer is exhausted, 
> ByteArrayOutputStream signals that with a private BufferTooSmallException, 
> which can be seen but not dealt with by the application. As [~cwensel] 
> pointed out, when the application is in fact a complex framework, there is no 
> way to distinguish this exception from a real failure, which forces logging 
> the full stack trace even for routine events such as "buffer complete".
> Suggested approach: set a "complete" flag in ByteArrayOutputStream that 
> disables any further output, and replace BufferTooSmallException (BTSE) 
> handling with a check of that flag. 
> [~sseth] suggested checking out SortedOutput as well, as the mechanisms there 
> should be similar.
> I'll give this a go this week.
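The suggested approach in the issue description can be sketched as follows. This is a hypothetical illustration, not the attached patch: `BoundedOutputStream`, `isFull`, `rewind` and `fillUntilFull` are invented names. Instead of throwing, the stream sets a "full" flag when a write would exceed its capacity and ignores all further output; the caller checks the flag after each record and discards the partial record, so no exception ever passes through the application's serializer.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class FlagSignalSketch {

    // Hypothetical bounded stream: once a write would exceed capacity it sets a
    // "full" flag and silently drops that write and every later one, instead of
    // throwing through the application's serializer.
    static final class BoundedOutputStream extends OutputStream {
        private final byte[] buf;
        private int pos;
        private boolean full;

        BoundedOutputStream(int capacity) {
            this.buf = new byte[capacity];
        }

        @Override
        public void write(int b) {
            if (full) {
                return; // output disabled once the buffer is complete
            }
            if (pos >= buf.length) {
                full = true;
                return;
            }
            buf[pos++] = (byte) b;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            if (full) {
                return; // output disabled once the buffer is complete
            }
            if (len > buf.length - pos) {
                full = true; // remember that the current record did not fit
                return;
            }
            System.arraycopy(b, off, buf, pos, len);
            pos += len;
        }

        boolean isFull() { return full; }
        int position() { return pos; }
        void rewind(int mark) { pos = mark; } // discard a partially written record
    }

    // Hypothetical application serializer: it never sees an exception for "full".
    static void serialize(DataOutputStream out, String key, long value) throws IOException {
        out.writeUTF(key);    // 2-byte length prefix + UTF-8 bytes
        out.writeLong(value); // 8 bytes
    }

    /** Serializes records until the stream reports full; returns how many fit completely. */
    public static int fillUntilFull(int capacity, int maxRecords) {
        BoundedOutputStream stream = new BoundedOutputStream(capacity);
        DataOutputStream out = new DataOutputStream(stream);
        int written = 0;
        try {
            for (int i = 0; i < maxRecords; i++) {
                int mark = stream.position();
                serialize(out, "key" + i, i);
                if (stream.isFull()) {
                    // Check the flag instead of catching an exception,
                    // then drop the partially written record.
                    stream.rewind(mark);
                    break;
                }
                written++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected: this stream never throws
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(fillUntilFull(64, 100)); // prints 4
    }
}
```

With 14-byte records, `fillUntilFull(64, 100)` returns 4. The trade-off of this design is that the serializer runs to completion on a record that is then thrown away, which costs a little work but keeps "buffer complete" out of the exception path entirely.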



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
