[ 
https://issues.apache.org/jira/browse/HDFS-7889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495614#comment-14495614
 ] 

Li Bo commented on HDFS-7889:
-----------------------------

Hi Zhe, please see my explanation of the related code below.

The first (leading) streamer is responsible for committing block groups. Before 
committing, it must wait for the other streamers to finish writing their blocks 
and then count the total number of bytes written in the block group. Because 
streamers share only {{stripedBlocks}}, an ordinary streamer that finishes 
writing its block has to report its work to the leading streamer. It does so by 
putting a LocatedBlock object (which records how many bytes it has written for 
its block) into the blocking queue of the leading streamer 
(i.e. {{stripedBlocks\[0\]}}). The leading streamer waits on that queue and 
collects the other streamers' reports. An ordinary streamer could just send an 
Integer to the leading streamer; I chose LocatedBlock because it may be more 
convenient for error handling in HDFS-7786.
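
A minimal sketch of this reporting scheme, not the code in the patch: 
{{BlockReport}} is a hypothetical stand-in for LocatedBlock that carries only 
the byte count, and {{collectReports}}, its parameters, and the timeout are 
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LeadingStreamerSketch {
    /** Hypothetical stand-in for the LocatedBlock report; holds only the byte count. */
    public static final class BlockReport {
        public final int streamerIndex;
        public final long numBytes;

        public BlockReport(int streamerIndex, long numBytes) {
            this.streamerIndex = streamerIndex;
            this.numBytes = numBytes;
        }
    }

    /**
     * Leading streamer side: block until one report per ordinary streamer has
     * arrived on the leader's queue, then return the total bytes written.
     */
    public static long collectReports(BlockingQueue<BlockReport> leaderQueue,
                                      int numOrdinaryStreamers,
                                      long leaderBytes,
                                      long timeoutSeconds) throws InterruptedException {
        long total = leaderBytes;
        for (int i = 0; i < numOrdinaryStreamers; i++) {
            BlockReport report = leaderQueue.poll(timeoutSeconds, TimeUnit.SECONDS);
            if (report == null) {
                throw new IllegalStateException("timed out waiting for a streamer report");
            }
            total += report.numBytes;
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Plays the role of the queue at stripedBlocks[0].
        BlockingQueue<BlockReport> leaderQueue = new LinkedBlockingQueue<>();
        long[] written = {64, 64, 32}; // bytes written by three ordinary streamers

        List<Thread> streamers = new ArrayList<>();
        for (int i = 0; i < written.length; i++) {
            final int index = i + 1;
            final long bytes = written[i];
            // Ordinary streamer side: report to the leader when the block ends.
            Thread t = new Thread(() -> leaderQueue.add(new BlockReport(index, bytes)));
            streamers.add(t);
            t.start();
        }

        long total = collectReports(leaderQueue, written.length, 64, 5);
        for (Thread t : streamers) {
            t.join();
        }
        System.out.println("bytes to commit: " + total); // 64 + 64 + 64 + 32 = 224
    }
}
```

Because the leader only commits after it has taken one report per ordinary 
streamer from the queue, the slowest streamer determines when the commit 
happens.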

bq. hasCommittedBlock is initially false. But once becoming true, it will never 
be false again. What's the purpose of this flag?
For an ordinary streamer, the report is sent to the leading streamer in 
{{endBlock}} when it finishes writing a block.
For the leading streamer, the first step is just to request a block group from 
the NN. Whenever it has to request another block group, it must first commit 
the old one. So {{hasCommittedBlock}} becomes true after the first request.

bq. Why are we always polling the first located block, instead of the i_th?
{{stripedBlocks.get(0)}} is the blocking queue of the leading streamer. The 
leading streamer needs to get the results of the other streamers' work from it 
before committing the block group to the NN.

bq. Shouldn't we always commit block.getNumBytes() * NUM_DATA_BLOCKS?
The size of the last block group may be smaller than {{block.getNumBytes() * 
NUM_DATA_BLOCKS}}, so {{StripedDataStreamer#countTrailingBlockGroupBytes()}} is 
used to count the bytes written in the last block group. For each earlier, full 
block group, the leading streamer has to wait for the slowest streamer to 
finish writing. Otherwise, if the leading streamer commits 
{{block.getNumBytes() * NUM_DATA_BLOCKS}} bytes to the NN before the slow 
streamers finish, and one streamer fails after that, the error handling 
becomes complicated.
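
To illustrate why the fixed product overcounts the last block group, here is a 
small sketch. It is not the body of {{countTrailingBlockGroupBytes()}}; the 
class, the {{writtenBytes}} helper, the value 6 for {{NUM_DATA_BLOCKS}}, and 
the block sizes are all assumptions chosen for the example.

```java
public class TrailingGroupBytes {
    // Assumed value, for illustration only.
    public static final int NUM_DATA_BLOCKS = 6;

    /** Sum of the bytes actually written to the data blocks of one block group. */
    public static long writtenBytes(long[] dataBlockBytes) {
        long sum = 0;
        for (long bytes : dataBlockBytes) {
            sum += bytes;
        }
        return sum;
    }

    public static void main(String[] args) {
        long blockSize = 128; // hypothetical full block size

        // Full block group: every data block is filled, so the product is exact.
        long[] full = {128, 128, 128, 128, 128, 128};
        System.out.println(writtenBytes(full) == blockSize * NUM_DATA_BLOCKS); // true

        // Last block group: the file ended early, so the blocks are only partly
        // filled and the product (768) would overcount.
        long[] trailing = {48, 48, 48, 32, 32, 32};
        System.out.println(writtenBytes(trailing)); // 240
    }
}
```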

The above solution may not be the best, but it works for now. If you have a 
better solution, we can discuss it and optimize the related logic.


> Subclass DFSOutputStream to support writing striping layout files
> -----------------------------------------------------------------
>
>                 Key: HDFS-7889
>                 URL: https://issues.apache.org/jira/browse/HDFS-7889
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>             Fix For: HDFS-7285
>
>         Attachments: HDFS-7889-001.patch, HDFS-7889-002.patch, 
> HDFS-7889-003.patch, HDFS-7889-004.patch, HDFS-7889-005.patch, 
> HDFS-7889-006.patch, HDFS-7889-007.patch, HDFS-7889-008.patch, 
> HDFS-7889-009.patch, HDFS-7889-010.patch, HDFS-7889-011.patch, 
> HDFS-7889-012.patch, HDFS-7889-013.patch, HDFS-7889-014.patch
>
>
> After HDFS-7888, we can subclass {{DFSOutputStream}} to support writing 
> striping layout files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
