[ https://issues.apache.org/jira/browse/HBASE-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842343#action_12842343 ]

Nicolas Spiegelberg commented on HBASE-2234:
--------------------------------------------

Go ahead and apply the existing patch with the comment changes.

I spent a little time yesterday trying to understand all the layers of 
buffering between SequenceFile.Writer and the point where the pipeline is 
actually opened and content is sent to the datanodes.  I figured I'd pass 
that information along since 0.20.2 currently does not support syncFs().  
Without syncFs(), the pipeline seems to be created only every 64k, which is 
'dfs.write.packet.size'.  The stack trace, with the associated buffering, 
that I was following:

1. SequenceFile.Writer.append() 
2. FSOutputSummer.write()          --> buffers up to maxChunkSize. An HDFS 
chunk is the amount of data between two checksums (default: 512 bytes).
3. FSOutputSummer.flushBuffer() 
4. FSOutputSummer.writeChecksumChunk() 
5. DFSOutputStream.writeChunk()    --> buffers up to currentPacket.maxChunk.  This 
is the maximum number of HDFS chunks that can be placed in a Packet.  The 
approximate byte count is min("dfs.block.size" (default: 64MB), 
"hbase.regionserver.hlog.blocksize" (default: "dfs.block.size"), 
"dfs.write.packet.size" (default: 64k)).
6. DataStreamer.run()              <-- creates the pipeline 
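
In other words, without syncFs() nothing should reach the datanodes until roughly 
min(dfs.block.size, hbase.regionserver.hlog.blocksize, dfs.write.packet.size) 
bytes have been appended.  A quick sketch of that threshold (purely illustrative; 
the class and method names are made up and this is not part of the attached patch):

import org.apache.hadoop.conf.Configuration;

// Illustrative only -- not from the HBASE-2234 patch.
public class HlogBufferingEstimate {

  /** Approximate bytes buffered client-side before a packet reaches DataStreamer. */
  public static long effectivePacketBytes(Configuration conf) {
    long dfsBlockSize  = conf.getLong("dfs.block.size", 64L * 1024 * 1024);
    // hbase.regionserver.hlog.blocksize defaults to dfs.block.size
    long hlogBlockSize = conf.getLong("hbase.regionserver.hlog.blocksize", dfsBlockSize);
    long packetSize    = conf.getLong("dfs.write.packet.size", 64 * 1024);
    return Math.min(packetSize, Math.min(dfsBlockSize, hlogBlockSize));
  }

  public static void main(String[] args) {
    System.out.println("Bytes buffered before the pipeline sees data: "
        + effectivePacketBytes(new Configuration()));
  }
}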


> Roll Hlog if any datanode in the write pipeline dies
> ----------------------------------------------------
>
>                 Key: HBASE-2234
>                 URL: https://issues.apache.org/jira/browse/HBASE-2234
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: Nicolas Spiegelberg
>            Priority: Blocker
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: HBASE-2234-20.4-1.patch, HBASE-2234-20.4.patch
>
>
> HDFS does not replicate the last block of a file that is being written to. 
> This means that if datanodes in the write pipeline die, the data blocks 
> in the transaction log would experience reduced redundancy. It would be 
> good if the region server could detect datanode death in the write pipeline 
> while writing to the transaction log and, if this happens, close the current 
> log and open a new one. This depends on HDFS-826.
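
For reference, the detection described above maps onto the getNumCurrentReplicas() 
call added by HDFS-826.  A rough sketch (not the attached patch; the class name is 
made up, and reflection is used so it compiles against Hadoop versions that lack 
the call):

import java.lang.reflect.Method;
import org.apache.hadoop.fs.FSDataOutputStream;

// Illustrative only -- not from the HBASE-2234 patch.
public class PipelineReplicaCheck {

  /** Live replica count of the block currently being written to this stream. */
  public static int currentReplicas(FSDataOutputStream out) throws Exception {
    Object dfsStream = out.getWrappedStream();  // DFSClient.DFSOutputStream underneath
    Method m = dfsStream.getClass().getMethod("getNumCurrentReplicas");
    return ((Number) m.invoke(dfsStream)).intValue();
  }

  // A region server could poll this after appends and request a log roll when
  // the value drops below the configured replication factor.
}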

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
