[ https://issues.apache.org/jira/browse/HADOOP-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486539 ]

Raghu Angadi commented on HADOOP-1134:
--------------------------------------


Proposed protocol for data transfers, used for 'OPs' such as OP_WRITE_BLOCK,
OP_READ_BLOCK, etc. Note that if this is not rendered in a fixed-width font,
the packet diagrams might look skewed.

Common header for all the OPs. The requesting side sends the following header:
----------------------------------------------------------------
| 2 byte version | 1 byte OP | OP specific data ... |
----------------------------------------------------------------
The version must match exactly on both sides.
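As an illustration, a minimal Java sketch of how the requesting side might
write this common header (the constant names and values here are assumptions,
not final):

import java.io.DataOutputStream;
import java.io.IOException;

class DataTransferHeader {
  // Assumed constants for illustration; actual values are TBD.
  static final short DATA_TRANSFER_VERSION = 1;
  static final byte OP_WRITE_BLOCK = 1;
  static final byte OP_READ_BLOCK = 2;

  // Writes the common 3-byte header: 2 byte version followed by 1 byte OP.
  static void writeHeader(DataOutputStream out, byte op) throws IOException {
    out.writeShort(DATA_TRANSFER_VERSION);
    out.writeByte(op);
  }
}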

Read and write ops transfer data in DATA_CHUNKs that contain the <offset, len,
checksum, data> tuple that Doug mentioned earlier:

DATA_CHUNK: the current checksum is CRC32, so the checksum field is 4 bytes.
---------------------------------------------------------------
| 4 byte Offset | 4 byte Len | data .. | checksum |
---------------------------------------------------------------
A DATA_CHUNK packet with 0 offset and 0 length indicates proper end of stream.
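A sketch of writing one such chunk in Java with java.util.zip.CRC32 (whether
the end-of-stream chunk also carries a checksum of the empty payload is an
assumption here):

import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

class DataChunk {
  // Writes one chunk: 4 byte offset | 4 byte len | data | 4 byte CRC32.
  static void writeChunk(DataOutputStream out, int offset, byte[] data,
                         int len) throws IOException {
    out.writeInt(offset);
    out.writeInt(len);
    out.write(data, 0, len);
    CRC32 crc = new CRC32();
    crc.update(data, 0, len);
    out.writeInt((int) crc.getValue()); // low 32 bits of the CRC32 value
  }

  // offset == 0 and len == 0 marks proper end of stream.
  static void writeEndOfStream(DataOutputStream out) throws IOException {
    writeChunk(out, 0, new byte[0], 0);
  }
}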

When OP is OP_WRITE_BLOCK (used when blocks are written to a datanode):
=======================================================================
--------------------------------------------------------------------------
| 1 byte Checksum Type | 4 byte bytes.per.checksum | contd.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
| 4 byte num nodes to copy | DatanodeInfos ... | DATA_CHUNKS ..
--------------------------------------------------------------------------

bytes.per.checksum is fixed at the start and cannot change. An empty
DATA_CHUNK indicates proper end of stream.
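A corresponding Java sketch for the OP_WRITE_BLOCK specific header
(CHECKSUM_CRC32 is an assumed type code, and serializing the target nodes via
DatanodeInfo's Writable write() is also an assumption):

import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.dfs.DatanodeInfo;

class WriteBlockRequest {
  static final byte CHECKSUM_CRC32 = 1; // assumed checksum type code

  // Writes the OP_WRITE_BLOCK specific header; DATA_CHUNKs follow it.
  static void writeRequest(DataOutputStream out, int bytesPerChecksum,
                           DatanodeInfo[] targets) throws IOException {
    out.writeByte(CHECKSUM_CRC32);   // 1 byte checksum type
    out.writeInt(bytesPerChecksum);  // fixed for the whole stream
    out.writeInt(targets.length);    // num nodes to copy to
    for (int i = 0; i < targets.length; i++) {
      targets[i].write(out);         // Writable serialization (assumed)
    }
    // DATA_CHUNKs follow, terminated by an empty chunk.
  }
}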

When OP is OP_READ_BLOCK (used to read data from a block):
==========================================================

------------------------------------------------------------------------
| 8 byte block id | 4 byte start offset | 4 byte end offset |
------------------------------------------------------------------------

An end offset of -1 indicates reading till the end of the block. Reply from
the datanode:
-------------------------------------------------------------------------------
| 1 byte checksum type | 4 byte bytes.per.checksum | DATA_CHUNKS ..
-------------------------------------------------------------------------------
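To round this out, a Java sketch of the request and reply framing for
OP_READ_BLOCK (class and method names here are illustrative only):

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class ReadBlockRequest {
  // Client side: 8 byte block id | 4 byte start offset | 4 byte end offset.
  // endOffset == -1 means read till the end of the block.
  static void writeRequest(DataOutputStream out, long blockId,
                           int startOffset, int endOffset) throws IOException {
    out.writeLong(blockId);
    out.writeInt(startOffset);
    out.writeInt(endOffset);
  }

  // Client side, reading the datanode's reply header: 1 byte checksum type |
  // 4 byte bytes.per.checksum, then DATA_CHUNKs until the empty chunk.
  static int[] readReplyHeader(DataInputStream in) throws IOException {
    int checksumType = in.readByte();   // e.g. CHECKSUM_CRC32
    int bytesPerChecksum = in.readInt();
    return new int[] { checksumType, bytesPerChecksum };
  }
}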

> Block level CRCs in HDFS
> ------------------------
>
>                 Key: HADOOP-1134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1134
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>         Assigned To: Raghu Angadi
>
> Currently CRCs are handled at the FileSystem level and are transparent to core 
> HDFS. See the recent improvement HADOOP-928 (which can add checksums to a given 
> filesystem) for more about it. Though this has served us well, there are a few 
> disadvantages:
> 1) This doubles the namespace in HDFS (or other filesystem implementations). In 
> many cases, it nearly doubles the number of blocks. Taking the namenode out of 
> CRCs would nearly double namespace performance, both in terms of CPU and 
> memory.
> 2) Since CRCs are transparent to HDFS, it cannot actively detect corrupted 
> blocks. With block level CRCs, the datanode can periodically verify the checksums 
> and report corruptions to the namenode so that new replicas can be created.
> We propose to have CRCs maintained for all HDFS data in much the same way as 
> in GFS. I will update the jira with detailed requirements and design. This 
> will include the same guarantees provided by the current implementation and will 
> include an upgrade of current data.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
