[ 
https://issues.apache.org/jira/browse/HADOOP-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484996
 ] 

Raghu Angadi commented on HADOOP-1134:
--------------------------------------

Sameer, does the following sum up your proposal (a refinement of option (3) 
above):

1) For each block for which a CRC is generated (in one of the ways mentioned 
below), the Datanode reports the CRC of the checksum file to the Namenode.

2) First, a datanode checks its block data against the CRC. 

3) If the CRC check fails, it verifies against a different block of the old CRC data.

4) If the check against the old CRC matches, the checksum of the CRC file is 
reported to the Namenode as authoritative.

5) If this fails as well, it generates a CRC based on its local data and 
reports the checksum of the CRC file to the Namenode.

6) For blocks that go through step (5), the datanode periodically checks with 
the Namenode to see whether its CRC file matches either the authoritative CRC 
or the majority. If the answer is yes, then its CRC is considered valid. If the 
answer is no, then the block is scheduled for deletion (note that an 
authoritative or majority CRC already exists).

7) If the Namenode cannot say yes or no for some reason, the local CRC is kept 
with a warning.

8) If the block cannot even be read properly or has an incorrect file length 
for some reason, it is considered not to exist (it could be marked for 
deletion). A rough sketch of this per-block flow is given after the list.
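
To make the per-block decision concrete, here is a minimal sketch in Java. It 
is only illustrative: the class and method names are hypothetical, and the 
per-chunk layout of a real checksum file is simplified to a single whole-block 
CRC32.

import java.util.zip.CRC32;

class BlockCrcUpgrader {

  enum CrcSource { LOCAL_VERIFIED, OLD_CRC_COPY, REGENERATED, BLOCK_UNREADABLE }

  /** Steps (2)-(5) and (8): decide which CRC data to keep for one block. */
  CrcSource upgradeBlock(byte[] blockData, Long localCrc, Long oldCrcCopy) {
    if (blockData == null) {
      return CrcSource.BLOCK_UNREADABLE;   // step 8: block unreadable or bad length
    }
    if (localCrc != null && matches(blockData, localCrc)) {
      return CrcSource.LOCAL_VERIFIED;     // step 2: local CRC verifies
    }
    if (oldCrcCopy != null && matches(blockData, oldCrcCopy)) {
      return CrcSource.OLD_CRC_COPY;       // steps 3-4: another copy of the old CRC verifies
    }
    return CrcSource.REGENERATED;          // step 5: regenerate the CRC from local data
  }

  /** Step (1): the meta-CRC reported to the Namenode is a CRC over the checksum file itself. */
  long metaCrc(byte[] checksumFileBytes) {
    CRC32 crc = new CRC32();
    crc.update(checksumFileBytes);
    return crc.getValue();
  }

  private boolean matches(byte[] data, long expectedCrc) {
    CRC32 crc = new CRC32();
    crc.update(data);                      // the real scheme would check per-chunk CRCs
    return crc.getValue() == expectedCrc;
  }
}

Blocks that come out as REGENERATED are the ones that then go through the 
periodic check in step (6).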

Note that since the Namenode needs to track extra information, its memory 
footprint will be larger than pre-upgrade. If we want to avoid this, we could 
do the authoritative/majority check with one of the datanodes (the lexically 
first node) for each replica, as sketched below.
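
A minimal sketch of that delegation, assuming datanodes are identified by 
plain name strings (the helper below is hypothetical): every replica holder 
computes the same coordinator locally, so the Namenode keeps no extra state.

import java.util.Collections;
import java.util.List;

class CrcCheckCoordinator {
  /** Pick the lexically first datanode among the replica holders; each
   *  datanode computes the same answer independently. */
  static String coordinatorFor(List<String> replicaDatanodes) {
    return Collections.min(replicaDatanodes);
  }
}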

To reduce the wait for matching with the authoritative or majority copy, each 
datanode will sort its blocks and upgrade them in order (see the sketch below).
Once an authoritative match is found for a block, the Namenode no longer needs 
to track the meta-CRC (the CRC of the checksum file) from each datanode.
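
A minimal sketch of that ordering, assuming blocks are identified by long 
block IDs; the sort order is the same on every datanode, so replicas of the 
same block tend to be upgraded at about the same time:

import java.util.Arrays;

class UpgradeOrder {
  /** Upgrade blocks in ascending block-ID order so that all datanodes
   *  reach the same block at roughly the same time. */
  static long[] sortedForUpgrade(long[] blockIds) {
    long[] copy = Arrays.copyOf(blockIds, blockIds.length);
    Arrays.sort(copy);
    return copy;
  }
}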


> Block level CRCs in HDFS
> ------------------------
>
>                 Key: HADOOP-1134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1134
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>         Assigned To: Raghu Angadi
>
> Currently CRCs are handled at the FileSystem level and are transparent to core 
> HDFS. See the recent improvement HADOOP-928 (which can add checksums to a given 
> filesystem) for more about it. Though this has served us well, there are a few 
> disadvantages:
> 1) This doubles the namespace in HDFS (or other filesystem implementations). In 
> many cases, it nearly doubles the number of blocks. Taking the namenode out of 
> CRCs would nearly double namespace performance, both in terms of CPU and 
> memory.
> 2) Since CRCs are transparent to HDFS, it cannot actively detect corrupted 
> blocks. With block-level CRCs, the Datanode can periodically verify the checksums 
> and report corruptions to the namenode so that new replicas can be created.
> We propose to have CRCs maintained for all HDFS data in much the same way as 
> in GFS. I will update the jira with detailed requirements and design. This 
> will include the same guarantees provided by the current implementation and 
> will include an upgrade of the current data.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
