[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated HADOOP-2012:
---------------------------------

    Attachment: HADOOP-2012.patch

The latest patch, FWIW, is attached. It does the following:

- Verifies blocks and adjusts the read bandwidth between 1-8 MBps in order to complete the verification in a given period. The period is configurable. Its progress can be tracked on the page '/blockScannerReport'.
- When a client reads a complete block and the checksum succeeds, it informs the datanode. The datanode considers that a verification of the data.
- The last verification time is stored in the block metadata file (this file contains the CRC). This itself includes quite a few changes:
-- {{BlockMetadata.java}} handles metadata-related operations: reading and writing headers, handling versions, upgrading to new versions, etc.
-- It is now simpler to add new fields to the metadata.
-- It handles file modifications when the file is linked from backup (after an upgrade) (not yet on Windows).

I will try to make another version of the patch where the last verification time is stored in a separate text file.

> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch
>
>
> Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors would be detected much earlier if the datanode periodically verified the data checksums for its local blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta file associated with each block).
> - What action to take once a corruption is detected?
> - Scanning should be done at a very low priority, with the rest of the datanode's disk traffic in mind.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
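
For illustration only, the bandwidth adjustment mentioned in the comment (a read rate kept between 1 and 8 MBps so that the scan completes within the configured period) could be computed roughly as below. This is a minimal sketch, not code from the attached patch: the class name `ScanThrottle`, the method `targetBytesPerSecond`, and the reading of "MB" as 2^20 bytes are all assumptions made for the example.

```java
// Hypothetical helper; the names and the MB = 2^20 bytes reading are
// assumptions for illustration, not taken from HADOOP-2012.patch.
final class ScanThrottle {
    static final long MIN_BPS = 1L * 1024 * 1024; // lower bound: 1 MBps
    static final long MAX_BPS = 8L * 1024 * 1024; // upper bound: 8 MBps

    /**
     * Returns the read rate (bytes per second) needed to verify
     * bytesLeft bytes in millisLeft milliseconds, clamped to
     * the range [MIN_BPS, MAX_BPS].
     */
    static long targetBytesPerSecond(long bytesLeft, long millisLeft) {
        if (bytesLeft <= 0) {
            return MIN_BPS; // nothing left to scan: stay at the floor
        }
        if (millisLeft <= 0) {
            return MAX_BPS; // already past the deadline: go as fast as allowed
        }
        double needed = bytesLeft / (millisLeft / 1000.0);
        if (needed < MIN_BPS) {
            return MIN_BPS;
        }
        if (needed > MAX_BPS) {
            return MAX_BPS;
        }
        return (long) Math.ceil(needed);
    }

    public static void main(String[] args) {
        long oneTiB = 1024L * 1024 * 1024 * 1024;
        long threeWeeksMs = 21L * 24 * 60 * 60 * 1000;
        // Prints the clamped rate for 1 TiB of blocks and a three-week period.
        System.out.println(targetBytesPerSecond(oneTiB, threeWeeksMs));
    }
}
```

A scanner loop would presumably recompute this rate periodically, since both the number of unverified bytes and the time remaining in the period change as blocks are added, deleted, or verified by client reads.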