[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538891 ]
rangadi edited comment on HADOOP-2012 at 10/30/07 12:20 PM:
-----------------------------------------------------------------

> The metadata about the entire Datanode is stored in the VERSION file. It is
> possible that we can store the last verified-blockid in this file (instead of
> adding a new file).

The VERSION file contains only the basic information needed for starting up a Datanode and is vital for Datanode startup. It is not updated at runtime, so I don't think it is suited for this.

Regarding continuous scanning, I think most users would not prefer that. Even 5 MB per second is close to 20% of a single disk's read throughput, and much more when the I/O is not very sequential (which happens when there are multiple concurrent reads and writes). We can certainly make SCAN_PERIOD and the throttle bandwidth configurable (maybe not in hadoop-defaults.xml) so that power users can tweak them as appropriate. I know there is strong resistance to adding any config variables :). But my personal opinion is that a few more config vars that 99% of users never need to worry about, because the defaults are good, are acceptable.

> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch
>
>
> Currently, on-disk corruption of data blocks is detected only when a block is
> read by the client or by another datanode. These errors would be detected much
> earlier if the datanode periodically verified the data checksums for its local
> blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta file associated with each block)?
> - What action should be taken once a corruption is detected?
> - Scanning should be done at very low priority, with the rest of the datanode's disk traffic in mind (see the sketch below for one way to throttle it).
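To make the SCAN_PERIOD / throttle-bandwidth discussion above concrete, here is a minimal, self-contained sketch of a bandwidth-capped scan loop. It is an illustration under stated assumptions, not the HADOOP-2012 patch: the class and method names, the 1 MB/s default, and the whole-file CRC32 check (standing in for comparing the per-chunk checksums stored in the block's .meta file) are all hypothetical.

{code:java}
// Hypothetical sketch only; names and defaults are NOT from the actual patch.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;

public class ThrottledBlockScanSketch {

  /** Sleeps whenever reading gets ahead of the allowed bandwidth, so the
   *  scan stays a low-priority consumer of disk throughput. */
  static class ScanThrottler {
    private final long maxBytesPerSec;
    private long periodStartMs = System.currentTimeMillis();
    private long bytesThisPeriod = 0;

    ScanThrottler(long maxBytesPerSec) {
      this.maxBytesPerSec = maxBytesPerSec;
    }

    void throttle(long bytesRead) throws InterruptedException {
      bytesThisPeriod += bytesRead;
      long elapsedMs = System.currentTimeMillis() - periodStartMs;
      // How long the bytes read so far *should* have taken at the cap.
      long expectedMs = bytesThisPeriod * 1000 / maxBytesPerSec;
      if (expectedMs > elapsedMs) {
        Thread.sleep(expectedMs - elapsedMs);
      }
      if (elapsedMs > 5000) {          // reset the accounting window
        periodStartMs = System.currentTimeMillis();
        bytesThisPeriod = 0;
      }
    }
  }

  /** Reads the whole block file through the throttle and returns its CRC32.
   *  A real datanode would instead compare the per-chunk checksums from the
   *  block's .meta file; a CRC over the file stands in for that here. */
  static long verifyBlock(File blockFile, ScanThrottler throttler)
      throws IOException, InterruptedException {
    CRC32 crc = new CRC32();
    byte[] buf = new byte[64 * 1024];
    try (FileInputStream in = new FileInputStream(blockFile)) {
      int n;
      while ((n = in.read(buf)) > 0) {
        crc.update(buf, 0, n);
        throttler.throttle(n);
      }
    }
    return crc.getValue();
  }

  public static void main(String[] args) throws Exception {
    // 1 MB/s cap, an assumed default for illustration. SCAN_PERIOD (how
    // often each block is revisited) would be a separate configurable knob.
    ScanThrottler throttler = new ScanThrottler(1L << 20);
    for (String path : args) {
      System.out.printf("%s crc=%x%n", path,
          verifyBlock(new File(path), throttler));
    }
  }
}
{code}

With this shape of throttle, both knobs the comment mentions (the scan period and the bandwidth cap) reduce to two long values, which is why exposing them as config variables with conservative defaults costs little.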