[
https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538553
]
dhruba borthakur commented on HADOOP-2012:
------------------------------------------
The metadata about the entire Datanode is stored in the VERSION file. We could
store the ID of the last verified block in this file (instead of adding a new
file).
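
A minimal sketch of that idea, assuming the VERSION file can be read and
written as a Java properties file. The class name and the key
`lastVerifiedBlockId` are hypothetical and used only for illustration; they are
not existing Datanode fields.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: keeps a scan checkpoint alongside the existing
// key/value pairs of a properties-style VERSION file.
class VerificationCheckpoint {
    // Assumed key name; not part of the real VERSION file.
    static final String KEY = "lastVerifiedBlockId";

    // Reads all existing properties, or returns an empty set if the file is absent.
    private static Properties read(Path versionFile) throws IOException {
        Properties props = new Properties();
        if (Files.exists(versionFile)) {
            try (InputStream in = Files.newInputStream(versionFile)) {
                props.load(in);
            }
        }
        return props;
    }

    // Records the last verified block ID while preserving the other entries.
    static void save(Path versionFile, long blockId) throws IOException {
        Properties props = read(versionFile);
        props.setProperty(KEY, Long.toString(blockId));
        try (OutputStream out = Files.newOutputStream(versionFile)) {
            props.store(out, null);
        }
    }

    // Returns the stored block ID, or defaultId if none has been recorded yet.
    static long load(Path versionFile, long defaultId) throws IOException {
        String value = read(versionFile).getProperty(KEY);
        return value == null ? defaultId : Long.parseLong(value);
    }
}
```

On restart, a scanner could call `load` and resume from the block after the
stored ID instead of starting the iteration over.
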
Do we really need a scan period? Your proposal that the Datanode spends a
certain percentage of the disk bandwidth to verify blocks sounds effective by
itself. If a datanode has 100K blocks of 128MB each, and it is configured
to use 5MB/sec disk bandwidth for verification, it would take the Datanode
about 30 days to verify each and every block it has in the system. The next
iteration could start immediately. If a datanode has few blocks, each iteration
would finish quickly and the next iteration would start immediately. Is there a
disadvantage in starting iterations back-to-back? That way we can get away
without another configuration parameter.
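
As a rough check of the estimate, the sketch below computes the length of one
full iteration from the figures quoted (100K blocks, 128MB each, 5MB/sec),
which works out to 2,560,000 seconds, or roughly 30 days. It also shows one
simple way a scanner could hold itself to a bandwidth budget. The class and
method names are hypothetical and do not come from the attached patch.

```java
// Hypothetical helper for reasoning about bandwidth-limited verification.
class ScanEstimate {
    // Seconds needed to read every block once at the verification bandwidth.
    static long fullScanSeconds(long blockCount, long blockSizeMB, long bandwidthMBPerSec) {
        return (blockCount * blockSizeMB) / bandwidthMBPerSec;
    }

    static double secondsToDays(long seconds) {
        return seconds / 86_400.0;
    }

    // Milliseconds to sleep so that the average read rate since the start of
    // the iteration stays at or below bytesPerSec. Returns 0 if the scanner
    // is already at or under its budget.
    static long throttleDelayMillis(long bytesRead, long elapsedMillis, long bytesPerSec) {
        long expectedMillis = bytesRead * 1000 / bytesPerSec;
        return Math.max(0, expectedMillis - elapsedMillis);
    }

    public static void main(String[] args) {
        long secs = fullScanSeconds(100_000, 128, 5);
        System.out.printf("full scan: %d s (%.1f days)%n", secs, secondsToDays(secs));
    }
}
```

With a throttle like this, back-to-back iterations need no separate scan
period: the bandwidth limit alone determines how long each pass takes.
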
> Periodic verification at the Datanode
> -------------------------------------
>
> Key: HADOOP-2012
> URL: https://issues.apache.org/jira/browse/HADOOP-2012
> Project: Hadoop
> Issue Type: New Feature
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.16.0
>
> Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch,
> HADOOP-2012.patch
>
>
> Currently, on-disk corruption of data blocks is detected only when a block is
> read by a client or by another datanode. These errors would be detected much
> earlier if the datanode periodically verified the data checksums for its local
> blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple
> of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta
> file associated with each block).
> - What action to take once a corruption is detected
> - Scanning should be done at a very low priority, with the rest of the
> datanode disk traffic in mind.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.