[
https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538891
]
rangadi edited comment on HADOOP-2012 at 10/30/07 12:20 PM:
-----------------------------------------------------------------
> The metadata about the entire Datanode is stored in the VERSION file. It is
> possible that we can store the last verified-blockid in this file (instead of
> adding a new file).
The VERSION file contains only the basic information needed to start a
Datanode and is vital for Datanode startup. It is not updated at runtime, so
I don't think it is suited for this.
Regarding continuous scanning, I think most users would not prefer that. Even
5 MB per second is close to 20% of a single disk's read bandwidth, and much
more if the access pattern is not sequential (which happens when there are
multiple concurrent reads and writes).
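To make the disk impact concrete, here is a minimal sketch of the kind of
sleep-based throttle I have in mind. All names are illustrative, not from the
attached patches:

{code:java}
/**
 * Minimal sketch of a sleep-based bandwidth throttle for the scanner.
 * Names are illustrative; this is not the patch's implementation.
 */
class ScanThrottle {
  private final long bytesPerSec;                 // e.g. 5 MB/s
  private final long startTime = System.currentTimeMillis();
  private long totalBytes = 0;

  ScanThrottle(long bytesPerSec) {
    this.bytesPerSec = bytesPerSec;
  }

  /** Call after reading numBytes; sleeps if we are ahead of the budget. */
  synchronized void throttle(long numBytes) throws InterruptedException {
    totalBytes += numBytes;
    // Time the bytes read so far should have taken at the target rate:
    long expectedMillis = totalBytes * 1000 / bytesPerSec;
    long elapsed = System.currentTimeMillis() - startTime;
    if (expectedMillis > elapsed) {
      Thread.sleep(expectedMillis - elapsed);     // slow down to the target
    }
  }
}
{code}

A scanner reading 64 KB at a time and calling throttle(65536) after each read
would settle at the configured rate, regardless of how fast the disk is.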
We can certainly make the SCAN_PERIOD and the throttle bandwidth configurable
(maybe not in hadoop-default.xml), so that power users can tweak them as
appropriate. I know there is strong resistance to adding any config variables
:). But my personal opinion is that a few more config variables that 99% of
users never need to worry about, because of good defaults, are ok.
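For illustration, reading such knobs could look like the snippet below. The
key names and defaults are hypothetical, not proposed names:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ScanConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical keys and defaults, shown only to illustrate the idea:
    long scanPeriodHours =
        conf.getLong("dfs.datanode.scan.period.hours", 3 * 7 * 24);       // ~3 weeks
    long scanBytesPerSec =
        conf.getLong("dfs.datanode.scan.bytes.per.sec", 5 * 1024 * 1024); // 5 MB/s
    System.out.println(scanPeriodHours + " hours, " + scanBytesPerSec + " B/s");
  }
}
{code}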
> Periodic verification at the Datanode
> -------------------------------------
>
> Key: HADOOP-2012
> URL: https://issues.apache.org/jira/browse/HADOOP-2012
> Project: Hadoop
> Issue Type: New Feature
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.16.0
>
> Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch,
> HADOOP-2012.patch
>
>
> Currently, on-disk corruption of data blocks is detected only when a block
> is read by a client or by another datanode. These errors would be detected
> much earlier if the datanode periodically verified the checksums of its
> local blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every
> couple of weeks?)
> - How do we keep track of when a block was last verified? (There is a .meta
> file associated with each block.)
> - What action should be taken once corruption is detected?
> - Scanning should be done at a very low priority, keeping the rest of the
> datanode disk traffic in mind.
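A rough sketch of the scanning loop these points suggest. Every name here is
hypothetical and only illustrates the list above, not the attached patches:

{code:java}
import java.io.IOException;

/** Hypothetical skeleton of a periodic datanode block scanner. */
abstract class BlockScanner implements Runnable {
  static final long SCAN_PERIOD_MS = 14L * 24 * 3600 * 1000;  // ~2 weeks

  abstract Iterable<String> localBlocks();
  abstract long lastVerified(String blockId);        // e.g. read from .meta
  abstract long verifyChecksums(String blockId)      // returns bytes read,
      throws IOException;                            // throws on corruption
  abstract void recordLastVerified(String blockId, long when);
  abstract void reportCorruptBlock(String blockId);  // namenode re-replicates
  abstract void throttle(long bytesRead) throws InterruptedException;

  public void run() {
    try {
      for (String b : localBlocks()) {
        if (System.currentTimeMillis() - lastVerified(b) < SCAN_PERIOD_MS) {
          continue;                                  // verified recently enough
        }
        try {
          long bytes = verifyChecksums(b);           // re-read data, compare .meta
          recordLastVerified(b, System.currentTimeMillis());
          throttle(bytes);                           // keep disk impact low
        } catch (IOException e) {
          reportCorruptBlock(b);                     // corruption or read error
        }
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();            // datanode shutting down
    }
  }
}
{code}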