[
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319015#comment-14319015
]
Colin Patrick McCabe commented on HDFS-7648:
--------------------------------------------
Hi [~szetszwo], since you suggested splitting this JIRA into two, I had assumed
that you wanted to have the discussion about "automatic fixing" on the second
JIRA. However, if you want to have it now, I'll share my thoughts.
As I stated earlier, I don't think we should do automatic fixing. We simply
don't know *why* the DataNode got into a state where the directory layout is
wrong. This is similar to "what happens if there is no VERSION file?" We
don't try to automatically fix this. If there is no VERSION file, then it's
very likely that there is a serious misconfiguration and/or filesystem bug, and
our attempts to fix it would only make things worse.
The same logic applies here. If there are blocks in the wrong location, why is
that happening? It could be because there is a serious bug in the software.
In that case, deleting the blocks, as you have suggested, would only lead to
data loss. It could be because the sysadmin manually edited a {{VERSION}} file
for an old (pre-HDFS-6482) datanode directory to look like it was
post-HDFS-6482, bypassing the upgrade process. In this case, deleting *all*
the data is still the wrong thing to do; the sysadmin should instead see logs
telling them that this configuration is wrong. Finally, blocks could be in the
wrong place because there is a serious disk drive or local FS error. In this
case, deletion will still do no good, because the device is in a seriously
unusable state.
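To illustrate the report-only alternative argued for above, here is a minimal sketch of a check that describes a misplaced block and never moves or deletes it. The class and method names are hypothetical, and the two-level subdirN/subdirM scheme with 8-bit masks is an assumption about the post-HDFS-6482 layout, not a copy of the DataNode code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of Hadoop: it only reports layout mismatches.
public class LayoutCheck {

    // Assumption: the directory is derived from two 8-bit slices of the block ID.
    public static Path expectedDir(Path finalizedRoot, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return finalizedRoot.resolve("subdir" + d1).resolve("subdir" + d2);
    }

    // Returns a warning message if the block is in the wrong directory,
    // or an empty Optional if the location matches. Nothing on disk is touched.
    public static Optional<String> describeMismatch(
            Path finalizedRoot, long blockId, Path actualDir) {
        Path expected = expectedDir(finalizedRoot, blockId).normalize();
        if (expected.equals(actualDir.normalize())) {
            return Optional.empty();
        }
        return Optional.of("Block " + blockId + " found in " + actualDir
                + " but expected in " + expected
                + "; leaving it untouched for the operator to investigate");
    }

    public static void main(String[] args) {
        // Example paths are made up for the demonstration.
        Path root = Paths.get("/data/dn/current/finalized");
        Path wrong = root.resolve("subdir0").resolve("subdir0");
        describeMismatch(root, 0x12345L, wrong).ifPresent(System.err::println);
    }
}
```

The point of returning a message instead of acting on the mismatch is that the cause (software bug, hand-edited {{VERSION}} file, or failing disk) cannot be determined from the block's location alone.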
I'd also like to note that we've spent quite a lot of time discussing
theoretical failures that may or may not ever happen. Who knows whether we
will ever actually find blocks in the wrong place? You are asking for
automatic handling of something that, to our knowledge, has never even happened
once. That seems like putting the cart before the horse.
> Verify the datanode directory layout
> ------------------------------------
>
> Key: HDFS-7648
> URL: https://issues.apache.org/jira/browse/HDFS-7648
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Rakesh R
> Attachments: HDFS-7648.patch, HDFS-7648.patch
>
>
> HDFS-6482 changed datanode layout to use block ID to determine the directory
> to store the block. We should have some mechanism to verify it. Either
> DirectoryScanner or block report generation could do the check.