[
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904918#comment-16904918
]
Chen Zhang commented on HDFS-13709:
-----------------------------------
Thanks [~jojochuang] for mentioning me in HDFS-14706.
This Jira and HDFS-14706 both introduce a reportBadBlock call in different places,
and I agree with you that we should reuse the bad-block handling logic.
I've added a method {{handleBadBlock}} in DataNode to handle bad blocks, using
the following logic:
# If it's called by the scanner, always reportBadBlock to the NN.
# If the exception comes from another path (e.g. BlockSender), first determine
whether it indicates a bad block according to the type of exception. If it
does, try to markSuspectBlock if the blockScanner is enabled, or report to the
NN if the scanner is disabled.
# I left some specific logic in the
{{VolumeScanner#ScanResultHandler.handle()}} method, since I think it is only
relevant to the scanner, not to all situations.
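
The decision logic in the first two points can be sketched roughly as below. This is only an illustration, not the code in the patch: the class name, the {{Source}} and {{Action}} enums, and the message-based EIO check in {{looksLikeBadBlock}} are all assumptions made for the example, and the real method would call into the DataNode and BlockScanner rather than return a value.

```java
import java.io.IOException;

/** Hypothetical sketch of the decision logic; names are not taken from the patch. */
final class BadBlockHandlingSketch {

    /** Who detected the problem (assumed enum, for illustration only). */
    enum Source { SCANNER, OTHER }

    /** What the DataNode would do next (assumed enum, for illustration only). */
    enum Action { REPORT_TO_NN, MARK_SUSPECT, IGNORE }

    /**
     * Assumption: an IOException whose message contains "Input/output error"
     * is treated as an EIO caused by a bad block. The patch may classify
     * exceptions differently.
     */
    static boolean looksLikeBadBlock(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("Input/output error");
    }

    static Action handleBadBlock(Source source, IOException cause,
                                 boolean scannerEnabled) {
        // Point 1: the scanner has already verified the block, so always report.
        if (source == Source.SCANNER) {
            return Action.REPORT_TO_NN;
        }
        // Point 2: other callers first check whether the exception
        // actually indicates a bad block.
        if (!looksLikeBadBlock(cause)) {
            return Action.IGNORE;
        }
        // Let the scanner re-check the block if it is running;
        // otherwise report to the NN directly.
        return scannerEnabled ? Action.MARK_SUSPECT : Action.REPORT_TO_NN;
    }
}
```

With the scanner disabled, as in the cluster described in this Jira, an EIO from BlockSender would go straight to the NN under this sketch.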
> Report bad block to NN when transfer block encounter EIO exception
> ------------------------------------------------------------------
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Chen Zhang
> Assignee: Chen Zhang
> Priority: Major
> Attachments: HDFS-13709.002.patch, HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes
> a bad disk track may cause data loss.
> For example, suppose there are 3 replicas on 3 machines A/B/C. If a bad track
> occurs in A's replica data, and someday B and C crash at the same time, the NN
> will try to replicate the data from A but fail. The block is now corrupt but
> no one knows, because the NN thinks there is at least 1 healthy replica and
> keeps trying to replicate it.
> When reading a replica that has data on a bad track, the OS returns an EIO
> error. If the DN reports the bad block as soon as it gets an EIO, we can
> detect this case ASAP and try to avoid data loss.