[ https://issues.apache.org/jira/browse/HDFS-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104210#comment-16104210 ]
Hadoop QA commented on HDFS-10348:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 7s | HDFS-10348 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-10348 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801965/HDFS-10348-1.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/20452/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Namenode report bad block method doesn't check whether the block belongs to
> datanode before adding it to corrupt replicas map.
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-10348
>                 URL: https://issues.apache.org/jira/browse/HDFS-10348
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HDFS-10348-1.patch, HDFS-10348.patch
>
>
> Namenode (via the report bad block method) doesn't check whether the block
> belongs to the datanode before it adds it to the corrupt replicas map.
> In one of our clusters we found that there were 3 lingering corrupt blocks.
> It happened in the following order.
> 1. Two clients called getBlockLocations for a particular file.
> 2. Client C1 tried to open the file, encountered a checksum error from
> node N3, and reported a bad block (blk1) to the namenode.
> 3. Namenode added node N3 and block blk1 to the corrupt replicas map and
> asked one of the good nodes (one of the 2 nodes) to replicate the block to
> another node N4.
> 4. After receiving the block, N4 sent an IBR (with RECEIVED_BLOCK) to the
> namenode.
> 5. Namenode removed the block and node N3 from the corrupt replicas map.
> It also removed N3's storage from the triplets and queued an invalidate
> request for N3.
> 6. In the meantime, Client C2 tried to open the file and the request went to
> node N3.
> C2 also encountered the checksum exception and reported the bad block to
> the namenode.
> 7. Namenode added the corrupt block blk1 and node N3 to the corrupt replicas
> map without confirming whether node N3 has the block or not.
> After deleting the block, N3 sent an IBR (with DELETED) and the namenode
> simply ignored the report, since N3's storage was no longer in the
> triplets (from step 5).
> We took the node out of rotation, but the block was still present, and only
> in the corruptReplicasMap, because on removing a node we only go through
> the blocks that are present in the triplets for the given datanode.
> [~kshukla]'s patch fixed this bug via
> https://issues.apache.org/jira/browse/HDFS-9958.
> But I think the following check should be made in
> BlockManager#markBlockAsCorrupt instead of
> BlockManager#findAndMarkBlockAsCorrupt.
> {noformat}
> if (storage == null) {
>   storage = storedBlock.findStorageInfo(node);
> }
> if (storage == null) {
>   blockLog.debug("BLOCK* findAndMarkBlockAsCorrupt: {} not found on {}",
>       blk, dn);
>   return;
> }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
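
The sequence in the issue description can be replayed against a small toy model to see why the proposed storage check matters. The sketch below is not Hadoop code: ToyBlockManager, its two maps, the checkStorage flag, and all method signatures are hypothetical stand-ins for the triplets and the corrupt replicas map, and nodes and blocks are plain strings. It only assumes what the description states, namely that a bad-block report for a node whose storage is no longer known should be ignored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only; none of these names are Hadoop APIs.
class ToyBlockManager {
    // block -> nodes known to hold a replica (stands in for the triplets)
    private final Map<String, Set<String>> storages = new HashMap<>();
    // block -> nodes whose replica is marked corrupt
    private final Map<String, Set<String>> corruptReplicas = new HashMap<>();
    // true = apply the guard proposed in the issue; false = reported behaviour
    private final boolean checkStorage;

    ToyBlockManager(boolean checkStorage) {
        this.checkStorage = checkStorage;
    }

    void addReplica(String block, String node) {
        storages.computeIfAbsent(block, k -> new HashSet<>()).add(node);
    }

    // Models step 5: drop the node's storage and its corrupt-replica entry.
    void removeReplica(String block, String node) {
        Set<String> nodes = storages.get(block);
        if (nodes != null) {
            nodes.remove(node);
        }
        Set<String> corrupt = corruptReplicas.get(block);
        if (corrupt != null) {
            corrupt.remove(node);
            if (corrupt.isEmpty()) {
                corruptReplicas.remove(block);
            }
        }
    }

    // Returns false when the report is ignored because the node is not
    // known to hold the block (only possible when checkStorage is true).
    boolean markBlockAsCorrupt(String block, String node) {
        Set<String> nodes = storages.get(block);
        boolean nodeHoldsBlock = nodes != null && nodes.contains(node);
        if (checkStorage && !nodeHoldsBlock) {
            return false;
        }
        corruptReplicas.computeIfAbsent(block, k -> new HashSet<>()).add(node);
        return true;
    }

    boolean isMarkedCorrupt(String block, String node) {
        Set<String> corrupt = corruptReplicas.get(block);
        return corrupt != null && corrupt.contains(node);
    }

    // Replays steps 2-7 and reports whether a stale corrupt entry lingers.
    static boolean replayLeavesStaleEntry(boolean checkStorage) {
        ToyBlockManager bm = new ToyBlockManager(checkStorage);
        bm.addReplica("blk1", "N1");
        bm.addReplica("blk1", "N2");
        bm.addReplica("blk1", "N3");
        bm.markBlockAsCorrupt("blk1", "N3"); // steps 2-3: C1 reports N3
        bm.addReplica("blk1", "N4");         // step 4: N4 received the block
        bm.removeReplica("blk1", "N3");      // step 5: N3 removed everywhere
        bm.markBlockAsCorrupt("blk1", "N3"); // steps 6-7: late report from C2
        return bm.isMarkedCorrupt("blk1", "N3");
    }

    public static void main(String[] args) {
        System.out.println("stale entry without check: "
                + replayLeavesStaleEntry(false));
        System.out.println("stale entry with check: "
                + replayLeavesStaleEntry(true));
    }
}
```

In this model the late report from C2 re-creates the corrupt entry only when the guard is off, which mirrors the lingering blocks described above. Where the guard should live in the real BlockManager is the open question the issue raises; the sketch does not settle it.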