haiyang1987 opened a new pull request, #6476:
URL: https://github.com/apache/hadoop/pull/6476

   ### Description of PR
   https://issues.apache.org/jira/browse/HDFS-17346
   
   DirectoryScanner can mark healthy blocks as corrupt and report them to the 
NameNode, so blocks are flagged as corrupt even though they are actually healthy.
   
   This can happen when an append and DirectoryScanner run at the same time, 
and the probability is very high.
   
   **Root cause:**
   
   - Create a file with a block such as blk_xxx_1001. The diskFile is 
"file:/XXX/current/finalized/blk_xxx" and the diskMetaFile is 
"file:/XXX/current/finalized/blk_xxx_1001.meta".
   
   - Run DirectoryScanner. It first creates a BlockPoolReport.ScanInfo that 
records blockFile as "file:/XXX/current/finalized/blk_xxx" and metaFile as 
"file:/XXX/current/finalized/blk_xxx_1001.meta".
   
   - Simultaneously, another thread completes an append for blk_xxx. Now the 
diskFile is "file:/XXX/current/finalized/blk_xxx", the diskMetaFile is 
"file:/XXX/current/finalized/blk_xxx_1002.meta", the memDataFile is 
"file:/XXX/current/finalized/blk_xxx", and the memMetaFile is 
"file:/XXX/current/finalized/blk_xxx_1002.meta".
   
   - DirectoryScanner continues to run. Because the generation stamp of the 
metadata file in memory differs from that of the metadata file in scanInfo, 
the scanInfo object is added to the list of differences.
   
   - FsDatasetImpl#checkAndUpdate then traverses the list of differences. 
Because the diskMetaFile recorded in scanInfo, 
"/XXX/current/finalized/blk_xxx_1001.meta", no longer exists, isRegular is 
false:
   ```
    final boolean isRegular = FileUtil.isRegularFile(diskMetaFile, false)
        && FileUtil.isRegularFile(diskFile, false);
   ```
   - The healthy block is then marked as corrupt and reported to the NameNode:
   ```
        } else if (!isRegular) {
            corruptBlock = new Block(memBlockInfo);
            LOG.warn("Block:{} is not a regular file.", corruptBlock.getBlockId());
        }
   ```
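
   The interleaving described above can be reproduced outside of HDFS with a 
small standalone sketch. It is not Hadoop code and not the fix in this PR: 
`StaleScanInfoDemo`, `ScanSnapshot`, `isRegular`, and `bumpGenerationStamp` 
are hypothetical names, and `java.nio.file.Files.isRegularFile` stands in for 
`FileUtil.isRegularFile`. It only illustrates why a path captured before the 
append fails the regular-file check afterwards.

   ```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaleScanInfoDemo {

    /** Hypothetical stand-in for the paths a scan records; not the real ScanInfo. */
    static final class ScanSnapshot {
        final Path blockFile;
        final Path metaFile;

        ScanSnapshot(Path blockFile, Path metaFile) {
            this.blockFile = blockFile;
            this.metaFile = metaFile;
        }
    }

    /** Same shape as the isRegular check quoted above, using java.nio. */
    static boolean isRegular(ScanSnapshot s) {
        return Files.isRegularFile(s.metaFile) && Files.isRegularFile(s.blockFile);
    }

    /** Simulates a completed append: the meta file is renamed to the new generation stamp. */
    static Path bumpGenerationStamp(Path metaFile, long newGenStamp) throws IOException {
        String name = metaFile.getFileName().toString();          // e.g. blk_xxx_1001.meta
        String prefix = name.substring(0, name.lastIndexOf('_')); // e.g. blk_xxx
        Path renamed = metaFile.resolveSibling(prefix + "_" + newGenStamp + ".meta");
        return Files.move(metaFile, renamed);
    }

    /** Runs the interleaving and returns {beforeAppend, staleAfterAppend, freshAfterAppend}. */
    static boolean[] run() throws IOException {
        Path dir = Files.createTempDirectory("finalized");
        Path block = Files.createFile(dir.resolve("blk_xxx"));
        Path meta = Files.createFile(dir.resolve("blk_xxx_1001.meta"));

        // Step 1: the scan records the paths it sees on disk.
        ScanSnapshot stale = new ScanSnapshot(block, meta);
        boolean beforeAppend = isRegular(stale);

        // Step 2: an append completes concurrently and bumps the generation stamp.
        Path newMeta = bumpGenerationStamp(meta, 1002L);

        // Step 3: the check runs against the stale snapshot and against fresh paths.
        boolean staleAfterAppend = isRegular(stale);
        boolean freshAfterAppend = isRegular(new ScanSnapshot(block, newMeta));

        // Clean up the temporary files.
        Files.delete(newMeta);
        Files.delete(block);
        Files.delete(dir);
        return new boolean[] {beforeAppend, staleAfterAppend, freshAfterAppend};
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = run();
        System.out.println("before append:       " + r[0]);
        System.out.println("stale snapshot after: " + r[1]);
        System.out.println("fresh paths after:    " + r[2]);
    }
}
   ```

   In this sketch the block data is untouched throughout; only the snapshot 
is out of date. That suggests the check should be able to tell a missing or 
irregular file apart from a meta file path that an append has made stale.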
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

