[
https://issues.apache.org/jira/browse/HDFS-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522450#comment-14522450
]
Tsz Wo Nicholas Sze commented on HDFS-8299:
-------------------------------------------
{code}
2015-04-30 14:11:08,235 WARN datanode.DataNode
(DataNode.java:checkStorageLocations(2284)) - Invalid dfs.datanode.data.dir
/archive1/dn :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not
writable: /archive1/dn
{code}
Since the datanode dir was considered invalid, the datanode did not add the dir
to its block map. None of the blocks under that dir will be reported to the NN.
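
To illustrate the behaviour described above, here is a minimal sketch of a
startup check that drops any configured data dir failing a writability test.
It is not the actual DataNode or DiskChecker code. The class StorageDirFilter,
its Result holder, the check method and the /data1/dn path are all
hypothetical names used only for this example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; not the Hadoop implementation.
public class StorageDirFilter {

    /** Holds the dirs accepted and rejected by the startup check. */
    public static final class Result {
        public final List<String> valid = new ArrayList<>();
        public final List<String> invalid = new ArrayList<>();
    }

    /**
     * Splits the configured dirs into valid and invalid ones.
     * The writability test is passed in so it can be swapped out.
     */
    public static Result check(List<String> configured, Predicate<String> isWritable) {
        Result result = new Result();
        for (String dir : configured) {
            if (isWritable.test(dir)) {
                result.valid.add(dir);
            } else {
                // A rejected dir is never scanned, so the blocks stored
                // under it are absent from any later block report.
                result.invalid.add(dir);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // /data1/dn is an assumed second data dir; /archive1/dn is from the log.
        List<String> dirs = Arrays.asList("/data1/dn", "/archive1/dn");
        Result result = check(dirs, d -> {
            File f = new File(d);
            return f.isDirectory() && f.canRead() && f.canWrite() && f.canExecute();
        });
        System.out.println("valid: " + result.valid);
        System.out.println("invalid: " + result.invalid);
    }
}
```

With a check of this shape, a dir on a read-only filesystem ends up in the
invalid list even though every block file under it is still readable, which
matches the symptom in this issue.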
> HDFS reporting missing blocks when they are actually present due to read-only
> filesystem
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-8299
> URL: https://issues.apache.org/jira/browse/HDFS-8299
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Environment: HDP 2.2
> Reporter: Hari Sekhon
> Priority: Critical
> Attachments: datanode.log
>
>
> Fsck shows missing blocks when the blocks can be found on a datanode's
> filesystem and the datanode has been restarted to try to get it to recognize
> that the blocks are indeed present and hence report them to the NameNode in a
> block report.
> Fsck output showing an example "missing" block:
> {code}/apps/hive/warehouse/<custom_scrubbed>.db/someTable/000000_0: CORRUPT
> blockpool BP-120244285-<ip>-1417023863606 block blk_1075202330
> MISSING 1 blocks of total size 3260848 B
> 0. BP-120244285-<ip>-1417023863606:blk_1075202330_1484191 len=3260848
> MISSING!{code}
> The block is definitely present on more than one datanode, however. Here is
> the output from one of them, which I restarted to try to get it to report the
> block to the NameNode:
> {code}# ll
> /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330*
> -rw-r--r-- 1 hdfs 499 3260848 Apr 27 15:02
> /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330
> -rw-r--r-- 1 hdfs 499 25483 Apr 27 15:02
> /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330_1484191.meta{code}
> It's worth noting that this is on HDFS tiered storage, on an archive tier
> backed by a networked block device that may have become temporarily
> unavailable but is available now. See also feature request HDFS-8297 for an
> online rescan, which would avoid having to restart datanodes.
> The datanode log (which I am attaching) shows that this happens because the
> datanode fails to get a write lock on the filesystem. I think it would be
> better to be able to serve those blocks read-only, however, since the current
> behaviour causes client-visible data unavailability when the data could in
> fact be read.
> {code}2015-04-30 14:11:08,235 WARN datanode.DataNode
> (DataNode.java:checkStorageLocations(2284)) - Invalid dfs.datanode.data.dir
> /archive1/dn :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not
> writable: /archive1/dn
> at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:193)
> at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:157)
> at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2239)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2281)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2263)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
> at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:78)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}