[
https://issues.apache.org/jira/browse/HDFS-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522917#comment-14522917
]
Hari Sekhon commented on HDFS-8299:
-----------------------------------
To clarify, a read-only filesystem should not prevent the blocks from being
included in the block report to the NameNode and reported as existing; it
should merely prevent new block writes to that partition until the problem
is resolved.
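
As a rough illustration of the behaviour described above, here is a minimal
Java sketch that classifies a storage directory as read-write, read-only, or
unavailable. The class, enum, and method names are hypothetical and are not
part of Hadoop. The real DataNode startup check goes through DiskChecker (see
the stack trace in the issue), and this sketch does not reproduce it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch only; none of these names exist in Hadoop. */
final class VolumeAccessSketch {

    /** Hypothetical access levels for a DataNode storage directory. */
    enum VolumeAccess {
        READ_WRITE,   // report blocks, serve reads, accept new block writes
        READ_ONLY,    // report blocks and serve reads, reject new block writes
        UNAVAILABLE   // directory missing or unreadable: exclude the volume
    }

    /** Classifies a directory instead of rejecting it outright when it is not writable. */
    static VolumeAccess classify(Path dir) {
        if (!Files.isDirectory(dir) || !Files.isReadable(dir) || !Files.isExecutable(dir)) {
            return VolumeAccess.UNAVAILABLE;
        }
        return canCreateFile(dir) ? VolumeAccess.READ_WRITE : VolumeAccess.READ_ONLY;
    }

    /** Probes writability by creating and then deleting a temporary file. */
    static boolean canCreateFile(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "probe", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException | SecurityException e) {
            return false;
        }
    }

    /** Blocks on any readable volume would still be included in the block report. */
    static boolean reportsBlocks(VolumeAccess access) {
        return access != VolumeAccess.UNAVAILABLE;
    }

    /** Only fully writable volumes would be eligible for new block writes. */
    static boolean acceptsWrites(VolumeAccess access) {
        return access == VolumeAccess.READ_WRITE;
    }
}
```

Under this scheme a volume classified as READ_ONLY would still contribute its
finalized blocks to the block report, but would be skipped when a volume is
chosen for new writes.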
> HDFS reporting missing blocks when they are actually present due to read-only
> filesystem
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-8299
> URL: https://issues.apache.org/jira/browse/HDFS-8299
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Environment: HDP 2.2
> Reporter: Hari Sekhon
> Priority: Critical
> Attachments: datanode.log
>
>
> Fsck reports blocks as missing even though they can be found on a
> datanode's filesystem, and even after the datanode has been restarted to
> try to get it to recognize that the blocks are present and report them to
> the NameNode in a block report.
> Fsck output showing an example "missing" block:
> {code}/apps/hive/warehouse/<custom_scrubbed>.db/someTable/000000_0: CORRUPT blockpool BP-120244285-<ip>-1417023863606 block blk_1075202330
> MISSING 1 blocks of total size 3260848 B
> 0. BP-120244285-<ip>-1417023863606:blk_1075202330_1484191 len=3260848 MISSING!{code}
> However, the block is definitely present on more than one datanode. Here is
> the output from one of them, which I restarted to try to get it to report
> the block to the NameNode:
> {code}# ll /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330*
> -rw-r--r-- 1 hdfs 499 3260848 Apr 27 15:02 /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330
> -rw-r--r-- 1 hdfs 499 25483 Apr 27 15:02 /archive1/dn/current/BP-120244285-<ip>-1417023863606/current/finalized/subdir22/subdir73/blk_1075202330_1484191.meta{code}
> It's worth noting that this is on HDFS tiered storage, on an archive tier
> backed by a networked block device that may have become temporarily
> unavailable but is available now. See also feature request HDFS-8297 for an
> online rescan, which would avoid having to restart datanodes.
> The datanode log (attached) shows that this happens because the datanode
> fails to get a write lock on the filesystem. I think it would be better to
> serve those blocks read-only, however, since the current behaviour causes
> client-visible data unavailability when the data could in fact be read.
> {code}2015-04-30 14:11:08,235 WARN datanode.DataNode (DataNode.java:checkStorageLocations(2284)) - Invalid dfs.datanode.data.dir /archive1/dn :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /archive1/dn
>     at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:193)
>     at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
>     at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:157)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2239)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2281)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2263)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
>     at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:78)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}