Wei-Chiu Chuang created HDFS-10360:
--------------------------------------
Summary: DataNode may format directory and lose blocks if
current/VERSION is missing
Key: HDFS-10360
URL: https://issues.apache.org/jira/browse/HDFS-10360
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Under certain circumstances, if the current/VERSION of a storage directory is
missing, DataNode may format the storage directory even though _block files are
not missing_.
This is very easy to reproduce: launch an HDFS cluster, create some files,
delete current/VERSION, and restart the DataNode.
After the restart, the DataNode formats the directory and removes all
existing block files:
{noformat}
2016-05-03 12:57:15,387 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock
on /data/dfs/dn/in_use.lock acquired by nodename
[email protected]
2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage:
Storage directory /data/dfs/dn is not formatted for
BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,389 INFO org.apache.hadoop.hdfs.server.common.Storage:
Formatting ...
2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage:
Analyzing storage directories for bpid BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,464 INFO org.apache.hadoop.hdfs.server.common.Storage:
Locking is disabled for
/data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage:
Block pool storage directory
/data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642 is not formatted
for BP-787466439-172.26.24.43-1462305406642
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage:
Formatting ...
2016-05-03 12:57:15,465 INFO org.apache.hadoop.hdfs.server.common.Storage:
Formatting block pool BP-787466439-172.26.24.43-1462305406642 directory
/data/dfs/dn/current/BP-787466439-172.26.24.43-1462305406642/current
{noformat}
The bug is: DataNode assumes that if none of {{current/VERSION}},
{{previous/}}, {{previous.tmp/}}, {{removed.tmp/}}, {{finalized.tmp/}} and
{{lastcheckpoint.tmp/}} exists, the storage directory contains nothing
important to HDFS and decides to format it. However, block files may still
exist, and in my opinion, we should do everything possible to retain the block
files.
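To make the faulty assumption concrete, here is a minimal stand-alone Java sketch of the decision described above. It is not the actual Hadoop code: the class and method names ({{NotFormattedCheckSketch}}, {{looksNotFormatted}}) are hypothetical, and only the marker names are taken from this report.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, simplified model of the check described in this report.
// It is NOT the real Storage#analyzeStorage implementation.
final class NotFormattedCheckSketch {

    // Marker paths named in the report, relative to the storage directory.
    static final String[] MARKERS = {
        "current/VERSION", "previous", "previous.tmp",
        "removed.tmp", "finalized.tmp", "lastcheckpoint.tmp"
    };

    // Returns true when none of the markers exists. Note that the contents
    // of current/ (for example block files) are never inspected.
    static boolean looksNotFormatted(Path storageDir) {
        for (String marker : MARKERS) {
            if (Files.exists(storageDir.resolve(marker))) {
                return false;
            }
        }
        return true;
    }
}
```

In this model, a directory that still holds block files under {{current/}} but has lost {{current/VERSION}} is reported as not formatted, which is the situation that leads to the format shown in the log.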
I have two suggestions:
# Check whether the {{current/}} directory is empty. If it is not, throw an
InconsistentFSStateException in {{Storage#analyzeStorage}} instead of assuming
it is not formatted. Or,
# In {{Storage#clearDirectory}}, before it formats the storage directory,
rename or move {{current/}} directory. Also, log whatever is being
renamed/moved.
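The two suggestions could look roughly like the following stand-alone Java sketch. It is an illustration only, not a patch: the class and method names are hypothetical, a plain {{IOException}} stands in for InconsistentFSStateException, and the {{current.bak.<timestamp>}} backup name is an assumption made for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helpers sketching the two suggestions; not Hadoop code.
final class PreserveBlocksSketch {

    // Suggestion 1: refuse to treat the directory as unformatted when
    // current/ exists and is not empty.
    static void failIfCurrentNotEmpty(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        if (!Files.isDirectory(current)) {
            return; // nothing to protect
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(current)) {
            if (entries.iterator().hasNext()) {
                // Stand-in for InconsistentFSStateException.
                throw new IOException("VERSION is missing but " + current
                    + " is not empty; refusing to format");
            }
        }
    }

    // Suggestion 2: move current/ aside before formatting and log the move.
    // Returns the backup path, or null if current/ does not exist.
    static Path moveCurrentAside(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        if (!Files.exists(current)) {
            return null;
        }
        // Backup name is an assumption chosen for this sketch.
        Path backup = storageDir.resolve("current.bak." + System.currentTimeMillis());
        Files.move(current, backup); // same parent directory, so a plain rename
        System.err.println("Moved " + current + " to " + backup + " before formatting");
        return backup;
    }
}
```

The first helper fails fast and leaves the directory untouched for an operator to inspect; the second lets startup proceed while keeping the old block files recoverable.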
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)