Aaron T. Myers created HDFS-3864:
------------------------------------
Summary: NN does not update internal file mtime for OP_CLOSE when
reading from the edit log
Key: HDFS-3864
URL: https://issues.apache.org/jira/browse/HDFS-3864
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
When logging an OP_CLOSE to the edit log, the NN writes out the file's updated
mtime and atime. However, when reading an OP_CLOSE back from the edit log, the
NN does not apply these values to the in-memory FS data structure. Because of
this, a file's mtime or atime may appear to go back in time after an NN
restart or an HA failover.
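
To make the mismatch concrete, here is a minimal sketch of the two replay behaviors. It is not the actual NN code: SketchInode, SketchCloseOp and EditLogReplaySketch are hypothetical stand-ins, and the timestamps are made-up illustrative values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an in-memory file entry; not the real HDFS inode class. */
final class SketchInode {
    long mtime;
    long atime;

    SketchInode(long mtime, long atime) {
        this.mtime = mtime;
        this.atime = atime;
    }
}

/** Hypothetical stand-in for an OP_CLOSE record as read from the edit log. */
final class SketchCloseOp {
    final String path;
    final long mtime;
    final long atime;

    SketchCloseOp(String path, long mtime, long atime) {
        this.path = path;
        this.mtime = mtime;
        this.atime = atime;
    }
}

final class EditLogReplaySketch {
    /** Models the reported behavior: the logged times are read but never applied. */
    static void replayCloseIgnoringTimes(Map<String, SketchInode> namespace, SketchCloseOp op) {
        lookup(namespace, op.path);
    }

    /** Models the expected behavior: the logged times overwrite the in-memory ones. */
    static void replayCloseApplyingTimes(Map<String, SketchInode> namespace, SketchCloseOp op) {
        SketchInode inode = lookup(namespace, op.path);
        inode.mtime = op.mtime;
        inode.atime = op.atime;
    }

    private static SketchInode lookup(Map<String, SketchInode> namespace, String path) {
        SketchInode inode = namespace.get(path);
        if (inode == null) {
            throw new IllegalStateException("No such file: " + path);
        }
        return inode;
    }

    public static void main(String[] args) {
        // Illustrative values only: the file was created at t=1000 and closed at t=2000.
        Map<String, SketchInode> namespace = new HashMap<>();
        namespace.put("/example/file", new SketchInode(1000L, 1000L));
        SketchCloseOp close = new SketchCloseOp("/example/file", 2000L, 2000L);

        replayCloseIgnoringTimes(namespace, close);
        System.out.println("ignoring times: mtime=" + namespace.get("/example/file").mtime);

        replayCloseApplyingTimes(namespace, close);
        System.out.println("applying times: mtime=" + namespace.get("/example/file").mtime);
    }
}
```

In this sketch, a NN that replays the log with the first method keeps reporting the older mtime, while the NN that originally served the close had already handed the newer value to clients.
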
Most of the time this will be harmless and folks won't notice, but if one of
these files is in the distributed cache of an MR job when an HA failover
occurs, the job might notice that the mtime of the cache file has changed,
which in MR2 will cause the job to fail with an exception like the following:
{noformat}
java.io.IOException: Resource
hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
changed on src filesystem (expected 1342137814599, was 1342137814473
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat}
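
The exception message suggests a check that compares the mtime recorded for a cached resource against the mtime the source filesystem currently reports. A rough sketch of such a check follows. It assumes an exact-equality comparison, and ResourceTimestampCheckSketch and verifyUnchanged are hypothetical names, not the real FSDownload code.

```java
import java.io.IOException;

final class ResourceTimestampCheckSketch {
    /**
     * Hypothetical helper: rejects a resource whose current mtime differs from
     * the mtime recorded when the job was submitted. Exact equality is an
     * assumption of this sketch.
     */
    static void verifyUnchanged(String resource, long expectedMtime, long actualMtime)
            throws IOException {
        if (expectedMtime != actualMtime) {
            throw new IOException("Resource " + resource
                    + " changed on src filesystem (expected " + expectedMtime
                    + ", was " + actualMtime + ")");
        }
    }

    public static void main(String[] args) {
        try {
            // Timestamps taken from the trace above: the value seen after the
            // failover is older than the one recorded by the job.
            verifyUnchanged("snappy-java-1.0.3.2.jar", 1342137814599L, 1342137814473L);
            System.out.println("resource unchanged");
        } catch (IOException e) {
            System.out.println("check failed: " + e.getMessage());
        }
    }
}
```

With a check like this, an mtime that moves backwards is rejected just like one that moves forwards, so a failover to a NN holding the stale mtime is enough to fail the job even though the file contents never changed.
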
Credit to Sujay Rau for discovering this issue.