[ https://issues.apache.org/jira/browse/HDFS-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757933#comment-13757933 ]
Kihwal Lee commented on HDFS-5162:
----------------------------------
Here is the trace:
{panel}
2013-09-03 23:30:05,342 [IPC Server handler 80 on 8020] INFO namenode.EditLogInputStream: Fast-forwarding stream '/some_shared_edits_dir/current/edits_0000000000007107855-0000000000007114171' to transaction ID 7107855
2013-09-03 23:30:05,359 [IPC Server handler 80 on 8020] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddOp [length=0, inodeId=1200681, path=/some_path/infile605, replication=3, mtime=1378250937359, atime=1378250937359, blockSize=134217728, blocks=[], permissions=dfsload:hdfs:rw-------, clientName=DFSClient_NONMAPREDUCE_-673728062_1, clientMachine=1.2.3.4, RpcClientId=48381b03-7421-4fbf-bdbd-64d73ec09f0a, RpcCallId=2425, opCode=OP_ADD, txid=7107861]
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.util.LightWeightCache.evictExpiredEntries(LightWeightCache.java:178)
        at org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:213)
        at org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:267)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:703)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:323)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:198)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:111)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:183)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:179)
{panel}
> Name node crashes due to improper synchronization in RetryCache
> ---------------------------------------------------------------
>
> Key: HDFS-5162
> URL: https://issues.apache.org/jira/browse/HDFS-5162
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kihwal Lee
> Priority: Blocker
>
> In LightWeightCache#evictExpiredEntries(), the precondition check can fail.
> [~patwhitey2007] ran an HA failover test and it occurred while the SBN was
> catching up with edits during a transition to active. This caused the NN to
> terminate.
> Here is my theory: If an RPC handler calls waitForCompletion() and it happens
> to remove the head of the queue in get(), it will race with
> evictExpiredEntries() called from put().
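
To make the suspected race concrete, here is a minimal sketch of the pattern, not the actual Hadoop code. SketchCache, Entry, the ttl parameter and the explicit "now" arguments are all hypothetical names made up for illustration. The eviction loop peeks at the queue head, polls it, and checks that both are the same entry; if get() could remove the head between the peek and the poll, that check would throw IllegalStateException as in the trace. In the sketch, get() and put() take the same monitor, so the interleaving cannot occur.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical, simplified expiring cache; a stand-in, not LightWeightCache. */
class SketchCache {
    static final class Entry {
        final String key;
        long expiry;
        Entry(String key, long expiry) { this.key = key; this.expiry = expiry; }
    }

    private final Map<String, Entry> map = new HashMap<>();
    // Entries ordered by expiry time; the head is the next one to expire.
    private final PriorityQueue<Entry> queue =
        new PriorityQueue<>((a, b) -> Long.compare(a.expiry, b.expiry));
    private final long ttl;

    SketchCache(long ttl) { this.ttl = ttl; }

    // A lookup refreshes the expiry, so it may remove the current queue head.
    // Without 'synchronized' this could interleave with the eviction loop.
    synchronized Entry get(String key, long now) {
        Entry e = map.get(key);
        if (e != null) {
            queue.remove(e);      // identity-based removal (Entry has no equals())
            e.expiry = now + ttl;
            queue.offer(e);
        }
        return e;
    }

    synchronized void put(String key, long now) {
        evictExpiredEntries(now);
        Entry old = map.remove(key);
        if (old != null) {
            queue.remove(old);
        }
        Entry e = new Entry(key, now + ttl);
        map.put(key, e);
        queue.offer(e);
    }

    // Caller must hold the monitor. The peek/poll pair is only consistent
    // if no other thread can modify the queue in between.
    private void evictExpiredEntries(long now) {
        while (true) {
            Entry peeked = queue.peek();
            if (peeked == null || peeked.expiry > now) {
                return;
            }
            Entry evicted = queue.poll();
            if (evicted != peeked) {
                throw new IllegalStateException("queue head changed during eviction");
            }
            map.remove(evicted.key);
        }
    }

    synchronized int size() { return map.size(); }
}
```

Whether the real fix should lock at this level or in the caller (RetryCache) is a separate design question; the sketch only shows that every path touching the queue has to agree on one lock.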