[ 
https://issues.apache.org/jira/browse/HDFS-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757933#comment-13757933
 ] 

Kihwal Lee commented on HDFS-5162:
----------------------------------

Here is the trace:

{panel}
2013-09-03 23:30:05,342 [IPC Server handler 80 on 8020] INFO 
namenode.EditLogInputStream: Fast-forwarding stream 
'/some_shared_edits_dir/current/edits_0000000000007107855-0000000000007114171' 
to transaction ID 7107855
2013-09-03 23:30:05,359 [IPC Server handler 80 on 8020] ERROR 
namenode.FSEditLogLoader: Encountered exception on operation AddOp [length=0, 
inodeId=1200681, path=/some_path/infile605, replication=3, mtime=1378250937359, 
atime=1378250937359, blockSize=134217728, blocks=[], 
permissions=dfsload:hdfs:rw-------, 
clientName=DFSClient_NONMAPREDUCE_-673728062_1, clientMachine=1.2.3.4, 
RpcClientId=48381b03-7421-4fbf-bdbd-64d73ec09f0a, RpcCallId=2425, 
opCode=OP_ADD, txid=7107861]
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.util.LightWeightCache.evictExpiredEntries(LightWeightCache.java:178)
        at org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:213)
        at org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:267)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:703)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:323)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:198)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:111)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:183)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:179)
{panel}
                
> Name node crashes due to improper synchronization in RetryCache
> ---------------------------------------------------------------
>
>                 Key: HDFS-5162
>                 URL: https://issues.apache.org/jira/browse/HDFS-5162
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Priority: Blocker
>
> In LightWeightCache#evictExpiredEntries(), the precondition check can fail.
> [~patwhitey2007] ran an HA failover test, and the failure occurred while the
> SBN was catching up with edits during a transition to active. This caused the
> NN to terminate.
> Here is my theory: if an RPC handler calls waitForCompletion() and happens
> to remove the head of the queue in get(), it will race with
> evictExpiredEntries() called from put().
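
The kind of race described in the theory can be illustrated with a minimal
sketch. This is not the Hadoop code: ToyCache, evictHead() and the
betweenPeekAndPoll hook are hypothetical names, and the exact check that
LightWeightCache performs is not reproduced here. The sketch only assumes a
check-then-act eviction over a shared queue and replays, in a single thread,
the interleaving in which a get() removes the head between the two steps.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class EvictionRaceSketch {
    /** Hypothetical toy cache; only the expiry queue is modeled. */
    static class ToyCache {
        private final Queue<String> expiryQueue = new ArrayDeque<>();

        void put(String key) {
            expiryQueue.add(key);
        }

        /** Stand-in for a get() that takes the entry out of the queue. */
        boolean get(String key) {
            return expiryQueue.remove(key);
        }

        /**
         * Check-then-act eviction: peek, then poll, then verify that both
         * steps saw the same entry. The hook runs between the two steps and
         * plays the role of another, unsynchronized thread.
         */
        String evictHead(Runnable betweenPeekAndPoll) {
            String peeked = expiryQueue.peek();
            betweenPeekAndPoll.run();
            String polled = expiryQueue.poll();
            if (peeked != polled) { // identity check, as for cache entries
                throw new IllegalStateException(
                    "peeked " + peeked + " but polled " + polled);
            }
            return polled;
        }
    }

    /** Returns true if the state check fails when get() runs in the gap. */
    static boolean raceTripsCheck() {
        ToyCache cache = new ToyCache();
        cache.put("a");
        cache.put("b");
        try {
            // The "other thread" removes the head after it has been peeked.
            cache.evictHead(() -> cache.get("a"));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("check failed under interleaving: " + raceTripsCheck());
    }
}
```

If get() and the eviction path were guarded by the same lock, the hook could
not run between peek() and poll(), and the check in this sketch would hold.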
