[ 
https://issues.apache.org/jira/browse/HDFS-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357050#comment-16357050
 ] 

Kihwal Lee commented on HDFS-13112:
-----------------------------------

When I ran the failed tests, I was able to reproduce the following failure.
It seems related to the change. Please investigate.

{noformat}
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
[ERROR] Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 111.306 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
[ERROR] testSecretManagerState(org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions)  Time elapsed: 60.008 s  <<< ERROR!
java.lang.Exception: test timed out after 60000 milliseconds
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1252)
        at java.lang.Thread.join(Thread.java:1326)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.stopThreads(AbstractDelegationTokenSecretManager.java:653)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.stopSecretManager(FSNamesystem.java:1143)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.enterSafeMode(FSNamesystem.java:4535)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeAdapter.enterSafeMode(NameNodeAdapter.java:100)
        at org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions.testSecretManagerState(TestHAStateTransitions.java:525)
{noformat}

The test passes without the patch when run with:
{noformat}
mvn test -Dtest=TestHAStateTransitions#testSecretManagerState
{noformat}

> Token expiration edits may cause log corruption or deadlock
> -----------------------------------------------------------
>
>                 Key: HDFS-13112
>                 URL: https://issues.apache.org/jira/browse/HDFS-13112
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.1.0-beta, 0.23.8
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-13112.patch
>
>
> HDFS-4477 specifically did not acquire the fsn lock during token cancellation 
> based on the belief that edit logs are thread-safe.  However, log rolling is 
> not thread-safe.  Failure to externally synchronize on the fsn lock during a 
> roll will cause problems.
> For sync edit logging, it may cause corruption by interspersing edits with 
> the end/start segment edits.  Async edit logging may encounter a deadlock if 
> the log queue overflows.  Luckily, losing the race is extremely rare.  In ~5 
> years, we've never encountered it.  However, HDFS-13051 lost the race with 
> async edits.
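
The race described in the summary above can be illustrated with a toy model. This is only a sketch, not Hadoop code: the class EditLogRollSketch, its methods, and the op strings are all hypothetical, and a plain ReentrantReadWriteLock stands in for the fsn lock. It assumes a roll is two separate appends (end segment, then start segment) and that the token edit takes the lock in read mode; whether the actual patch uses read or write mode is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model only; none of these names are real Hadoop classes. */
class EditLogRollSketch {
    // Hypothetical stand-in for the namesystem (fsn) lock.
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
    // Hypothetical stand-in for the edit log.
    private final List<String> log = new ArrayList<>();

    // A single append is thread-safe on its own, but a roll is two appends.
    private synchronized void append(String op) {
        log.add(op);
    }

    /** Roll: end the current segment and start the next, under the write lock. */
    public void roll() {
        fsnLock.writeLock().lock();
        try {
            append("END_SEGMENT");
            append("START_SEGMENT");
        } finally {
            fsnLock.writeLock().unlock();
        }
    }

    /** Token cancellation edit; holding the lock keeps it from landing mid-roll. */
    public void logCancelToken(int id) {
        fsnLock.readLock().lock();
        try {
            append("CANCEL_TOKEN " + id);
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    public synchronized List<String> snapshot() {
        return new ArrayList<>(log);
    }

    /** True if every END_SEGMENT is immediately followed by START_SEGMENT. */
    public static boolean segmentsIntact(List<String> ops) {
        for (int i = 0; i < ops.size(); i++) {
            if (ops.get(i).equals("END_SEGMENT")
                    && (i + 1 >= ops.size() || !ops.get(i + 1).equals("START_SEGMENT"))) {
                return false;
            }
        }
        return true;
    }

    /** Runs 1000 token edits concurrently with 100 rolls and returns the log. */
    public static List<String> demo() {
        EditLogRollSketch s = new EditLogRollSketch();
        Thread canceller = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                s.logCancelToken(i);
            }
        });
        canceller.start();
        for (int i = 0; i < 100; i++) {
            s.roll();
        }
        try {
            canceller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.snapshot();
    }
}
```

If logCancelToken skipped the lock, a CANCEL_TOKEN entry could land between END_SEGMENT and START_SEGMENT, which is the interspersing the summary describes for sync edit logging. The sketch does not model the async queue overflow case.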



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

