[ 
https://issues.apache.org/jira/browse/HBASE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853279#comment-13853279
 ] 

Ted Yu commented on HBASE-10142:
--------------------------------

There used to be a comment about the low replication check in FSHLog:
{code}
      // TODO: preserving the old behavior for now, but this check is strange. It's not
      //       protected by any locks here, so for all we know rolling locks might start
      //       as soon as we enter the "if". Is this best-effort optimization check?
      if (!this.logRollRunning) {
        checkLowReplication();
{code}
Because the check of logRollRunning is not protected by any lock, 
checkLowReplication() may run at the same time as FSHLog#rollWriter(), hence 
the race.
That is why checkLowReplication() is now called while holding the reentrant 
lock, so that it can no longer overlap with a log roll.
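
To illustrate the idea, here is a minimal sketch of guarding both the roll and the replication check with one reentrant lock. This is not the FSHLog code or the attached patch: the class LowReplicationLockSketch, the field rollLock, the overlaps counter and the runDemo() helper are all hypothetical names made up for the example, and the method bodies are placeholders.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a write-ahead log; NOT the real FSHLog.
class LowReplicationLockSketch {
    // Assumed name: one lock shared by the roll path and the replication check.
    private final ReentrantLock rollLock = new ReentrantLock();
    // True only while a roll is in progress; guarded by rollLock.
    private boolean rolling = false;
    // Counts checks that observed a roll in progress (expected to stay at 0).
    private final AtomicInteger overlaps = new AtomicInteger();

    // Placeholder for a log roll: marks the roll as running while holding the lock.
    void rollWriter() {
        rollLock.lock();
        try {
            rolling = true;
            Thread.yield(); // give a racing checker a chance to run mid-roll
            rolling = false;
        } finally {
            rollLock.unlock();
        }
    }

    // Placeholder for the replication check: takes the same lock, so it
    // either runs entirely before a roll or entirely after it.
    void checkLowReplication() {
        rollLock.lock();
        try {
            if (rolling) {
                overlaps.incrementAndGet();
            }
        } finally {
            rollLock.unlock();
        }
    }

    // Runs one roller thread and one checker thread side by side and
    // returns how many checks overlapped a roll.
    static int runDemo(int iterations) {
        LowReplicationLockSketch log = new LowReplicationLockSketch();
        Thread roller = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                log.rollWriter();
            }
        });
        Thread checker = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                log.checkLowReplication();
            }
        });
        roller.start();
        checker.start();
        try {
            roller.join();
            checker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log.overlaps.get();
    }

    public static void main(String[] args) {
        System.out.println("overlapping checks: " + runDemo(10_000));
    }
}
```

Since both methods take the same lock, the checker should never see a roll in progress and the count should stay at 0. With an unlocked "if (!rolling)" test instead, the flag could change right after it is read, which is the window the old TODO comment describes. The lock being reentrant also means a thread that already holds it can call checkLowReplication() without deadlocking.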


> TestLogRolling#testLogRollOnDatanodeDeath test failure
> ------------------------------------------------------
>
>                 Key: HBASE-10142
>                 URL: https://issues.apache.org/jira/browse/HBASE-10142
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0, 0.99.0
>            Reporter: Andrew Purtell
>            Assignee: Ted Yu
>             Fix For: 0.98.0, 0.99.0
>
>         Attachments: 10142-v1.txt
>
>
> This is a demanding unit test, which fails fairly often as software versions 
> (JVM, Hadoop) and system load change. Currently when testing 0.98 branch I 
> see this failure:
> {noformat}
> Failed tests:   testLogRollOnDatanodeDeath(org.apache.hadoop.hbase.regionserver.wal.TestLogRolling): LowReplication Roller should've been disabled, current replication=1
> {noformat} 
> Could be a timing issue after the recent switch to Hadoop 2 as default 
> build/test profile. Let's see if more leniency makes sense and if it can 
> stabilize the test before disabling it.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
