[ https://issues.apache.org/jira/browse/HBASE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853279#comment-13853279 ]
Ted Yu commented on HBASE-10142:
--------------------------------
There used to be a comment about the low-replication check in FSHLog:
{code}
// TODO: preserving the old behavior for now, but this check is strange. It's not
// protected by any locks here, so for all we know rolling locks might start
// as soon as we enter the "if". Is this best-effort optimization check?
if (!this.logRollRunning) {
  checkLowReplication();
}
{code}
This means that checkLowReplication() may run while FSHLog#rollWriter() is
also running, hence the race.
That is why checkLowReplication() is now called under the reentrant lock, so
that the race cannot happen.
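To illustrate the idea, here is a minimal, self-contained sketch of guarding the check with the same reentrant lock that the roll takes. It is not the actual patch: the class RollLockSketch, the field rollLock, the method syncAndCheck() and the helper runDemo() are hypothetical names made up for this example, and the roll itself is reduced to flipping the flag.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the WAL class; this is NOT the real FSHLog.
class RollLockSketch {
    // Assumed: one reentrant lock shared by the roller and the replication check.
    private final ReentrantLock rollLock = new ReentrantLock();
    private boolean logRollRunning = false;   // only touched while holding rollLock
    private int checks = 0;                   // how many times the check ran
    private int checksDuringRoll = 0;         // how many times it overlapped a roll

    // Stand-in for rollWriter(): the whole roll happens under the lock.
    void rollWriter() {
        rollLock.lock();
        try {
            logRollRunning = true;
            Thread.yield();                   // widen the window an unguarded check could hit
            logRollRunning = false;
        } finally {
            rollLock.unlock();
        }
    }

    // Stand-in for the sync path: the flag test and the check cannot interleave with a roll.
    void syncAndCheck() {
        rollLock.lock();
        try {
            if (!logRollRunning) {
                checkLowReplication();
            }
        } finally {
            rollLock.unlock();
        }
    }

    // Records whether the check ever observed a roll in progress.
    private void checkLowReplication() {
        if (logRollRunning) {
            checksDuringRoll++;
        }
        checks++;
    }

    // Runs one rolling thread against one syncing thread; returns {checks, checksDuringRoll}.
    static int[] runDemo(int iterations) {
        RollLockSketch log = new RollLockSketch();
        Thread roller = new Thread(() -> { for (int i = 0; i < iterations; i++) log.rollWriter(); });
        Thread syncer = new Thread(() -> { for (int i = 0; i < iterations; i++) log.syncAndCheck(); });
        roller.start();
        syncer.start();
        try {
            roller.join();
            syncer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { log.checks, log.checksDuringRoll };
    }

    public static void main(String[] args) {
        int[] r = runDemo(2000);
        System.out.println("checks=" + r[0] + " checksDuringRoll=" + r[1]);
    }
}
```

In this sketch the check can never overlap a roll, because both paths hold rollLock for their whole critical section. With the unguarded "if" from the old code, the flag could flip between the test and the call.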
> TestLogRolling#testLogRollOnDatanodeDeath test failure
> ------------------------------------------------------
>
> Key: HBASE-10142
> URL: https://issues.apache.org/jira/browse/HBASE-10142
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0, 0.99.0
> Reporter: Andrew Purtell
> Assignee: Ted Yu
> Fix For: 0.98.0, 0.99.0
>
> Attachments: 10142-v1.txt
>
>
> This is a demanding unit test, which fails fairly often as software versions
> (JVM, Hadoop) and system load change. Currently when testing 0.98 branch I
> see this failure:
> {noformat}
> Failed tests:
> testLogRollOnDatanodeDeath(org.apache.hadoop.hbase.regionserver.wal.TestLogRolling):
> LowReplication Roller should've been disabled, current replication=1
> {noformat}
> Could be a timing issue after the recent switch to Hadoop 2 as default
> build/test profile. Let's see if more leniency makes sense and if it can
> stabilize the test before disabling it.