[ 
https://issues.apache.org/jira/browse/ACCUMULO-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873881#comment-13873881
 ] 

ASF subversion and git services commented on ACCUMULO-2198:
-----------------------------------------------------------

Commit cd4eac0d7e2820321db9fc9cdfc8dc89f7dd53d2 in branch refs/heads/master 
from [~bhavanki]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=cd4eac0 ]

ACCUMULO-2198 Concurrent randomwalk: add teardown, fix server balance check

The Concurrent randomwalk test had been using a test node property to remember
the last time when servers were unbalanced, but this property was not getting
cleaned up between runs. Therefore, if a new Concurrent test was started some
time later, it would pick up the old timestamp property from the last run. This
commit adds removal of the property during test teardown, and also moves the
tracking from a node property to test state.

In addition, the test logic would reset the timestamp every time servers were
found unbalanced, provided the 15-minute allowance hadn't expired. This commit
fixes that issue as well. This could lead to more, correct, reports of
unbalanced servers.

Lastly, the test in 1.5.x requires three checks for unbalanced servers to fail
before failing the test. This commit backports that requirement to 1.4.x.

The timestamp reset and three-check fixes were added to 1.5.x in commit
0ee7e5a8.
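
The behavior described in this commit message can be sketched roughly as
follows. This is an illustrative reading only, not the actual CheckBalance
code: the class name, the state keys, the plain Map standing in for the
randomwalk test state, and the exact way the 15-minute allowance combines with
the three-check requirement are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and structure are NOT taken from Accumulo.
class BalanceCheckSketch {
    // Assumed state keys, invented for this example.
    static final String LAST_UNBALANCED_TIME = "lastUnbalancedTime";
    static final String UNBALANCED_COUNT = "unbalancedCount";

    static final long ALLOWANCE_MS = 15 * 60 * 1000L; // 15-minute allowance
    static final int MAX_FAILED_CHECKS = 3;           // three failed checks

    // Stand-in for per-test state (rather than a node property).
    private final Map<String, Object> state = new HashMap<>();

    /** Returns true if the test should fail because servers stayed unbalanced. */
    boolean check(boolean balanced, long nowMs) {
        if (balanced) {
            clear();
            return false;
        }
        Long since = (Long) state.get(LAST_UNBALANCED_TIME);
        if (since == null) {
            // Record only the first unbalanced observation. Later unbalanced
            // observations must not reset this timestamp.
            since = nowMs;
            state.put(LAST_UNBALANCED_TIME, since);
        }
        if (nowMs - since > ALLOWANCE_MS) {
            Integer previous = (Integer) state.get(UNBALANCED_COUNT);
            int count = (previous == null ? 0 : previous) + 1;
            state.put(UNBALANCED_COUNT, count);
            return count >= MAX_FAILED_CHECKS;
        }
        return false;
    }

    /** Teardown drops the tracking so a later run starts from a clean slate. */
    void tearDown() {
        clear();
    }

    private void clear() {
        state.remove(LAST_UNBALANCED_TIME);
        state.remove(UNBALANCED_COUNT);
    }
}
```

In this sketch, a run that sees unbalanced servers at minute 0 and again at
minute 10 keeps the minute-0 timestamp, and calling tearDown() between runs
prevents a stale timestamp from carrying over into the next Concurrent test.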


> Concurrent randomwalk fails with unbalanced servers
> ---------------------------------------------------
>
>                 Key: ACCUMULO-2198
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2198
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.4.4
>            Reporter: Bill Havanki
>            Assignee: Bill Havanki
>              Labels: randomwalk, test
>
> Not always, but sometimes I am seeing the Concurrent randomwalk test fail with:
> {noformat}
> java.lang.Exception: Error running node Concurrent.xml
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
> ...
> Caused by: java.lang.Exception: Error running node ct.CheckBalance
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
>         ... 8 more
> Caused by: java.lang.Exception: servers are unbalanced!
>         at org.apache.accumulo.server.test.randomwalk.concurrent.CheckBalance.visit(CheckBalance.java:74)
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
>         ... 9 more
> {noformat}
> In one case, the 15-minute allowance for balancing extended to a prior run of 
> Concurrent.xml within the same overall test run. In another case, the time 
> span begins at a point when HDFS failed to contact a datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
