[
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151239#comment-13151239
]
nkeywal commented on HBASE-4798:
--------------------------------
@commiters: For coprocessor.TestRegionServerCoprocessorExceptionWithAbort, it's
a little bit strange. It fails sometimes, but not always. When it succeeds,
there are 3 retry, and it stops trying and the test is over. On failure, it
tries up to 10 times before being interrupt in the middle of a try by the test
timeout (because the waiting time between retries is greater then the timeout).
Question: are the retries expected?
If you increase the timeout enough, it's the 'put' which is marked as failed,
but the region does not abort for the failure scenario.
> Sleeps and synchronisation improvements for tests
> -------------------------------------------------
>
> Key: HBASE-4798
> URL: https://issues.apache.org/jira/browse/HBASE-4798
> Project: HBase
> Issue Type: Improvement
> Components: master, regionserver, test
> Affects Versions: 0.94.0
> Environment: all
> Reporter: nkeywal
> Assignee: nkeywal
> Priority: Minor
> Attachments: 4798_trunk_all.v2.patch
>
>
> Multiple small changes:
> @commiters: Removing some sleeps made visible a bug on
> JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchro point.
> You may want to review this.
> JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never
> met (test on "!c && !!c"). Added a new synchronization point.
> AssignementManager#waitForAssignment: add a timeout on the wait => not stuck
> if the notification is received before the wait.
> HMaster#loop: use a notification instead of a 1s sleep
> HRegionServer#waitForServerOnline: new method used by
> JVMClusterUtil#waitForServerOnline() to replace a 1s sleep by a notification
> HRegionServer#getMaster() 1s sleeps replaced by one 0,1s sleep and one 0,2s
> sleep
> HRegionServer#stop: use a notification on sleeper to lower shutdown by 0,5s
> ZooKeeperNodeTracker#start: replace a recursive call by a loop
> ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait => not
> stuck if the notification is received before the wait.
> HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
> TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s,
> with the change on HBaseTestingUtility we are 60s faster
> TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead
> of 1s
> TestRestartCluster#testClusterRestart: send all the table creation together,
> then check creation, should be faster
> TestHLog: shutdown the whole cluster instead of DFS only (more standard)
> JVMClusterUtil#startup: lower the sleep from 1s to 0,1s
> HConnectionManager#close: Zookeeper name in debug message from
> HConnectionManager after connection close was always null because it was set
> to null in the delete.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira