[
https://issues.apache.org/jira/browse/HBASE-20038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang updated HBASE-20038:
------------------------------
Attachment: HBASE-20038.patch
> TestLockProcedure.testTimeout is flakey
> ---------------------------------------
>
> Key: HBASE-20038
> URL: https://issues.apache.org/jira/browse/HBASE-20038
> Project: HBase
> Issue Type: Sub-task
> Reporter: Duo Zhang
> Priority: Major
> Attachments: HBASE-20038.patch
>
>
> The test is simple so it is easy to find out the problem.
> {noformat}
> 2018-02-21 04:53:32,230 INFO [Time-limited test] hbase.ResourceChecker(148):
> before: master.locking.TestLockProcedure#testTimeout Thread=218,
> OpenFileDescriptor=853, MaxFileDescriptor=60000, SystemLoadAverage=5075,
> ProcessCount=312, AvailableMemoryMB=5373
> 2018-02-21 04:53:32,234 WARN [Time-limited test]
> procedure2.ProcedureTestingUtility(146): Set Kill before store update to:
> false
> 2018-02-21 04:53:32,278 DEBUG [Time-limited test]
> procedure2.ProcedureExecutor(866): Stored pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE
> 2018-02-21 04:53:32,285 DEBUG [Time-limited test] locking.LockProcedure(195):
> Heartbeat pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE
> 2018-02-21 04:53:32,286 DEBUG [PEWorker-1] locking.LockProcedure(312): LOCKED
> pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
> namespace=namespace, type=EXCLUSIVE
> 2018-02-21 04:53:32,303 DEBUG [Time-limited test]
> locking.TestLockProcedure(204): Proc id 14 acquired lock.
> 2018-02-21 04:53:32,298 INFO [PEWorker-1]
> procedure2.TimeoutExecutorThread(82): ADDED pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE; timeout=2000, timestamp=1519188814298
> 2018-02-21 04:53:33,303 DEBUG [Time-limited test] locking.LockProcedure(195):
> Heartbeat pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE
> 2018-02-21 04:53:33,304 DEBUG [Time-limited test]
> locking.TestLockProcedure(225): Proc id 14 : LOCKED.
> 2018-02-21 04:53:34,299 DEBUG [ProcExecTimeout] locking.LockProcedure(207):
> Timeout failure ProcedureEvent for pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE, ready=false, [pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE]
> 2018-02-21 04:53:34,299 DEBUG [ProcExecTimeout] locking.LockProcedure(210):
> Calling wake on ProcedureEvent for pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE, ready=false, [pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE]
> 2018-02-21 04:53:34,299 INFO [PEWorker-1]
> procedure2.TimeoutExecutorThread(82): ADDED pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE; timeout=2000, timestamp=1519188816299
> 2018-02-21 04:53:34,306 DEBUG [Time-limited test] locking.LockProcedure(195):
> Heartbeat pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE
> 2018-02-21 04:53:34,306 DEBUG [Time-limited test]
> locking.TestLockProcedure(225): Proc id 14 : LOCKED.
> 2018-02-21 04:53:34,766 WARN [HBase-Metrics2-1] impl.MetricsConfig(125):
> Cannot locate configuration: tried
> hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2018-02-21 04:53:36,299 DEBUG [ProcExecTimeout] locking.LockProcedure(207):
> Timeout failure ProcedureEvent for pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE, ready=false, [pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE]
> 2018-02-21 04:53:36,299 DEBUG [ProcExecTimeout] locking.LockProcedure(210):
> Calling wake on ProcedureEvent for pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE, ready=false, [pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE]
> 2018-02-21 04:53:36,299 INFO [PEWorker-1]
> procedure2.TimeoutExecutorThread(82): ADDED pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE; timeout=2000, timestamp=1519188818299
> 2018-02-21 04:53:38,299 DEBUG [ProcExecTimeout] locking.LockProcedure(207):
> Timeout failure ProcedureEvent for pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE, ready=false, [pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE]
> 2018-02-21 04:53:38,299 DEBUG [ProcExecTimeout] locking.LockProcedure(210):
> Calling wake on ProcedureEvent for pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE, ready=false, [pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE]
> 2018-02-21 04:53:38,310 DEBUG [Time-limited test] locking.LockProcedure(195):
> Heartbeat pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE
> 2018-02-21 04:53:38,310 INFO [PEWorker-1]
> procedure2.TimeoutExecutorThread(82): ADDED pid=14, state=WAITING_TIMEOUT;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE; timeout=2000, timestamp=1519188820310
> 2018-02-21 04:53:38,311 WARN [Time-limited test]
> procedure2.ProcedureTestingUtility(146): Set Kill before store update to:
> false
> 2018-02-21 04:53:38,342 DEBUG [PEWorker-1] locking.LockProcedure(242):
> UNLOCKED pid=14, state=RUNNABLE;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE
> 2018-02-21 04:53:38,366 INFO [PEWorker-1]
> procedure2.ProcedureExecutor(1247): Finished pid=14, state=SUCCESS;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE in 6.0920sec
> 2018-02-21 04:53:38,628 INFO [Time-limited test] hbase.ResourceChecker(172):
> after: master.locking.TestLockProcedure#testTimeout Thread=209 (was 218),
> OpenFileDescriptor=851 (was 853), MaxFileDescriptor=60000 (was 60000),
> SystemLoadAverage=5045 (was 5075), ProcessCount=314 (was 312) - ProcessCount
> LEAK? -, AvailableMemoryMB=5071 (was 5373)
> {noformat}
> The problem is that, sleep 2 * HEARTBEAT_TIMEOUT is not safe. It is possible
> that after the first TIMEOUT the procedure has not been timeout yet and then
> the check for unlocked and the actual timeout processing will arrive at
> almost the same time. You can see the above log, we are going to mark the
> procedure as timeout at 04:53:38,299 and the heartbeat for checking unlock
> arrives at 04:53:38,310.
> I think increase the wait time can solve the problem.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)