[
https://issues.apache.org/jira/browse/HBASE-22652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877479#comment-16877479
]
HBase QA commented on HBASE-22652:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}
0m 0s{color} | {color:orange} The patch doesn't appear to include any new or
modified tests. Please justify why no new tests are needed for this patch. Also
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m
23s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
15s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
14s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m
14s{color} | {color:green} branch has no errors when building our shaded
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m
27s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
13s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m
13s{color} | {color:red} hbase-procedure: The patch generated 2 new + 0
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m
0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m
14s{color} | {color:green} patch has no errors when building our shaded
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}
17m 16s{color} | {color:green} Patch does not cause any errors with Hadoop
2.7.7 2.8.5 or 3.0.3 3.1.2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m
29s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
9s{color} | {color:green} The patch does not generate ASF License warnings.
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 16s{color} |
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 base:
https://builds.apache.org/job/PreCommit-HBASE-Build/595/artifact/patchprocess/Dockerfile
|
| JIRA Issue | HBASE-22652 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/12973503/HBASE-22652.branch-2.1.001.patch
|
| Optional Tests | dupname asflicense javac javadoc unit findbugs
shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 7b23991bd8d4 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4
22:49:08 UTC 2019 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | branch-2.1 / 35521d4a2f |
| maven | version: Apache Maven 3.5.4
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.11 |
| checkstyle |
https://builds.apache.org/job/PreCommit-HBASE-Build/595/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt
|
| Test Results |
https://builds.apache.org/job/PreCommit-HBASE-Build/595/testReport/ |
| Max. process+thread count | 287 (vs. ulimit of 10000) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output |
https://builds.apache.org/job/PreCommit-HBASE-Build/595/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
This message was automatically generated.
> Flakey TestLockManager; test timed out after 780 seconds
> --------------------------------------------------------
>
> Key: HBASE-22652
> URL: https://issues.apache.org/jira/browse/HBASE-22652
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Affects Versions: 2.1.5
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22652.branch-2.1.001.patch
>
>
> In nightly 1318 on branch-2.1, TestLockManager failed. It was only fail. It
> fails on occasion. In this instance, test failed like this:
> {code}
> Error Message
> test timed out after 780 seconds
> Stacktrace
> org.junit.runners.model.TestTimedOutException: test timed out after 780
> seconds
> at
> org.apache.hadoop.hbase.master.locking.TestLockManager.tearDown(TestLockManager.java:105)
> {code}
> Digging in log, before the timeout in teardown, you see this:
> {code}
> 2019-07-01 22:45:57,849 INFO [PEWorker-1]
> procedure2.ProcedureExecutor$WorkerThread(2041): ASSERT pid=11
> java.lang.AssertionError
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.add(TimeoutExecutorThread.java:81)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1765)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1462)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2039)
> {code}
> i.e. an assert trips. The assert is doing this:
> {code}
> assert procedure.getState() == ProcedureState.WAITING_TIMEOUT;
> {code}
> The procedure is expected to be in WAITING_TIMEOUT state when we go to add it
> to the timeout executor. In the log snippet above, we see the thread is in
> RUNNABLE state. But even on successful run it is in RUNNABLE state. So what
> is going on.
> I think I figured it accidentally.
> I'd added a pause at the assert and when the assert ran, procedure was in
> RUNNABLE state and it would fail every time. So, it is a timing issue.
> I added logging of the setting of state on the procedure. There weren't many
> procedures so was easy to follow. Here is what I saw (log included emission
> of stack trace):
> {code}
> 2019-07-02 19:50:14,941 DEBUG [PEWorker-1] locking.LockProcedure(313): LOCKED
> pid=11, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
> namespace=namespace, type=EXCLUSIVE
> 2019-07-02 19:50:14,943 INFO [PEWorker-1] procedure2.Procedure(793): SET
> STATE ON pid=11, state=WAITING_TIMEOUT, locked=true;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE
> java.lang.Exception
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setState(Procedure.java:793)
> at
> org.apache.hadoop.hbase.master.locking.LockProcedure.execute(LockProcedure.java:249)
> at
> org.apache.hadoop.hbase.master.locking.LockProcedure.execute(LockProcedure.java:56)
> at
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:966)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1723)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1462)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2039)
> 2019-07-02 19:50:14,944 INFO [Time-limited test] procedure2.Procedure(793):
> SET STATE ON pid=11, state=RUNNABLE, locked=true;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE
> java.lang.Exception
> at
> org.apache.hadoop.hbase.procedure2.Procedure.setState(Procedure.java:793)
> at
> org.apache.hadoop.hbase.master.locking.LockProcedure.unlock(LockProcedure.java:225)
> at
> org.apache.hadoop.hbase.master.locking.LockManager$MasterLock.release(LockManager.java:185)
> at
> org.apache.hadoop.hbase.master.locking.TestLockManager.testMasterLockAcquire(TestLockManager.java:124)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
> at java.util.concurrent.FutureTask.run(FutureTask.java)
> at java.lang.Thread.run(Thread.java:748)
> 2019-07-02 19:50:15,037 DEBUG [Time-limited test]
> procedure.MasterProcedureScheduler(356): Add TableQueue(hbase:namespace,
> xlock=false sharedLock=1 size=1) to run queue because: pid=11,
> state=RUNNABLE, locked=true;
> org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace,
> type=EXCLUSIVE has lock
> 2019-07-02 19:53:37,062 WARN [JvmPauseMonitor]
> util.JvmPauseMonitor$Monitor(176): Detected pause in JVM or host machine (eg
> GC): pause of approximately 201656ms
> No GCs detected
> 2019-07-02 19:53:37,063 WARN [ProcExecTimeout]
> procedure2.ProcedureExecutor$WorkerMonitor(2147): Worker stuck
> PEWorker-1(pid=11), run time 3mins, 22.122sec
> 2019-07-02 19:53:37,063 WARN [RS:0;localhost:57336] util.Sleeper(86): We
> slept 202150ms instead of 1000ms, this is likely due to a long garbage
> collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2019-07-02 19:53:37,064 INFO [master/localhost:0.splitLogManager..Chore.1]
> hbase.ScheduledChore(176): Chore: SplitLogManager Timeout Monitor missed its
> start time
> Disconnected from the target VM, address: '127.0.0.1:57316', transport:
> 'socket'
> 2019-07-02 19:53:37,063 INFO [regionserver/localhost:0.Chore.1]
> hbase.ScheduledChore(176): Chore: MemstoreFlusherChore missed its start time
> 2019-07-02 19:53:37,069 INFO [PEWorker-1]
> procedure2.ProcedureExecutor$WorkerThread(2041): ASSERT pid=11
> java.lang.AssertionError
> at
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.add(TimeoutExecutorThread.java:81)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1765)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1462)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2039)
> {code}
> So, the worker thread goes to execute the lock procedure. It sets the state
> to WAITING_TIMEOUT... My break point trips and it holds up the timeout
> executor. Meantime the main test thread continues and releases lock which
> sets the procedure state back to RUNNABLE. I let the break point go and the
> worker thread finds the procedure state is RUNNABLE and the assertion trips.
> I think the assert should tolerate RUNNABLE state too.
> Looking at all ways into the add to timeout executor, all test in caller in
> one way or another for WAITING_TIMEOUT except for case where we are loading
> procedures from stores. This latter is a little unpredictable. The other
> entrances will have seen the state set to WAITING_TIMEOUT at least just
> before we get to the assert point.
> Let me relax the assert some.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)