[
https://issues.apache.org/jira/browse/HBASE-19870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342526#comment-16342526
]
Chia-Ping Tsai commented on HBASE-19870:
----------------------------------------
{quote}And maybe the testNotCloseZkWhenPending is enough for testing the
problem? Just add an assert to make sure that the thread is still alive, and try
reading from the ROZKClient to make sure that it still works?
{quote}
I don't think so. Reading data from ROZKClient adds two tasks to the
queue: 1) call the async API of zk (++pendingRequests) and 2) handle the
callback from zk (--pendingRequests). The NPE happens when pendingRequests
is nonzero and no task exists in the queue.
Specifically, the NPE is caused by the following sequence of events.
# add the first task (number of tasks = 1, pendingRequests = 0)
# ROZKClient#run executes the first task (number of tasks = 0, pendingRequests
=> 1, callback registered with zk)
# zk is too busy to run the callback (pendingRequests stays at 1)
# ROZKClient#run gets a null task while pendingRequests is nonzero.
ROZKClient SHOULD wait for the next task, but it tries to process the null
task...
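The sequence above can be modelled with a small, self-contained sketch. This is a simplified stand-in and not the ReadOnlyZKClient source or the HBASE-19870 patch: the class RunLoopSketch, the Outcome enum, and runOnce are hypothetical names, and a non-blocking poll() stands in for whatever wait the real loop performs. It only illustrates the guard that step 4 needs: a null task combined with a nonzero pendingRequests should mean "keep waiting", not "process the task".

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a poll-based worker loop; not the HBase implementation. */
public class RunLoopSketch {
    public enum Outcome { RAN_TASK, WAIT_FOR_CALLBACK, CLOSE_IDLE_CONNECTION }

    public final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    public final AtomicInteger pendingRequests = new AtomicInteger();

    /** One loop iteration with the null check in place. */
    public Outcome runOnce() {
        // poll() returns null when the queue is empty (assumption: it stands in
        // for the wait used by the real loop).
        Runnable task = tasks.poll();
        if (task == null) {
            // No task available. If a callback is still outstanding, keep waiting
            // instead of dereferencing the null task.
            if (pendingRequests.get() > 0) {
                return Outcome.WAIT_FOR_CALLBACK;
            }
            return Outcome.CLOSE_IDLE_CONNECTION;
        }
        task.run();
        return Outcome.RAN_TASK;
    }

    public static void main(String[] args) {
        RunLoopSketch loop = new RunLoopSketch();
        // Step 1: the first task issues an async request (++pendingRequests).
        loop.tasks.add(loop.pendingRequests::incrementAndGet);
        // Step 2: the loop executes the first task.
        System.out.println(loop.runOnce());
        // Steps 3 and 4: the callback has not run yet and the queue is empty.
        System.out.println(loop.runOnce());
        // The late callback finally arrives (--pendingRequests).
        loop.pendingRequests.decrementAndGet();
        System.out.println(loop.runOnce());
    }
}
```

Without the null check, the second call to runOnce would invoke task.run() on null, which is the kind of failure described in step 4.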
To reproduce the error, we must make sure ROZKClient#run executes
before the second task is added. testNotCloseZkWhenPending adds a blocker to
the first task, so it also blocks ROZKClient#run.
{code:java}
doAnswer(new Answer<Object>() {
  @Override
  public Object answer(InvocationOnMock invocation) throws Throwable {
    latch.await();
    return invocation.callRealMethod();
  }
}).when(mockedZK).exists(anyString(), anyBoolean(), any(StatCallback.class),
  any());
RO_ZK.zookeeper = mockedZK;
CompletableFuture<Stat> future = RO_ZK.exists(PATH);
// 2 * keep alive time to ensure that we will not close the zk when there are
// pending requests
Thread.sleep(6000);{code}
I guess testNotCloseZkWhenPending tried to create the same race condition
as this issue, but it didn't. [~Apache9] WDYT?
> Fix the NPE in ReadOnlyZKClient#run
> -----------------------------------
>
> Key: HBASE-19870
> URL: https://issues.apache.org/jira/browse/HBASE-19870
> Project: HBase
> Issue Type: Sub-task
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19870.v1.patch
>
>
> I noticed an NPE in my Jenkins run.
> {code}
> 2018-01-26 17:26:41,078 DEBUG [M:0;8546d406e429:40557-EventThread]
> zookeeper.ZKWatcher(443): replicationLogCleaner-0x161337ddc090004,
> quorum=localhost:56060, baseZNode=/hbase Received ZooKeeper Event, type=None,
> state=Disconnected, path=null
> java.lang.NullPointerException
> at
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:322)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> If any zk task invokes #onComplete late, the count of current requests
> will be nonzero, and then the null from the task queue will kill the worker
> thread in ReadOnlyZKClient.