[ https://issues.apache.org/jira/browse/HBASE-20895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548015#comment-16548015 ]
Andrew Purtell commented on HBASE-20895:
----------------------------------------
Here are two WIP patches that take an approach similar to what we did on
HBASE-14050. Rather than leaving the plain ByteBuffer reference in place
(comments indicate we null the reference so the buffer can be GCed), they use
an atomic reference, so that one thread cannot use the reference after another
thread has stored a null back to it.
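
As a rough illustration of the idea only (this is not the code in the attached
patches), the sketch below holds the buffer in an AtomicReference, loads it
once into a local, and works only with that local afterwards. The class
ConnectionSketch and its methods are hypothetical stand-ins, and the channel
read is simulated by copying bytes into the buffer.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for RpcServer.Connection; not the actual HBase code.
class ConnectionSketch {
    // Holding the buffer in an AtomicReference makes every load and store explicit.
    private final AtomicReference<ByteBuffer> data = new AtomicReference<>();

    void allocate(int size) {
        data.set(ByteBuffer.allocate(size));
    }

    // Simulates the store of null that lets the buffer be garbage collected.
    void release() {
        data.set(null);
    }

    // Simulates the tail of readAndProcess(); returns true if a full buffer was consumed.
    boolean readAndProcess(byte[] incoming) {
        ByteBuffer buf = data.get();      // single load into a local
        if (buf == null) {
            return false;                 // buffer already released by another thread
        }
        // Stand-in for channelRead(channel, data): copy what fits into the buffer.
        buf.put(incoming, 0, Math.min(incoming.length, buf.remaining()));
        if (buf.remaining() == 0) {
            // Clear the field only if it still holds the buffer we just filled.
            return data.compareAndSet(buf, null);
        }
        return false;
    }

    public static void main(String[] args) {
        ConnectionSketch c = new ConnectionSketch();
        c.allocate(4);
        System.out.println(c.readAndProcess(new byte[4]));
        System.out.println(c.readAndProcess(new byte[4]));
    }
}
```

Because the method never dereferences the field a second time, a concurrent
release() can at worst cause the read to be skipped; it cannot cause an NPE.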
> NPE in RpcServer#readAndProcess
> -------------------------------
>
> Key: HBASE-20895
> URL: https://issues.apache.org/jira/browse/HBASE-20895
> Project: HBase
> Issue Type: Bug
> Components: rpc
> Affects Versions: 1.3.2
> Reporter: Andrew Purtell
> Assignee: Monani Mihir
> Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.6
>
> Attachments: HBASE-20895-branch-1.3.patch, HBASE-20895-branch-1.patch
>
>
> {noformat}
> 2018-07-10 16:25:55,005 DEBUG [.sfdc.net,port=60020] ipc.RpcServer - RpcServer.listener,port=60020: Caught exception while reading:
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1761)
> at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:949)
> at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:730)
> at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:706)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This looks like it could be a use-after-close problem if there is concurrent
> access to a Connection.
> In process() we might store a null back to the 'data' field.
> Meanwhile, in readAndProcess() we can be blocked on a channel read, and after
> returning from the read we go on to use 'data' after a null has been written
> back, leading to an NPE.
> {quote}
> count = channelRead(channel, data);
> 1761 ---> if (count >= 0 && *data.remaining()* == 0) { process(); }
> {quote}
> Whether an NPE happens depends on the timing of the store back to 'data' in
> another thread relative to the use of 'data' in this thread, and on whether
> the JVM has optimized away a reload of 'data' (it's not declared volatile).
> We should do a null check here just to be defensive. We should also look at
> whether concurrent access to the Connection is happening and intended. The
> above is just a theory. We should also look at other execution sequences that
> could lead to 'data' being null in this location. At a glance I didn't find
> one, but the store to 'data' happens behind conditionals, so it is possible.
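
The defensive null check suggested in the description could look roughly like
the sketch below. It assumes the field is declared volatile and is read exactly
once into a local before use; NullCheckSketch and its methods are hypothetical
names for illustration, not the RpcServer code.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the Connection class; names are illustrative only.
class NullCheckSketch {
    // volatile so that a store of null from another thread is visible to the reader
    private volatile ByteBuffer data = ByteBuffer.allocate(4);

    // Simulates the store of null back to the field.
    void clear() {
        data = null;
    }

    // Simulates a read that appends bytes to the buffer if it is still present.
    void fill(byte[] bytes) {
        ByteBuffer local = data;
        if (local != null) {
            local.put(bytes, 0, Math.min(bytes.length, local.remaining()));
        }
    }

    // Returns true when the buffer is full and would be handed to process().
    boolean bufferFull() {
        ByteBuffer local = data;      // read the field exactly once
        if (local == null) {
            return false;             // field was nulled; skip instead of throwing an NPE
        }
        return local.remaining() == 0;
    }

    public static void main(String[] args) {
        NullCheckSketch s = new NullCheckSketch();
        s.fill(new byte[4]);
        System.out.println(s.bufferFull());
        s.clear();
        System.out.println(s.bufferFull());
    }
}
```

Checking the field for null and then dereferencing the field again would leave
the same window open, which is why the check is done on the local copy.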
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)