[
https://issues.apache.org/jira/browse/HBASE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-4890:
-------------------------
Attachment: 4890.txt
The NPE happens in j-d's artificial case because we're doing a bulk open
of 3k regions and it's taking a little while to complete, i.e. longer than the
rpc timeout. There is no error, though, because this is a client running in the
master and it's connecting to a single regionserver while also doing meta scans
etc. in the meantime, which keeps updating last activity on the connection. So
we're not running into a socket timeout, which looks like the expectation here:
that there MUST be an exception outstanding if a Call has been running for
> rpctimeout.
Cosmin sees the exact stacktrace that Jon originally uploaded, so we'll try this
patch on his cluster. (Cosmin also speculates that this NPE happens only in
extreme cases, such as ycsb or opening 3k regions. He is seeing it only when
he runs an extreme load test on his cluster.)
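To make the failure mode above concrete, here is a minimal sketch. It is not the contents of 4890.txt and not the actual HBase client code: the class RpcTimeoutSketch, the Call stand-in, and both method names are hypothetical. It only shows how code that assumes "older than rpctimeout implies an error is set" ends up throwing a null reference, and one possible guard.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class RpcTimeoutSketch {
    /** Hypothetical stand-in for an in-flight RPC call (not the HBase class). */
    public static final class Call {
        public final long startTimeMs;
        public IOException error; // stays null if nobody recorded a failure

        public Call(long startTimeMs) {
            this.startTimeMs = startTimeMs;
        }
    }

    /**
     * Unsafe variant: assumes a call that has outlived the rpc timeout
     * already has an exception attached. If the connection stayed active
     * and no socket timeout fired, error is still null, and throwing a
     * null reference raises NullPointerException.
     */
    public static void failIfTimedOutUnsafe(Call call, long nowMs, long rpcTimeoutMs)
            throws IOException {
        if (nowMs - call.startTimeMs >= rpcTimeoutMs) {
            throw call.error;
        }
    }

    /**
     * Guarded variant (an assumption about one reasonable fix): create a
     * timeout exception when none is outstanding, then throw it.
     */
    public static void failIfTimedOutSafe(Call call, long nowMs, long rpcTimeoutMs)
            throws IOException {
        if (nowMs - call.startTimeMs >= rpcTimeoutMs) {
            if (call.error == null) {
                call.error = new SocketTimeoutException(
                    "Call exceeded rpc timeout of " + rpcTimeoutMs + " ms");
            }
            throw call.error;
        }
    }

    public static void main(String[] args) {
        Call call = new Call(0L); // started at t=0, no error recorded
        try {
            failIfTimedOutUnsafe(call, 100L, 60L);
        } catch (NullPointerException e) {
            System.out.println("unsafe: NullPointerException");
        } catch (IOException e) {
            System.out.println("unsafe: " + e);
        }
        try {
            failIfTimedOutSafe(call, 100L, 60L);
        } catch (IOException e) {
            System.out.println("safe: " + e.getClass().getSimpleName());
        }
    }
}
```

With the guard, the caller sees an IOException it can retry on or report, rather than an NPE wrapped in a RuntimeException.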
> fix possible NPE in HConnectionManager
> --------------------------------------
>
> Key: HBASE-4890
> URL: https://issues.apache.org/jira/browse/HBASE-4890
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: Jonathan Hsieh
> Priority: Blocker
> Fix For: 0.92.1
>
> Attachments: 4890.txt, splits.txt
>
>
> I was running YCSB against a 0.92 branch and encountered this error message:
> {code}
> 11/11/29 08:47:16 WARN client.HConnectionManager$HConnectionImplementation: Failed all from region=usertable,user3917479014967760871,1322555655231.f78d161e5724495a9723bcd972f97f41., hostname=c0316.hal.cloudera.com, port=57020
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1501)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1353)
>     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:898)
>     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:775)
>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:750)
>     at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source)
>     at com.yahoo.ycsb.DBWrapper.update(Unknown Source)
>     at com.yahoo.ycsb.workloads.CoreWorkload.doTransactionUpdate(Unknown Source)
>     at com.yahoo.ycsb.workloads.CoreWorkload.doTransaction(Unknown Source)
>     at com.yahoo.ycsb.ClientThread.run(Unknown Source)
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1315)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1327)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1325)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:158)
>     at $Proxy4.multi(Unknown Source)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1330)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1328)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1309)
>     ... 7 more
> {code}
> It looks like the NPE is caused by server being null in the MultiResponse
> call() method.
> {code}
> public MultiResponse call() throws IOException {
>   return getRegionServerWithoutRetries(
>     new ServerCallable<MultiResponse>(connection, tableName, null) {
>       public MultiResponse call() throws IOException {
>         // NPE here if server was never assigned by connect()
>         return server.multi(multi);
>       }
>       @Override
>       public void connect(boolean reload) throws IOException {
>         server =
>           connection.getHRegionConnection(loc.getHostname(),
>             loc.getPort());
>       }
>     }
>   );
> }
> {code}