Jean-Daniel Cryans created HBASE-7513:
-----------------------------------------
Summary: HDFSBlocksDistribution shouldn't send NPEs when something goes wrong
Key: HBASE-7513
URL: https://issues.apache.org/jira/browse/HBASE-7513
Project: HBase
Issue Type: Bug
Reporter: Jean-Daniel Cryans
Priority: Minor
Fix For: 0.96.0
I saw a pretty weird failure on a cluster with corrupted files, and this
particular exception really threw me off:
{noformat}
2013-01-07 09:58:59,054 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=redacted., starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.lang.NullPointerException: empty hosts
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:548)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:461)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3814)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3762)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:332)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.lang.NullPointerException: empty hosts
    at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:403)
    at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:256)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2995)
    at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:523)
    at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:521)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    ... 3 more
Caused by: java.lang.NullPointerException: empty hosts
    at org.apache.hadoop.hbase.HDFSBlocksDistribution.addHostsAndBlockWeight(HDFSBlocksDistribution.java:123)
    at org.apache.hadoop.hbase.util.FSUtils.computeHDFSBlocksDistribution(FSUtils.java:597)
    at org.apache.hadoop.hbase.regionserver.StoreFile.computeHDFSBlockDistribution(StoreFile.java:492)
    at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:521)
    at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:602)
    at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:380)
    at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:375)
    ... 8 more
2013-01-07 09:58:59,059 INFO org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opening of region "redacted" failed, marking as FAILED_OPEN in ZK
{noformat}
This is what the code looks like:
{code}
if (hosts == null || hosts.length == 0) {
  throw new NullPointerException("empty hosts");
}
{code}
So {{hosts}} can be non-null (just empty) and we still throw an NPE? On top of
that, {{Store}} wraps it like this:
{code}
} catch (ExecutionException e) {
  throw new IOException(e.getCause());
{code}
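A rough sketch of how that wrapping could be made less confusing, assuming a hypothetical helper (the class and method names below are illustrative, not HBase code): rethrow the cause directly when it is already an {{IOException}}, and otherwise wrap it with a message saying what was being done, instead of reusing {{cause.toString()}} as the whole message. This does not remove the NPE itself; it only keeps the trace from stacking {{IOException: IOException: ...}} layers.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;

// Hypothetical helper, not part of HBase: turns the ExecutionException from a
// store-file loading task into a single, readable IOException.
public final class ExecutionExceptionUnwrapper {
  private ExecutionExceptionUnwrapper() {}

  public static IOException toIOException(ExecutionException e, String context) {
    Throwable cause = e.getCause();
    if (cause instanceof IOException) {
      // The task already failed with an IOException: pass it through unchanged
      // so its original message is what ends up in the log.
      return (IOException) cause;
    }
    // Anything else (e.g. a RuntimeException) gets wrapped once, with context
    // describing the operation, and the original kept as the cause.
    return new IOException(context + " failed: " + cause, cause);
  }
}
```

The call site would then become something like {{throw ExecutionExceptionUnwrapper.toIOException(e, "Loading store files");}}.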
FWIW, {{HDFSBlocksDistribution.addHostAndBlockWeight}} (the singular-host
method) throws another NPE that looks just as wrong.
We should change the code to skip computing locality when the host information
is missing, instead of throwing big ugly exceptions. In this case the region
would still fail to open later, but at least the error message would be clearer.
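To illustrate the proposed "skip instead of throw" behavior, here is a minimal sketch using a simplified, hypothetical stand-in class ({{SimpleBlocksDistribution}} is not the real {{HDFSBlocksDistribution}}, and its fields and method names are only loosely modeled on it). A block with no hosts, or a null/empty host entry, is counted as skipped and contributes nothing to locality; it assumes Java 8+ for {{Map.merge}}.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical, simplified stand-in for HDFSBlocksDistribution (not the HBase
// class) showing locality computation that skips blocks without host info.
public class SimpleBlocksDistribution {
  private final Map<String, Long> hostWeights = new HashMap<>();
  private long uniqueBlocksTotalWeight = 0;
  private int skippedBlocks = 0;

  // Records one block's weight against every distinct host holding a replica.
  public void addHostsAndBlockWeight(String[] hosts, long weight) {
    if (hosts == null || hosts.length == 0) {
      // No location info (e.g. a corrupted block): skip it instead of an NPE.
      skippedBlocks++;
      return;
    }
    boolean counted = false;
    // De-duplicate so a repeated host cannot push its locality above 1.0.
    for (String host : new HashSet<>(Arrays.asList(hosts))) {
      if (host == null || host.isEmpty()) {
        continue; // ignore a bad entry rather than throwing "empty host"
      }
      hostWeights.merge(host, weight, Long::sum);
      counted = true;
    }
    if (counted) {
      uniqueBlocksTotalWeight += weight;
    } else {
      skippedBlocks++;
    }
  }

  // Fraction of the counted block weight that is stored on the given host.
  public float getBlockLocalityIndex(String host) {
    if (uniqueBlocksTotalWeight == 0) {
      return 0.0f;
    }
    return hostWeights.getOrDefault(host, 0L) / (float) uniqueBlocksTotalWeight;
  }

  public int getSkippedBlocks() {
    return skippedBlocks;
  }
}
```

With this approach a store file on a cluster with corrupted blocks simply reports lower (or zero) locality, and the real problem would surface when the file is actually read.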