[ https://issues.apache.org/jira/browse/PHOENIX-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Taylor updated PHOENIX-4190:
----------------------------------
Labels: secondary_index (was: )
> Salted local index failure is causing region server to abort
> ------------------------------------------------------------
>
> Key: PHOENIX-4190
> URL: https://issues.apache.org/jira/browse/PHOENIX-4190
> Project: Phoenix
> Issue Type: Bug
> Reporter: Samarth Jain
> Assignee: James Taylor
> Labels: secondary_index
> Fix For: 4.12.0
>
> Attachments: PHOENIX-4190.patch
>
>
> If you run just this case
> {code}
> { false, true, true, true, false, null}
> {code}
> in MutableIndexFailureIT on the 4.x-HBase-1.2 branch, [~rajeshbabu], you will
> see the following NPE in logs:
> {code}
> 2017-09-11 00:27:08,119 WARN [B.defaultRpcServer.handler=2,queue=0,port=63436] org.apache.phoenix.index.PhoenixIndexFailurePolicy(143): handleFailure failed
> java.lang.NullPointerException
>     at org.apache.phoenix.util.SchemaUtil.getTableKeyFromFullName(SchemaUtil.java:707)
>     at org.apache.phoenix.util.IndexUtil.updateIndexState(IndexUtil.java:717)
>     at org.apache.phoenix.index.PhoenixIndexFailurePolicy.handleFailureWithExceptions(PhoenixIndexFailurePolicy.java:221)
>     at org.apache.phoenix.index.PhoenixIndexFailurePolicy.handleFailure(PhoenixIndexFailurePolicy.java:140)
>     at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:155)
>     at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:139)
>     at org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:651)
>     at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:608)
>     at org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:591)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1034)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1705)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1030)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3322)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2881)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2823)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:758)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:720)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2168)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2188)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> This happens only for salted local indexes. If I remove the SALT_BUCKETS from
> the table DDL, then the test passes fine. On looking closely at the code, it
> seems like something is wrong with the computation of offset and subsequent
> parsing of the index id from the row key here (in PhoenixIndexFailurePolicy):
> {code}
> int offset =
>     regionInfo.getStartKey().length == 0
>         ? regionInfo.getEndKey().length
>         : regionInfo.getStartKey().length;
> byte[] viewId = null;
> for (Mutation mutation : mutations) {
>     viewId =
>         indexMaintainer.getViewIndexIdFromIndexRowKey(
>             new ImmutableBytesWritable(
>                 mutation.getRow(), offset, mutation.getRow().length - offset));
>     String indexTableName =
>         localIndexNames.get(new ImmutableBytesWritable(viewId));
>     indexTableNames.add(indexTableName);
> }
> {code}
> Because of this NPE in PhoenixIndexFailurePolicy, we end up triggering the
> KillServerOnFailurePolicy, which causes the region server to abort.
> This region server abort is also the reason why our builds against the
> 4.x-HBase-1.2 branch are hanging. I also believe that once we fix this, we
> can re-enable the parameters that were testing the rebuild of
> local indexes for the 4.x-HBase-0.98, 4.x-HBase-1.1 and 4.x-HBase-1.2
> branches. On the master branch, because the local index update is
> transactional with the data table update, we won't run into such failure
> scenarios (I think).
> [~jamestaylor] - A bit orthogonal, but it seems like we can do better here.
> Wouldn't a better option be to let HBase blacklist the Indexer
> co-processor in cases of such bugs? Otherwise, we run the risk of shutting
> down the entire HBase cluster, which is what happened here.
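
To make the failure mode suspected in the quoted snippet concrete, here is a minimal, self-contained Java sketch. It is not Phoenix code: the row-key layout (a region-key prefix followed by a 2-byte index id), the class OffsetLookupSketch, and the helpers rowKey, readIndexId, lookup and resolveOrFail are all hypothetical, and whether a wrong offset is the actual root cause for salted tables is an assumption. The sketch only shows that reading the id at the wrong offset yields a value with no entry in the name map, so the lookup returns null, which is the kind of value that can later surface as an NPE.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class OffsetLookupSketch {
    // Hypothetical layout: [prefixLen bytes][2-byte index id][3 trailing bytes].
    static byte[] rowKey(int prefixLen, short indexId) {
        byte[] key = new byte[prefixLen + 2 + 3];
        // Absolute write of the id immediately after the prefix.
        ByteBuffer.wrap(key).putShort(prefixLen, indexId);
        return key;
    }

    // Reads a 2-byte id starting at the given offset in the row key.
    static short readIndexId(byte[] rowKey, int offset) {
        return ByteBuffer.wrap(rowKey).getShort(offset);
    }

    // Mirrors the shape of the quoted lookup: returns null when the id is unknown.
    static String lookup(Map<Short, String> names, byte[] rowKey, int offset) {
        return names.get(readIndexId(rowKey, offset));
    }

    // Assumed alternative: fail with a descriptive message instead of passing null on.
    static String resolveOrFail(Map<Short, String> names, byte[] rowKey, int offset) {
        String name = lookup(names, rowKey, offset);
        if (name == null) {
            throw new IllegalStateException("No index name for id "
                + readIndexId(rowKey, offset) + " read at offset " + offset);
        }
        return name;
    }

    public static void main(String[] args) {
        Map<Short, String> names = new HashMap<>();
        names.put((short) 7, "LOCAL_IDX");
        byte[] key = rowKey(4, (short) 7);

        // Offset equal to the real prefix length: the id is found.
        System.out.println(lookup(names, key, 4));
        // Offset off by one byte: a different id is parsed and the lookup misses.
        System.out.println(lookup(names, key, 5));
    }
}
```

Failing fast as resolveOrFail does would not by itself stop the failure policy from escalating, but it would make a bad offset visible in the logs instead of a bare NullPointerException further down the call chain.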