[
https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355030#comment-16355030
]
Guizhou Feng commented on PHOENIX-2883:
---------------------------------------
I encountered a similar case while building an index asynchronously via IndexTool.
HBase Version: 1.2.0-cdh5.10.1
Phoenix Version: phoenix-4.8.0-cdh5.8.0-server.jar
Behavior Description:
1. Create Index: CREATE INDEX "prod:log_my_phx_3_idx"
ON "prod:log_my_phx" ("id", "version", "event_time" )
INCLUDE(
"name",
"code",
"type",
"decision",
"monitoring") ASYNC;
2. Run IndexTool mapreduce job
The MapReduce job succeeded and the index was activated, although the ALTER
INDEX statement threw a NullPointerException, as shown below:
ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE
18/02/06 16:26:32 INFO index.IndexToolUtil: alterQuery: ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE
18/02/06 16:26:32 ERROR index.IndexTool: An exception occurred while performing the indexing job: NullPointerException: at:
java.lang.NullPointerException
at org.apache.phoenix.schema.PMetaDataImpl.addTable(PMetaDataImpl.java:108)
at org.apache.phoenix.jdbc.PhoenixConnection.addTable(PhoenixConnection.java:903)
at org.apache.phoenix.schema.MetaDataClient.addTableToCache(MetaDataClient.java:3539)
at org.apache.phoenix.schema.MetaDataClient.alterIndex(MetaDataClient.java:3504)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableAlterIndexStatement$1.execute(PhoenixStatement.java:993)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:344)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:332)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:331)
at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1442)
at org.apache.phoenix.mapreduce.index.IndexToolUtil.updateIndexState(IndexToolUtil.java:75)
at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:245)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:384)
FATAL Errors in RegionServer:
ABORTING region server my-stage-hadoop-prod08-bp,60020,1504851662123: Assertion
failed while closing store
prod:log_my_phx,12_1022360799_801_V19,1517885909466.4a9e01d05c167dc6bdcab962763d7096.
0. flushableSize expected=0, actual= 207088. Current memstoreSize=-114100080.
Maybe a coprocessor operation failed and left the memstore in a partially
updated state.
RegionServer abort: loaded coprocessors are:
[org.apache.phoenix.coprocessor.MetaDataEndpointImpl,
org.apache.phoenix.coprocessor.SequenceRegionObserver,
org.apache.phoenix.coprocessor.ScanRegionObserver,
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver,
org.apache.phoenix.hbase.index.Indexer,
org.apache.phoenix.coprocessor.MetaDataRegionObserver,
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver,
org.apache.hadoop.hbase.regionserver.LocalIndexSplitter,
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl,
org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint,
org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
By the way, only one of the region servers aborted. The abort left many
inconsistencies because of regions in transition, and they were hard to recover
with hbase hbck -repair; it took a whole day of running the repair repeatedly.
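
For anyone scanning region server logs for the same symptom, here is a minimal
sketch that pulls the region name, column family, flushableSize and
memstoreSize out of the abort message quoted above. It assumes the message
keeps the layout shown in that log excerpt. ABORT_RE, parse_abort and SAMPLE
are hypothetical names made up for this illustration; they are not part of
HBase or Phoenix.

```python
import re

# Field layout inferred from the log excerpt in this comment (an assumption,
# not taken from the HBase source): region name, column family, then sizes.
ABORT_RE = re.compile(
    r"Assertion failed while closing store\s+(?P<region>\S+)\s+"
    r"(?P<family>\S+?)\.\s+flushableSize expected=(?P<expected>-?\d+), "
    r"actual=\s*(?P<flushable>-?\d+)\.\s+"
    r"Current memstoreSize=(?P<memstore>-?\d+)"
)


def parse_abort(text):
    """Return the parsed fields of the abort message, or None if absent."""
    # Collapse line wrapping so a message split across lines still matches.
    flat = " ".join(text.split())
    match = ABORT_RE.search(flat)
    if match is None:
        return None
    return {
        "region": match.group("region"),
        "family": match.group("family"),
        "flushable_size": int(match.group("flushable")),
        "memstore_size": int(match.group("memstore")),
    }


# The message from the region server log above, wrapped as in this mail.
SAMPLE = """ABORTING region server my-stage-hadoop-prod08-bp,60020,1504851662123: Assertion
failed while closing store
prod:log_my_phx,12_1022360799_801_V19,1517885909466.4a9e01d05c167dc6bdcab962763d7096.
0. flushableSize expected=0, actual= 207088. Current memstoreSize=-114100080.
Maybe a coprocessor operation failed and left the memstore in a partially
updated state."""

if __name__ == "__main__":
    info = parse_abort(SAMPLE)
    # A negative memstore size is the inconsistency reported in this issue.
    print(info["region"], info["family"], info["memstore_size"] < 0)
```

A negative memstore_size together with a non-zero flushable_size is the
combination seen here and in the original report below.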
> Region close during automatic disabling of index for rebuilding can lead to
> RS abort
> ------------------------------------------------------------------------------------
>
> Key: PHOENIX-2883
> URL: https://issues.apache.org/jira/browse/PHOENIX-2883
> Project: Phoenix
> Issue Type: Bug
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Major
>
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race
> condition in secondary index updates. This user has a relatively heavy
> write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing
> (concretely, we were doing a rolling restart of the cluster without the load
> balancer disabled in the hopes of retaining as much availability as
> possible), I've seen the following general outline in the logs:
> * An index update fails (due to {{ERROR 2008 (INT10)}} the index metadata
> cache expired or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queued for quite some time
> * RS is asked to close a region (due to a move, commonly)
> * RS aborts because the memstore for the data table's region is in an
> inconsistent state (e.g. {{Assertion failed while closing store <region>
> <colfam> flushableSize expected=0, actual= 193392. Current
> memstoreSize=-552208. Maybe a coprocessor operation failed and left the
> memstore in a partially updated state.}})
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found
> anything definitively conclusive yet. Will dump findings here.