[jira] [Commented] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

Guizhou Feng (JIRA) Tue, 06 Feb 2018 22:40:50 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355030#comment-16355030
 ]


Guizhou Feng commented on PHOENIX-2883:
---------------------------------------

I encountered a similar case while building an index asynchronously via IndexTool.

HBase Version: 1.2.0-cdh5.10.1
Phoenix Version: phoenix-4.8.0-cdh5.8.0-server.jar

Behavior Description:

    1. Create Index: CREATE INDEX "prod:log_my_phx_3_idx"
    ON "prod:log_my_phx" ("id", "version", "event_time" )
        INCLUDE(
         "name",
         "code",
         "type",
         "decision",
         "monitoring") ASYNC;
    2. Run the IndexTool MapReduce job

The MapReduce job succeeded and the index was activated, although the ALTER 
INDEX statement threw a NullPointerException, as shown below:

ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE

18/02/06 16:26:32 INFO index.IndexToolUtil: alterQuery: ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE
18/02/06 16:26:32 ERROR index.IndexTool: An exception occurred while performing the indexing job: NullPointerException:  at:
java.lang.NullPointerException
    at org.apache.phoenix.schema.PMetaDataImpl.addTable(PMetaDataImpl.java:108)
    at org.apache.phoenix.jdbc.PhoenixConnection.addTable(PhoenixConnection.java:903)
    at org.apache.phoenix.schema.MetaDataClient.addTableToCache(MetaDataClient.java:3539)
    at org.apache.phoenix.schema.MetaDataClient.alterIndex(MetaDataClient.java:3504)
    at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableAlterIndexStatement$1.execute(PhoenixStatement.java:993)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:344)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:332)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:331)
    at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1442)
    at org.apache.phoenix.mapreduce.index.IndexToolUtil.updateIndexState(IndexToolUtil.java:75)
    at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:245)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:384)


FATAL Errors in RegionServer:
ABORTING region server my-stage-hadoop-prod08-bp,60020,1504851662123: Assertion failed while closing store prod:log_my_phx,12_1022360799_801_V19,1517885909466.4a9e01d05c167dc6bdcab962763d7096. 0. flushableSize expected=0, actual= 207088. Current memstoreSize=-114100080. Maybe a coprocessor operation failed and left the memstore in a partially updated state.

RegionServer abort: loaded coprocessors are: 
[org.apache.phoenix.coprocessor.MetaDataEndpointImpl, 
org.apache.phoenix.coprocessor.SequenceRegionObserver, 
org.apache.phoenix.coprocessor.ScanRegionObserver, 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, 
org.apache.phoenix.hbase.index.Indexer, 
org.apache.phoenix.coprocessor.MetaDataRegionObserver, 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, 
org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, 
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, 
org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint, 
org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 
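
For triage, the two counters in the "Assertion failed while closing store" 
message above can be pulled out of a RegionServer log mechanically. The 
following is a minimal sketch, not part of HBase or Phoenix: the class name 
MemstoreAbortParser and its fields are hypothetical, and the regular 
expression is derived only from the message text quoted in this comment, so 
other HBase versions may word the message differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the counters from the abort message. */
final class MemstoreAbortParser {

    /** Values found in one "Assertion failed while closing store" message. */
    static final class Counters {
        final long expected;      // value after "expected="
        final long flushableSize; // value after "actual="
        final long memstoreSize;  // value after "Current memstoreSize="

        Counters(long expected, long flushableSize, long memstoreSize) {
            this.expected = expected;
            this.flushableSize = flushableSize;
            this.memstoreSize = memstoreSize;
        }

        /** True for the pattern reported here: data left, counter below zero. */
        boolean looksInconsistent() {
            return flushableSize != expected && memstoreSize < 0;
        }
    }

    // Assumption: wording taken from the log excerpt in this comment.
    private static final Pattern COUNTERS = Pattern.compile(
        "flushableSize\\s+expected=(-?\\d+),\\s*actual=\\s*(-?\\d+)\\."
        + "\\s*Current\\s+memstoreSize=(-?\\d+)");

    static Optional<Counters> parse(String logText) {
        // Collapse line breaks and runs of spaces so wrapped log lines match.
        String normalized = logText.replaceAll("\\s+", " ");
        Matcher m = COUNTERS.matcher(normalized);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Counters(
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))));
    }

    public static void main(String[] args) {
        String sample = "flushableSize expected=0, actual= 207088. "
            + "Current memstoreSize=-114100080.";
        parse(sample).ifPresent(c -> System.out.println(
            "flushable=" + c.flushableSize + " memstore=" + c.memstoreSize
            + " inconsistent=" + c.looksInconsistent()));
    }
}
```

A negative memstoreSize together with a non-zero flushableSize is the 
combination seen both in this report and in the original issue description.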


By the way, only one of the region servers aborted. The abort caused a lot of 
inconsistency because regions were left in transition, and it was hard to 
recover with hbase hbck -repair: it took a whole day and several repair runs.


> Region close during automatic disabling of index for rebuilding can lead to 
> RS abort
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2883
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2883
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race 
> condition in secondary index updates. This user has a relatively heavy 
> write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing 
> (concretely, we were doing a rolling restart of the cluster without the load 
> balancer disabled in the hopes of retaining as much availability as 
> possible), I've seen the following general outline in the logs:
> * An index update fails (due to {{ERROR 2008 (INT10)}} the index metadata 
> cache expired or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queued for quite some time
> * RS is asked to close a region (due to a move, commonly)
> * RS aborts because the memstore for the data table's region is in an 
> inconsistent state (e.g. {{Assertion failed while closing store <region> 
> <colfam> flushableSize expected=0, actual= 193392. Current 
> memstoreSize=-552208. Maybe a coprocessor operation failed and left the 
> memstore in a partially updated state.}}
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found 
> anything definitively conclusive yet. Will dump findings here.
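
The last step of the outline above (a negative memstoreSize alongside a 
non-zero flushableSize at close) can be illustrated with a toy model. This is 
a sketch under stated assumptions, not HBase code and not a confirmed 
diagnosis: ToyRegion and its methods are hypothetical, and the way the counter 
goes wrong here (a rollback that subtracts the size twice while the cells stay 
in the store) is chosen purely for illustration. The model also assumes that a 
flush is skipped when the region-wide counter is not positive.

```java
/** Toy model only, NOT HBase code: one region with a single store. */
final class ToyRegion {
    private long memstoreSize;       // region-wide accounting counter
    private long storeFlushableSize; // bytes the store really holds

    /** Normal write: the store and the counter move together. */
    void write(long bytes) {
        storeFlushableSize += bytes;
        memstoreSize += bytes;
    }

    /**
     * Hypothetical failure path (assumption for illustration): the cells
     * stay in the store, but the rollback subtracts their size from the
     * counter twice, so the counter ends up below zero.
     */
    void writeWithBrokenRollback(long bytes) {
        write(bytes);
        memstoreSize -= 2 * bytes;
    }

    /** Flush is skipped when the counter claims the memstore is empty. */
    boolean flush() {
        if (memstoreSize <= 0) {
            return false;
        }
        memstoreSize -= storeFlushableSize;
        storeFlushableSize = 0;
        return true;
    }

    /** Close flushes first, then checks that the store holds nothing. */
    void close() {
        flush();
        if (storeFlushableSize != 0) {
            throw new IllegalStateException(
                "flushableSize expected=0, actual=" + storeFlushableSize
                + ". Current memstoreSize=" + memstoreSize);
        }
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.writeWithBrokenRollback(100);
        try {
            region.close();
        } catch (IllegalStateException e) {
            System.out.println("close failed: " + e.getMessage());
        }
    }
}
```

In this model a region that only sees normal writes closes cleanly, while a 
single broken rollback is enough to make every later close fail, because the 
skipped flush never drains the store.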



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
