[jira] [Commented] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

James Taylor (JIRA) Fri, 06 May 2016 17:15:26 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274959#comment-15274959
 ]


James Taylor commented on PHOENIX-2883:
---------------------------------------

FYI, the index rebuild process is attempted every 10 seconds. Is this for 4.7 
or something else (as there were some changes as of 4.7).

> Region close during automatic disabling of index for rebuilding can lead to 
> RS abort
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2883
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2883
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race 
> condition in secondary index updates. This user has a relatively heavy 
> write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing 
> (concretely, we were doing a rolling restart of the cluster without the load 
> balancer disabled in the hopes of retaining as much availability as 
> possible), I've seen the following general outline in the logs:
> * An index update fails (due to {{ERROR 2008 (INT10)}} the index metadata 
> cache expired or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queue for quite some time
> * RS is asked to close a region (due to a move, commonly)
> * RS aborts because the memstore for the data table's region is in an 
> inconsistent state (e.g. {{Assertion failed while closing store <region> 
> <colfam> flushableSize expected=0, actual= 193392. Current 
> memstoreSize=-552208. Maybe a coprocessor operation failed and left the 
> memstore in a partially updated state.}}
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found 
> anything definitively conclusive yet. Will dump findings here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

Reply via email to