[ https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274959#comment-15274959 ]
James Taylor commented on PHOENIX-2883:
---------------------------------------

FYI, the index rebuild process is attempted every 10 seconds. Is this for 4.7 or something else (as there were some changes as of 4.7)?

> Region close during automatic disabling of index for rebuilding can lead to RS abort
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2883
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2883
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>
> (disclaimer: still performing due diligence on this one)
> I've been helping a user this week with what is thought to be a race condition in secondary index updates. This user has a relatively heavy write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing (concretely, we were doing a rolling restart of the cluster without disabling the load balancer, in the hope of retaining as much availability as possible), I've seen the following general outline in the logs:
> * An index update fails with {{ERROR 2008 (INT10)}} because the index metadata cache expired or is missing
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queued for quite some time
> * The RS is asked to close a region (commonly due to a move)
> * The RS aborts because the memstore for the data table's region is in an inconsistent state (e.g. {{Assertion failed while closing store <region> <colfam> flushableSize expected=0, actual= 193392. Current memstoreSize=-552208. Maybe a coprocessor operation failed and left the memstore in a partially updated state.}})
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found anything definitively conclusive yet. Will dump findings here.
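The assertion in the last step of the outline hints that a coprocessor left the memstore partially updated. Below is a toy model of that failure mode, not HBase's real write path and not a confirmed root cause for this issue. It assumes (hypothetically) that cells reach the store before a hook such as an index update runs, and that a region-wide size counter is only bumped after the hook succeeds. If the hook throws in between, the next flush subtracts bytes the counter never added, so the counter goes negative, which resembles the negative memstoreSize in the log. The names ToyRegion, batchMutate and counterAfterFlush are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class MemstoreDriftSketch {

    // Hypothetical toy region: NOT HBase's HRegion/HStore. It keeps two views of the
    // same data: the cell sizes held by one store and a region-wide running counter.
    static final class ToyRegion {
        private final List<Integer> storeCellSizes = new ArrayList<>();
        private long memstoreSize = 0;

        // Assumed write path: (1) add cells to the store, (2) run a hook such as an
        // index update, (3) bump the counter. A failure in (2) skips step (3).
        void batchMutate(List<Integer> cellSizes, boolean hookFails) {
            storeCellSizes.addAll(cellSizes);
            if (hookFails) {
                throw new IllegalStateException("hook failed after cells were added");
            }
            for (int size : cellSizes) {
                memstoreSize += size;
            }
        }

        long flushableSize() {
            long total = 0;
            for (int size : storeCellSizes) {
                total += size;
            }
            return total;
        }

        // Flush subtracts what the store reports from the counter, then empties the store.
        void flush() {
            memstoreSize -= flushableSize();
            storeCellSizes.clear();
        }

        long memstoreSize() {
            return memstoreSize;
        }
    }

    // Runs one good batch and a second batch whose hook may fail, flushes, and returns
    // the region counter. A consistent region returns 0.
    static long counterAfterFlush(boolean secondHookFails) {
        ToyRegion region = new ToyRegion();
        region.batchMutate(List.of(100, 200), false);
        try {
            region.batchMutate(List.of(50), secondHookFails);
        } catch (IllegalStateException e) {
            // The caller sees the failure, but the 50-byte cell stays in the store uncounted.
        }
        region.flush();
        return region.memstoreSize();
    }

    public static void main(String[] args) {
        System.out.println("consistent: " + counterAfterFlush(false));         // prints 0
        System.out.println("after hook failure: " + counterAfterFlush(true));  // prints -50
    }
}
```

The point of keeping two independently updated views is that a close-time check comparing them can detect drift that neither view reveals on its own; the real HBase check and its exact bookkeeping may differ from this simplification.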