[ https://issues.apache.org/jira/browse/PHOENIX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058796#comment-16058796 ]

Samarth Jain commented on PHOENIX-3970:
---------------------------------------

Index rebuild works by kicking off a select count(*) query on the data table
with the index rebuild flag set. See this in UngroupedAggregateRegionObserver:

{code}
// A scan tagged with the index rebuild flag rebuilds the indexes instead of aggregating.
if (ScanUtil.isIndexRebuild(scan)) {
    return rebuildIndices(s, region, scan, env.getConfiguration());
}
{code}
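To make the dispatch above concrete, here is a minimal, self-contained sketch of the pattern: the client tags an otherwise ordinary count(*) scan with a flag, and the server-side observer checks for that flag before deciding what to do. HBase scans do carry attributes (Scan#setAttribute / Scan#getAttribute); here they are modeled as a plain Map, and the attribute name REBUILD_FLAG and the class RebuildDispatchSketch are hypothetical, not Phoenix's actual constants or code.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: scan attributes modeled as a Map<String, byte[]>.
// The attribute name is an assumption, not the one Phoenix uses.
class RebuildDispatchSketch {
    static final String REBUILD_FLAG = "_HypotheticalRebuildIndexes";

    // True when the request carries the rebuild flag (presence is enough).
    static boolean isIndexRebuild(Map<String, byte[]> attributes) {
        return attributes.get(REBUILD_FLAG) != null;
    }

    // Same query shape, two server-side paths depending on the flag.
    static String handle(Map<String, byte[]> attributes) {
        if (isIndexRebuild(attributes)) {
            return "rebuildIndices";
        }
        return "aggregate";
    }

    public static void main(String[] args) {
        Map<String, byte[]> plainCount = new HashMap<>();
        Map<String, byte[]> rebuildCount = new HashMap<>();
        rebuildCount.put(REBUILD_FLAG, "true".getBytes(StandardCharsets.UTF_8));
        System.out.println(handle(plainCount));   // aggregate
        System.out.println(handle(rebuildCount)); // rebuildIndices
    }
}
{code}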

This kicks off a raw scan over the data table that replays the mutations it
reads back onto the *data table*. For every such batch of "replay mutations",
the indexer coprocessor does the rebuild work.
See LocalTableState#getCurrentRowState():
{code}
// need to use a scan here so we can get raw state, which Get doesn't provide.
Scan s = IndexManagementUtil.newLocalStateScan(Collections.singletonList(columns));
s.setStartRow(row);
s.setStopRow(row);
if (ignoreNewerMutations) {
    // Provides a means of client indicating that newer cells should not be considered,
    // enabling mutations to be replayed to partially rebuild the index when a write fails.
    // When replaying mutations we want the oldest timestamp (as anything newer will be replayed)
    long ts = getOldestTimestamp(m.getFamilyCellMap().values());
    s.setTimeRange(0, ts);
}
{code}
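A rough sketch of the "ignore newer mutations" time range, assuming a simplified hypothetical Cell type and a getOldestTimestamp that takes a flat list (the real method works over the mutation's family cell map). HBase's Scan#setTimeRange(minStamp, maxStamp) treats maxStamp as exclusive, so setTimeRange(0, ts) only exposes cells strictly older than the oldest cell being replayed:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the replay time range; Cell is a stand-in, not HBase's Cell.
class ReplayTimeRangeSketch {
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Oldest timestamp across all cells of the mutation being replayed.
    static long getOldestTimestamp(List<Cell> mutationCells) {
        long oldest = Long.MAX_VALUE;
        for (Cell c : mutationCells) {
            oldest = Math.min(oldest, c.timestamp);
        }
        return oldest;
    }

    // Mimics a scan with setTimeRange(0, maxExclusive): min inclusive, max exclusive.
    static List<Cell> currentRowState(List<Cell> storedCells, long maxExclusive) {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : storedCells) {
            if (c.timestamp >= 0 && c.timestamp < maxExclusive) {
                visible.add(c);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> stored = Arrays.asList(new Cell("v", 100), new Cell("v", 200), new Cell("v", 300));
        List<Cell> replay = Arrays.asList(new Cell("v", 200));
        long ts = getOldestTimestamp(replay);
        System.out.println(ts);                                 // 200
        System.out.println(currentRowState(stored, ts).size()); // 1 (only the cell at 100)
    }
}
{code}

With stored versions at 100, 200 and 300 and a replayed cell at 200, only the version at 100 is treated as the prior row state; the 200 and 300 versions are left for the replay itself.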

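Now, the replay of mutations on the data table needs to run on a handler pool
that is different from the handler pool doing the indexer work. It looks like
this patch makes the count(*) query use the index handler pool as well, which
can cause deadlocks: a rebuild scan holding an index handler waits on index
writes that themselves need a free index handler, so once every index handler
is taken by such a scan, nothing can make progress.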
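The deadlock risk can be illustrated in-process with plain java.util.concurrent executors. This is an analogy, not Phoenix or HBase code: scanPool and indexPool are hypothetical stand-ins for the RPC handler pools, the outer task plays the rebuild scan, and the inner task plays the index write it waits on. A real deadlock would block forever; the 500 ms timeout exists only so the sketch can detect starvation and return.

{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical analogy for handler-pool starvation; not Phoenix/HBase code.
class HandlerPoolSketch {
    // The "rebuild scan" runs on scanPool, submits an "index write" to indexPool,
    // and waits for it. Returns false if the write never got a thread in time.
    static boolean runRebuild(ExecutorService scanPool, ExecutorService indexPool) {
        Future<Boolean> scan = scanPool.submit(() -> {
            Future<String> indexWrite = indexPool.submit(() -> "index row written");
            try {
                indexWrite.get(500, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                indexWrite.cancel(true); // give up on the queued write
                return false;
            }
        });
        try {
            return scan.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    // One single-thread pool serves both the scan and the index write.
    static boolean sharedPoolCompletes() {
        ExecutorService shared = Executors.newFixedThreadPool(1);
        try {
            return runRebuild(shared, shared);
        } finally {
            shared.shutdownNow();
        }
    }

    // Scan and index write are served from separate pools.
    static boolean separatePoolsComplete() {
        ExecutorService scanPool = Executors.newFixedThreadPool(1);
        ExecutorService indexPool = Executors.newFixedThreadPool(1);
        try {
            return runRebuild(scanPool, indexPool);
        } finally {
            scanPool.shutdownNow();
            indexPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool completed: " + sharedPoolCompletes());
        System.out.println("separate pools completed: " + separatePoolsComplete());
    }
}
{code}

With one shared single-thread pool, the outer task holds the only thread while the inner task sits in the queue, so the shared case reports false; with separate pools the inner task gets its own thread and the run reports true.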

> Ensure that automatic partial index rebuilds are served from the index 
> handler pool
> -----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3970
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3970
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 3970.txt, 3970-v2.txt
>
>
> This (and other issues) have rendered multiple larger clusters inoperable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
