[ https://issues.apache.org/jira/browse/PHOENIX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058796#comment-16058796 ]
Samarth Jain commented on PHOENIX-3970:
---------------------------------------

The way index rebuild works is by kicking off a select count(*) query on the data table with the index rebuild flag on. See this in UngroupedAggregateRegionObserver:
{code}
if (ScanUtil.isIndexRebuild(scan)) {
    return rebuildIndices(s, region, scan, env.getConfiguration());
}
{code}
This kicks off a raw scan that reads the data table and replays its mutations on the *data table*. For every such batch of "replay mutations", the indexer co-processor does the rebuild work. See LocalTableState#getCurrentRowState():
{code}
// need to use a scan here so we can get raw state, which Get doesn't provide.
Scan s = IndexManagementUtil.newLocalStateScan(Collections.singletonList(columns));
s.setStartRow(row);
s.setStopRow(row);
if (ignoreNewerMutations) {
    // Provides a means for the client to indicate that newer cells should not be considered,
    // enabling mutations to be replayed to partially rebuild the index when a write fails.
    // When replaying mutations we want the oldest timestamp (as anything newer will be replayed)
    long ts = getOldestTimestamp(m.getFamilyCellMap().values());
    s.setTimeRange(0, ts);
}
{code}
Now, the replay of mutations on the data table needs to run on a handler pool that is different from the handler pool doing the indexer work. It looks like this patch makes the count(*) query also use the index handler pool, which can cause deadlocks.

> Ensure that automatic partial index rebuilds are served from the index handler pool
> -----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3970
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3970
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 3970.txt, 3970-v2.txt
>
>
> This (and other issues) have rendered multiple larger clusters inoperable.
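The deadlock concern in the comment can be modeled in plain Java. This is a hypothetical sketch, not Phoenix or HBase code: single-thread executors stand in for RegionServer handler pools whose handlers are all busy, the outer task stands in for the rebuild/replay work, and the inner task stands in for the indexer work it waits on. The class and method names are made up for illustration, and the real cross-server RPC path is collapsed into in-process executors.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model: NOT Phoenix/HBase code. Executors stand in for handler pools.
class HandlerPoolDeadlockSketch {

    /**
     * Runs an outer "replay" task that blocks on an inner "index write" task.
     *
     * @param sharePool true: both tasks use the same single-thread pool;
     *                  false: the inner task gets its own pool.
     * @return true if the outer task finished within the timeout, false if it
     *         starved waiting for a handler that can never become free.
     */
    static boolean runNested(boolean sharePool) throws Exception {
        // One thread per pool models a pool with no spare handlers.
        ExecutorService replayPool = Executors.newSingleThreadExecutor();
        ExecutorService indexPool = sharePool ? replayPool : Executors.newSingleThreadExecutor();
        try {
            Future<String> outer = replayPool.submit(() -> {
                // The outer task holds the only replay handler and waits on the
                // inner task. On a shared pool the inner task is queued behind
                // the outer one, so it never starts and the wait never ends.
                Future<String> inner = indexPool.submit(() -> "index updated");
                return inner.get();
            });
            outer.get(2, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } finally {
            // Interrupts the blocked worker so the JVM can exit.
            replayPool.shutdownNow();
            indexPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared pool completes:   " + runNested(true));
        System.out.println("separate pools complete: " + runNested(false));
    }
}
{code}

With a shared pool the outer task times out after about two seconds because the work it waits on can never be scheduled; with separate pools it completes immediately. That is the reason the replay of mutations should stay off the pool that serves the indexer work.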