[
https://issues.apache.org/jira/browse/PHOENIX-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280249#comment-15280249
]
James Taylor edited comment on PHOENIX-2890 at 8/16/16 1:03 AM:
----------------------------------------------------------------
Partial index rebuild is different because it needs to "replay" all data table
mutations by doing a raw scan (see the code comments). This is necessary
because the incremental index maintenance failed. The IndexTool only needs to
generate the index rows for existing data rows, as incremental index maintenance
is being done on new data coming in.
One simple example might make this clearer:
- An index is disabled at t1.
- A row in the table is deleted at t2.
- Partial rebuild is done as of t1 (since that's when the index was disabled).
The problem is that our create index mechanisms work at the Phoenix level
by issuing a scan, and that scan won't see the delete marker at t2. The rebuild
needs to see it so that it can correctly issue the delete of the corresponding
index row.
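To illustrate the example above, here is a minimal toy model in Python. It is
not Phoenix or HBase code: the Cell tuple, the set-based index, and all function
names are hypothetical, and the "raw scan" is only simulated by returning every
mutation, delete markers included, from a given timestamp onward.

```python
from collections import namedtuple

# Toy cell history for a data table (hypothetical model, not the HBase API).
Cell = namedtuple("Cell", "row ts kind value")
PUT, DELETE = "put", "delete"


def regular_scan(cells):
    """Return only live rows; delete markers and the data they mask are hidden."""
    latest = {}
    for cell in sorted(cells, key=lambda c: c.ts):
        latest[cell.row] = cell
    return {row: c.value for row, c in latest.items() if c.kind == PUT}


def raw_scan(cells, since_ts):
    """Return every mutation at or after since_ts, delete markers included."""
    return sorted((c for c in cells if c.ts >= since_ts), key=lambda c: c.ts)


def rebuild_from_regular_scan(index, cells):
    """Write index rows for the live data rows only; nothing is ever deleted."""
    updated = set(index)
    for row, value in regular_scan(cells).items():
        updated.add((value, row))
    return updated


def rebuild_from_raw_scan(index, cells, since_ts):
    """Replay each mutation since since_ts against the index, deletes included."""
    updated = set(index)
    for cell in raw_scan(cells, since_ts):
        # Drop any existing index rows that point at this data row.
        updated = {(v, r) for v, r in updated if r != cell.row}
        if cell.kind == PUT:
            updated.add((cell.value, cell.row))
    return updated


# Scenario: index disabled at t1=10, row r1 deleted at t2=20.
T1 = 10
cells = [
    Cell("r1", 5, PUT, "a"),       # written while the index was healthy
    Cell("r1", 20, DELETE, None),  # delete marker at t2, after the disable
    Cell("r2", 30, PUT, "b"),      # new row written after the disable
]
index_at_t1 = {("a", "r1")}

stale = rebuild_from_regular_scan(index_at_t1, cells)
correct = rebuild_from_raw_scan(index_at_t1, cells, T1)
```

In this sketch the scan-based rebuild leaves the index row for r1 in place,
because the delete marker never reaches it, while the replay from t1 removes it.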
FWIW, the MetaDataRegionObserver.BuildIndexScheduleTask handles the above
correctly. It's pretty close to what would be needed for a partial index build.
> Extend IndexTool to allow incremental index rebuilds
> ----------------------------------------------------
>
> Key: PHOENIX-2890
> URL: https://issues.apache.org/jira/browse/PHOENIX-2890
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Minor
> Fix For: 4.9.0
>
> Attachments: PHOENIX-2890_wip.patch
>
>
> Currently, IndexTool is used for the initial index rebuild, but I think we
> should extend it to recover an index from its last disabled timestamp too.
> In general terms, if we run IndexTool on an already existing or new index,
> it should follow the same semantics as the background index rebuilding
> thread.