[ https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736641#comment-16736641 ]
Kadir OZDEMIR commented on PHOENIX-5018:
----------------------------------------
While discussing this further with [~vincentpoon], [~gjacoby] and [~tdsilva] in
person, a third solution has emerged.
The third alternative is to change IndexTool to use the same code path that
MetaDataRegionObserver uses for partial index rebuilds. This code path
leverages the doPostScannerOpen method of UngroupedRegionObserver to rebuild
the index. This method scans the data table to get mutations, then replays
those mutations back on the data table with the REPLAY_ONLY_INDEX_WRITES
attribute set on them. Indexer (the coprocessor that manages index updates)
checks this attribute and, for these mutations, updates only the index tables.
Because the replayed mutations come from the scanned data table cells, the
index tables get the right timestamps. Thus, IndexTool can be changed to
leverage UngroupedRegionObserver the same way MetaDataRegionObserver does.
This alternative achieves code unification without losing the benefits of the
MapReduce framework.
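A toy model of the replay path described above may help. It is a minimal sketch in plain Java, not Phoenix or HBase code: the Mutation and IndexRow classes, the apply and rebuildIndex methods, and the string attribute key are all hypothetical stand-ins. It only illustrates the idea that a rebuild replays existing data cells with a marker attribute, and the indexer hook then writes just the index row while keeping the data cell's timestamp.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplayRebuildSketch {
    // Hypothetical attribute key standing in for the REPLAY_ONLY_INDEX_WRITES marker.
    static final String REPLAY_ATTR = "REPLAY_ONLY_INDEX_WRITES";

    // Simplified stand-in for an HBase mutation: row key, value, explicit timestamp, attributes.
    static final class Mutation {
        final String row;
        final String value;
        final long ts;
        final Map<String, Boolean> attrs = new HashMap<>();

        Mutation(String row, String value, long ts) {
            this.row = row;
            this.value = value;
            this.ts = ts;
        }
    }

    // Simplified stand-in for an index row.
    static final class IndexRow {
        final String key;
        final long ts;

        IndexRow(String key, long ts) {
            this.key = key;
            this.ts = ts;
        }
    }

    final List<Mutation> dataTable = new ArrayList<>();
    final List<IndexRow> indexTable = new ArrayList<>();

    // Hypothetical Indexer-like hook: always derives the index row, but skips the
    // data table write when the mutation carries the replay marker.
    void apply(Mutation m) {
        // Index key = indexed value + data row key; the timestamp is copied from
        // the mutation (i.e. the data cell), not taken from the wall clock.
        indexTable.add(new IndexRow(m.value + "|" + m.row, m.ts));
        if (!m.attrs.getOrDefault(REPLAY_ATTR, false)) {
            dataTable.add(m);
        }
    }

    // Rebuild: scan the existing data cells and replay each one with the marker set.
    void rebuildIndex() {
        List<Mutation> snapshot = new ArrayList<>(dataTable);
        for (Mutation cell : snapshot) {
            Mutation replay = new Mutation(cell.row, cell.value, cell.ts);
            replay.attrs.put(REPLAY_ATTR, true);
            apply(replay);
        }
    }

    public static void main(String[] args) {
        ReplayRebuildSketch s = new ReplayRebuildSketch();
        // Seed data directly, as if it was written before the index existed.
        s.dataTable.add(new Mutation("r1", "alice", 1000L));
        s.dataTable.add(new Mutation("r2", "bob", 2000L));
        s.rebuildIndex();
        for (IndexRow r : s.indexTable) {
            System.out.println(r.key + " @ " + r.ts);
        }
    }
}
```

Running main prints "alice|r1 @ 1000" and "bob|r2 @ 2000": the data table is left unchanged, and each index row carries the timestamp of the cell it was derived from.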
> Index mutations created by IndexTool will have wrong timestamps
> ---------------------------------------------------------------
>
> Key: PHOENIX-5018
> URL: https://issues.apache.org/jira/browse/PHOENIX-5018
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.14.0, 5.0.0
> Reporter: Geoffrey Jacoby
> Assignee: Kadir OZDEMIR
> Priority: Major
>
> When doing a full rebuild (or initial async build) of an index using the
> IndexTool and PhoenixIndexImportDirectMapper, we generate the index mutations
> by creating an UPSERT SELECT query from the base table to the index, then
> taking the Mutations from it and inserting them directly into the index via
> an HBase HTable.
> The timestamps of the Mutations use the default HBase behavior, which is to
> use the current wall-clock time. However, the timestamp of an index KeyValue
> should be the timestamp of the corresponding KeyValue in the base table.
> Having base table and index timestamps out of sync can cause unexpected side
> effects, for example when the base table has data whose TTL has expired while
> the corresponding index data has not expired yet.
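The TTL side effect described above can be worked through with a small example. This is a minimal sketch in plain Java with hypothetical method names; the isExpired rule ("more than ttl time units have elapsed since the cell's timestamp") and the unitless numbers are simplifications for illustration, not HBase's exact expiry logic or configuration.

```java
public class IndexTimestampSketch {
    // Simplified TTL rule (assumption): a cell is expired once more than
    // ttl time units have elapsed since its timestamp.
    static boolean isExpired(long cellTs, long ttl, long now) {
        return now - cellTs > ttl;
    }

    // Behavior described in the issue: the index cell gets the wall-clock
    // time at which the rebuild ran.
    static long indexTsWallClock(long dataTs, long rebuildTime) {
        return rebuildTime;
    }

    // Intended behavior: the index cell inherits the base table cell's timestamp.
    static long indexTsFromData(long dataTs, long rebuildTime) {
        return dataTs;
    }

    public static void main(String[] args) {
        long dataTs = 1_000L;      // when the row was written to the base table
        long rebuildTime = 1_400L; // when the index rebuild ran
        long ttl = 500L;
        long now = 1_600L;

        boolean dataExpired = isExpired(dataTs, ttl, now);
        boolean wallClockIndexExpired =
                isExpired(indexTsWallClock(dataTs, rebuildTime), ttl, now);
        boolean dataTsIndexExpired =
                isExpired(indexTsFromData(dataTs, rebuildTime), ttl, now);

        System.out.println("data expired:               " + dataExpired);
        System.out.println("index expired (wall clock): " + wallClockIndexExpired);
        System.out.println("index expired (data ts):    " + dataTsIndexExpired);
    }
}
```

With these numbers the base table cell is expired at time 1600 (600 units old), the wall-clock index cell is not (only 200 units old), and the index cell that reuses the data timestamp expires together with its base table cell.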
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)