[ 
https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685980#comment-16685980
 ] 

Vincent Poon commented on PHOENIX-5018:
---------------------------------------

I'm pretty sure this causes index out-of-sync issues, at least in certain edge 
cases (it feels like it could also be a more general issue, but I haven't 
worked out exactly how yet).
Taking updates to preexisting rows while running an index build, for example, 
would be problematic:
Insert row R with v1 at t1
Create the index async at t2.  IndexTool will get a current SCN of t2 (it uses 
the latest index table time as of when the tool is run).
While the tool is running, update R at t3 to v2.  This will issue an index 
Delete of v1_R and a new Put of index row v2_R, both with a timestamp of t3.
At t4, the IndexTool reads data table row R with v1 (using currentSCN).  It 
then issues a Put of index row v1_R with a timestamp of t4.  Since t4 is later 
than t3, the earlier Delete does not mask this Put.
You now have two index rows:
v2_R at t3
v1_R at t4
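
The sequence above can be replayed in a toy model. This is only a sketch, not 
Phoenix or HBase code: ToyIndexTable and replay are hypothetical names, the 
timestamps are plain integers, and the only HBase behavior assumed is that a 
delete marker masks Puts at or before its own timestamp.

```python
from collections import defaultdict

PUT, DELETE = "Put", "Delete"


class ToyIndexTable:
    """Toy stand-in for an index table: each row key holds timestamped cells."""

    def __init__(self):
        self.cells = defaultdict(list)  # row key -> [(timestamp, kind)]

    def put(self, row, ts):
        self.cells[row].append((ts, PUT))

    def delete(self, row, ts):
        # Assumption: a delete marker masks every Put at or before ts.
        self.cells[row].append((ts, DELETE))

    def visible_rows(self):
        visible = []
        for row, cells in self.cells.items():
            puts = [ts for ts, kind in cells if kind == PUT]
            deletes = [ts for ts, kind in cells if kind == DELETE]
            # A row is visible only if its newest Put is newer than every Delete.
            if puts and (not deletes or max(puts) > max(deletes)):
                visible.append(row)
        return sorted(visible)


def replay(index_tool_put_ts):
    """Replay the scenario; index_tool_put_ts is the timestamp IndexTool stamps on v1_R."""
    index = ToyIndexTable()
    # t3: the live update of R from v1 to v2 maintains the index.
    index.delete("v1_R", ts=3)
    index.put("v2_R", ts=3)
    # Later: IndexTool writes the stale v1 row it read at its SCN.
    index.put("v1_R", ts=index_tool_put_ts)
    return index.visible_rows()


print(replay(index_tool_put_ts=4))  # wall-clock timestamp t4
print(replay(index_tool_put_ts=1))  # base-table cell timestamp t1
```

In this model, stamping the IndexTool Put with t4 leaves both v1_R and v2_R 
visible, while stamping it with the base-table timestamp t1 lets the Delete at 
t3 mask the stale row.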


> Index mutations created by IndexTool will have wrong timestamps
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5018
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5018
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.0, 5.0.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Major
>
> When doing a full rebuild (or initial async build) on an index using the 
> IndexTool and PhoenixIndexImportDirectMapper, we generate the index mutations 
> by creating an UPSERT SELECT query from the base table to the index, then 
> taking the Mutations from it and inserting them directly into the index via 
> an HBase HTable. 
> The timestamps of the Mutations use the default HBase behavior, which is to 
> take the current wall clock. However, the timestamp of an index KeyValue 
> should use the timestamp of the initial KeyValue in the base table.
> Having base table and index timestamps out of sync can cause all sorts of 
> weird side effects. For example, data whose TTL has already expired in the 
> base table may not have expired in the index yet. 
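
The TTL side effect in the description can be illustrated with simple 
arithmetic. This is a sketch with made-up numbers: is_expired is a 
hypothetical helper, and expiry is modeled as "cell timestamp plus TTL has 
passed", which is a simplification of how HBase applies TTLs.

```python
def is_expired(cell_ts, ttl, now):
    """Simplified TTL check: a cell expires once cell_ts + ttl has passed."""
    return now >= cell_ts + ttl


TTL = 100                 # hypothetical TTL, same on base table and index
base_cell_ts = 1000       # when the base-table row was originally written
index_rebuild_ts = 1050   # wall clock when IndexTool wrote the index row

now = 1120
base_expired = is_expired(base_cell_ts, TTL, now)
index_expired = is_expired(index_rebuild_ts, TTL, now)

# The base row is gone, but the index row stamped with the wall clock lingers.
print(base_expired, index_expired)

# Had the index row carried the base cell's timestamp, both would expire together.
print(is_expired(base_cell_ts, TTL, now) == base_expired)
```

Between base_cell_ts + TTL and index_rebuild_ts + TTL, the index still 
returns a row that no longer exists in the base table.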



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
