[
https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kadir OZDEMIR updated PHOENIX-5018:
-----------------------------------
Description:
When doing a full rebuild (or initial async build) of a local or global index
using IndexTool and PhoenixIndexImportDirectMapper, or doing a synchronous
initial build of a global index using the index create DDL, we generate the
index mutations by using an UPSERT SELECT query from the base table to the
index.
The timestamps of the mutations use the default HBase behavior, which is to
take the current wall-clock time. However, an index KeyValue should carry the
timestamp of the corresponding initial KeyValue in the base table.
Having base table and index timestamps out of sync can cause subtle side
effects. For example, data whose TTL has already expired in the base table may
not yet be expired in the index. In addition, inserting old mutations with new
timestamps may overwrite data that the regular data path has just updated
during the index build, which would lead to data loss and inconsistency issues.
was:
When doing a full rebuild (or initial async build) on an index using the
IndexTool and PhoenixIndexImportDirectMapper, we generate the index mutations
by creating an UPSERT SELECT query from the base table to the index, then
taking the Mutations from it and inserting it directly into the index via an
HBase HTable.
The timestamps of the Mutations use the default HBase behavior, which is to
take the current wall clock. However, the timestamp of an index KeyValue should
use the timestamp of the initial KeyValue in the base table.
Having base table and index timestamps out of sync can cause all sorts of weird
side effects, such as if the base table has data with an expired TTL that isn't
expired in the index yet.
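To make the timestamp hazard concrete, here is a minimal sketch in plain Java.
It is a toy model, not Phoenix or HBase code: the class IndexTimestampSketch,
the VersionedTable helper, the row key, and the timestamps 100/200/300 are all
hypothetical. It assumes only that a read returns the cell version with the
highest timestamp, and that the indexed row key does not change (as for a
covered column). A rebuild job that scanned the base table before a concurrent
live write then flushes its stale value either at the wall clock or at the
base cell's original timestamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of timestamp-versioned cells: each row keeps values keyed by
 * timestamp, and a read returns the value with the highest timestamp.
 * This is NOT the Phoenix or HBase API; all names here are hypothetical.
 */
public class IndexTimestampSketch {

    /** Minimal versioned store standing in for an index table. */
    static final class VersionedTable {
        private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();

        void put(String rowKey, long timestamp, String value) {
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(timestamp, value);
        }

        /** Latest version wins, as in a read without a time range. */
        String get(String rowKey) {
            TreeMap<Long, String> versions = rows.get(rowKey);
            return versions == null ? null : versions.lastEntry().getValue();
        }
    }

    /**
     * Returns what a reader sees in the index after a rebuild that raced
     * with a live write. Timestamps are arbitrary illustrative values.
     */
    public static String indexValueAfterRebuild(boolean useBaseTimestamp) {
        VersionedTable index = new VersionedTable();
        long baseCellTs = 100L;       // when "old" was written to the base table
        long liveWriteTs = 200L;      // concurrent update via the regular data path
        long rebuildWallClock = 300L; // when the rebuild job flushes its mutation

        // The regular data path keeps the index current during the rebuild.
        index.put("row1", liveWriteTs, "new");

        // The rebuild scanned the base table before the live write, so it
        // still carries "old"; only the timestamp it writes with differs.
        long rebuildTs = useBaseTimestamp ? baseCellTs : rebuildWallClock;
        index.put("row1", rebuildTs, "old");

        return index.get("row1");
    }

    public static void main(String[] args) {
        System.out.println("wall-clock timestamp: " + indexValueAfterRebuild(false));
        System.out.println("base-table timestamp: " + indexValueAfterRebuild(true));
    }
}
```

In this model the wall-clock variant leaves the stale value "old" visible,
because its timestamp (300) is newer than the live write (200). The variant
that reuses the base cell's timestamp (100) lands underneath the live write,
so the reader still sees "new".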
> Index mutations created by UPSERT SELECT will have wrong timestamps
> -------------------------------------------------------------------
>
> Key: PHOENIX-5018
> URL: https://issues.apache.org/jira/browse/PHOENIX-5018
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.14.0, 5.0.0
> Reporter: Geoffrey Jacoby
> Assignee: Kadir OZDEMIR
> Priority: Major