[
https://issues.apache.org/jira/browse/PHOENIX-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Geoffrey Jacoby updated PHOENIX-5604:
-------------------------------------
Summary: Index rebuilds and read repairs should not skip WAL (was: Index
rebuilds should not skip WAL)
> Index rebuilds and read repairs should not skip WAL
> ---------------------------------------------------
>
> Key: PHOENIX-5604
> URL: https://issues.apache.org/jira/browse/PHOENIX-5604
> Project: Phoenix
> Issue Type: Bug
> Reporter: Geoffrey Jacoby
> Assignee: Geoffrey Jacoby
> Priority: Major
>
> Currently both Index read repairs and IndexTool build/rebuilds in the new
> design continue to skip the WAL, following the same pattern the old Indexer
> used. However, there are key differences between the old and new logic that
> make this no longer the correct choice.
> First, recall that all HBase replication is based on tailing the WAL, and
> that any transaction that skips the WAL doesn't get replicated.
> In the old logic, the data table write (and WAL append) would be accompanied
> by an IndexedKeyValue which would contain enough information to reconstitute
> the index edit in the event of a failure before the index edit could be
> committed. So skipping the WAL during recovery was _potentially_ OK, because
> writing to the WAL would be redundant locally. (That still seems wrong to me
> when replication is involved: I don't believe IndexedKeyValues are
> replicated, because they use the "magic" METAFAMILY cf.)
> In the new logic, a normal write has three stages: first the index write
> (which goes into a WAL), then the data table write (into a potentially
> different RS's WAL), and lastly the verified flag flip on the index row,
> which goes into the same WAL as the original index write. If something goes
> wrong in stage 2 or 3, read repair will fix it, but if the repair action –
> whether a put or a delete – doesn't go into the WAL, a DR buddy of the index
> will be out of sync.
> This is even more important on an async initial build of an index, where, if
> I understand correctly, there is no WAL append for the index write at all in
> the current UngroupedAggregateRegionObserver rebuild logic. The same would be
> true of a rebuild of a new-style index after non-Phoenix-related corruption
> (such as at the HDFS or raw HBase level).
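
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: none of the types below are HBase or Phoenix classes. `IndexTable`, `replicate`, and `peerInSyncAfterRepair` are hypothetical names, and the `Durability` enum here is a local stand-in. Replication is modeled simply as replaying the primary's WAL onto a peer. The scenario is a stage 1 index write that succeeds, a stage 2 data table write that never happens, and a read repair that deletes the orphaned index row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: nothing here is an HBase or Phoenix class. */
class WalSkipSketch {

    /** Hypothetical stand-in for a per-mutation durability setting. */
    enum Durability { SKIP_WAL, SYNC_WAL }

    /** Primary index table; its WAL is the only thing the peer ever sees. */
    static class IndexTable {
        final Map<String, String> rows = new HashMap<>();
        final List<String[]> wal = new ArrayList<>(); // {key, value}; null value = delete

        /** Applies a put (non-null value) or delete (null value) locally. */
        void apply(String key, String value, Durability durability) {
            if (value == null) {
                rows.remove(key);
            } else {
                rows.put(key, value);
            }
            // Only mutations that go through the WAL are visible to replication.
            if (durability == Durability.SYNC_WAL) {
                wal.add(new String[] {key, value});
            }
        }
    }

    /** Replication modeled as replaying the primary's WAL, in order, onto a peer. */
    static Map<String, String> replicate(IndexTable primary) {
        Map<String, String> peer = new HashMap<>();
        for (String[] edit : primary.wal) {
            if (edit[1] == null) {
                peer.remove(edit[0]);
            } else {
                peer.put(edit[0], edit[1]);
            }
        }
        return peer;
    }

    /** Runs the scenario and reports whether the peer matches the primary. */
    static boolean peerInSyncAfterRepair(Durability repairDurability) {
        IndexTable index = new IndexTable();
        // Stage 1: unverified index row, written through the WAL.
        index.apply("k1", "v1:unverified", Durability.SYNC_WAL);
        // Stage 2 (data table write) fails, so stage 3 never runs.
        // Read repair finds no data row and deletes the index row.
        index.apply("k1", null, repairDurability);
        return replicate(index).equals(index.rows);
    }

    public static void main(String[] args) {
        System.out.println("repair skips WAL: peer in sync = "
                + peerInSyncAfterRepair(Durability.SKIP_WAL)); // false
        System.out.println("repair uses WAL:  peer in sync = "
                + peerInSyncAfterRepair(Durability.SYNC_WAL)); // true
    }
}
```

In this model the peer keeps the stale unverified row whenever the repair skips the WAL, because the delete never appears in the log that replication replays. A real fix would presumably change the durability set on the repair and rebuild mutations rather than anything resembling this toy code.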
--
This message was sent by Atlassian Jira
(v8.3.4#803005)