Geoffrey Jacoby created PHOENIX-5604:
----------------------------------------

             Summary: Index rebuilds should not skip WAL
                 Key: PHOENIX-5604
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5604
             Project: Phoenix
          Issue Type: Bug
            Reporter: Geoffrey Jacoby
            Assignee: Geoffrey Jacoby


Currently both Index read repairs and IndexTool build/rebuilds in the new 
design continue to skip the WAL, following the same pattern the old Indexer 
used. However, there are key differences between the old and new logic that 
make this no longer the correct choice.

First, recall that all HBase replication is based on tailing the WAL, and that 
any edit that skips the WAL doesn't get replicated.

In the old logic, the data table write (and WAL append) would be accompanied by 
an IndexedKeyValue which would contain enough information to reconstitute the 
index edit in the event of a failure before the index edit could be committed. 
So skipping the WAL during recovery was _potentially_ OK, because writing to 
the WAL would be redundant locally. (That still seems wrong to me when 
replication is involved: I don't believe IndexedKeyValues are replicated, 
because they use the "magic" METAFAMILY cf.)

In the new logic, on a normal write, we write to the index first (which goes 
into a WAL), then to the data table (into a potentially different RS's WAL), 
and lastly flip the verified flag in the index, which goes into the same WAL 
as the original index write. If something goes wrong in stage 2 or 3, read 
repair will fix it, but if the repair action, whether a put or a delete, 
doesn't go into the WAL, a DR buddy of the index will be out of sync.
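
To illustrate the replication concern, here is a minimal toy model in plain 
Java. It is only a sketch: none of the types below are HBase or Phoenix 
classes, and the names (WalReplicationSketch, Cluster, peerAfterRepair, the 
two-value Durability enum) are hypothetical stand-ins. It assumes replication 
is nothing more than replaying the source's log on a peer, which is a 
deliberate simplification of WAL tailing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are HBase or Phoenix classes.
final class WalReplicationSketch {

    // Hypothetical stand-in for a per-mutation durability setting.
    enum Durability { SYNC_WAL, SKIP_WAL }

    // A "cluster" is a key-value store plus an append-only log of edits.
    static final class Cluster {
        final Map<String, String> store = new HashMap<>();
        final List<String[]> wal = new ArrayList<>();

        void put(String row, String value, Durability durability) {
            store.put(row, value); // the local write always lands
            if (durability == Durability.SYNC_WAL) {
                // Only logged edits are visible to replication.
                wal.add(new String[] {row, value});
            }
        }
    }

    // Replication modeled as replaying the source's log on the peer.
    static void replicate(Cluster source, Cluster peer) {
        for (String[] edit : source.wal) {
            peer.store.put(edit[0], edit[1]);
        }
    }

    // One normal index write, then one repair write with the given durability.
    // Returns the peer cluster after replication has caught up.
    static Cluster peerAfterRepair(Durability repairDurability) {
        Cluster source = new Cluster();
        Cluster peer = new Cluster();
        source.put("idx-row-1", "unverified", Durability.SYNC_WAL);
        source.put("idx-row-1", "verified", repairDurability);
        replicate(source, peer);
        return peer;
    }

    public static void main(String[] args) {
        System.out.println("repair skips WAL: peer sees "
                + peerAfterRepair(Durability.SKIP_WAL).store.get("idx-row-1"));
        System.out.println("repair uses WAL:  peer sees "
                + peerAfterRepair(Durability.SYNC_WAL).store.get("idx-row-1"));
    }
}
```

In this model the source always ends up with the repaired row, but the peer 
only sees the repair when the repair write is logged. With SKIP_WAL the peer 
is left holding the stale "unverified" row.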

This is even more important on an async initial build of an index, where, if I 
understand correctly, there is no WAL append for the index write at all in the 
current UngroupedAggregateRegionObserver rebuild logic. The same would be true 
for a rebuild of a new-style index after non-Phoenix-related corruption (such 
as at the HDFS or raw HBase level).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
