wangweiming800 commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r397523167
##########
File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
##########
@@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
return getMutationsWithSameTS(put, del);
}
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // find the first (latest) actual unverified put mutation
+        Mutation actual = null;
+        for (Mutation mutation : actualMutationList) {
+            if (mutation instanceof Put && !isVerified((Put) mutation)) {
+                actual = mutation;
+                break;
+            }
+        }
+        if (actual == null) {
+            return;
+        }
+        long ts = getTimestamp(actual);
+        int expectedIndex;
+        int expectedListSize = expectedMutationList.size();
+        for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+            if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                if (expectedIndex > 0) {
+                    expectedIndex--;
+                }
+                break;
+            }
+        }
+        if (expectedIndex == expectedListSize) {
+            return;
+        }
+        for (; expectedIndex < expectedListSize; expectedIndex++) {
+            Mutation mutation = expectedMutationList.get(expectedIndex);
+            if (mutation instanceof Put) {
+                mutation = new Put((Put) mutation);
+            } else {
+                mutation = new Delete((Delete) mutation);
+            }
+            actualMutationList.add(mutation);
+        }
+        Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+    }
+
+    private void cleanUpActualMutationList(List<Mutation> actualMutationList)
+            throws IOException {
+        Iterator<Mutation> iterator = actualMutationList.iterator();
+        Mutation previous = null;
+        while (iterator.hasNext()) {
+            Mutation mutation = iterator.next();
+            if ((mutation instanceof Put && !isVerified((Put) mutation)) ||
+                    (mutation instanceof Delete && isDeleteFamilyVersion(mutation))) {
+                iterator.remove();
+            } else {
+                if (previous != null && getTimestamp(previous) == getTimestamp(mutation) &&
+                        ((previous instanceof Put && mutation instanceof Put) ||
+                                previous instanceof Delete && mutation instanceof Delete)) {
+                    iterator.remove();
+                } else {
+                    previous = mutation;
+                }
+            }
+        }
+    }
+
/**
- * indexRow is the set of all cells of all the row version of an index row from the index table. These are actual
- * cells. We group these cells based on timestamp and type (put vs delete), and form the actual set of
- * index mutations. indexKeyToMutationMap is a map from an index row key to a set of mutations that are generated
- * using the rebuild process (i.e., by replaying raw data table mutations). These sets are sets of expected
- * index mutations, one set for each index row key. Since not all mutations in the index table have both phase
- * (i.e., pre and post data phase) mutations, we cannot compare actual index mutations with expected one by one
- * and expect to find them identical. We need to consider concurrent data mutation effects, data table row write
- * failures, post index write failures. Thus, we need to allow some expected and actual mutations to be skipped
- * during comparing actual mutations to index mutations.
+ * There are two types of verification: without repair and with repair. Without-repair verification is done before
+ * or after index rebuild. It is done before index rebuild to identify the rows to be rebuilt. It is done after
+ * index rebuild to verify the rows that have been rebuilt. With-repair verification can be done anytime using
+ * the "-v ONLY" option to check the consistency of the index table.
+ *
+ * Unverified Rows
+ *
+ * For each mutable data table mutation during regular data table updates, two operations are done on the data table.
+ * One is to read the existing row state, and the second is to update the data table for this row. Concurrent data
+ * mutations are serialized once for reading the existing row states, and then serialized again
+ * for updating the data table. In other words, they go through locking twice, i.e., [lock, read, unlock] and
+ * [lock, write, unlock]. Because of this two-phase locking, for a pair of concurrent mutations (for the same row),
+ * the same row state can be read from the data table. This means the same existing index row can be made unverified
+ * twice with different timestamps, one for each concurrent mutation. These unverified mutations can be repaired
+ * from the data table later during HBase scans using the index read repair process. This is one of the reasons
+ * for having extra unverified rows in the index table. The other reason is data table write failures.
+ * When a data table write fails, it leaves an unverified index row behind. These rows are never returned to clients;
+ * instead, they are repaired, which means either they are rebuilt from their data table rows or they are deleted if
+ * their data table rows do not exist.
+ *
+ * Delete Family Version Markers
+ *
+ * The family version delete markers are generated by the read repair to remove extra unverified rows. They only
+ * show up in the actual mutation list since they are not generated for regular table updates or index rebuilds.
+ * For verification purposes, these delete markers can be treated as extra unverified rows and can be safely
+ * skipped.
+ *
+ * Delete Family Markers
+ *
+ * Delete family markers are generated during read repair, regular table updates and index rebuilds to delete index
+ * table rows. The read repair generates them to delete extra unverified rows. During regular table updates or
+ * index rebuilds, the delete family markers are used to delete index rows due to data table row deletes or
+ * data table row overwrites.
+ *
+ * Verification Algorithm
+ *
+ * IndexTool verification generates an expected list of index mutations from the data table rows and uses this list
+ * to check if index table rows are consistent with the data table.
*
- * The main idea for the verification algorithm used here is to match every expected verified put with an actual
- * put such that these two mutations are the same except that actual mutation can be unverified.
+ * The expected list is generated using the index rebuild algorithm. This means that, for a given row, the list can
+ * include a number of put and delete mutations such that the following hold:
*
- * Some background on why we can skip some of the actual unverified puts and delete markers due to concurrent data
- * table updates is as follows:
+ * Every mutation will include a set of cells with the same timestamp
+ * Every mutation has a different timestamp
+ * A delete mutation will include only delete family cells and it is for deleting the entire row and its versions
+ * Every put mutation is verified
*
- * For each data table mutation, two operations are done on the data table. One is to read the existing row state,
- * and the second is to write to the data table. The processing of concurrent data mutations are serialized once
- * for reading the existing row states, and then serialized again for updating data table. In other words,
- * they go through locking twice, i.e., [lock, read, unlock] and [lock, write, unlock]. Because of this two phase
- * locking, for a pair of concurrent mutations (for the same row), the same row state can be read from the data
- * table. This means the same existing index row can be made unverified twice with different timestamps, one for
- * each concurrent mutation. These unverified mutation are then repaired from the data table. Since expected
- * mutations are used for rebuild (which is also used by the read repair), skipping these unverified put mutations
- * that are not matching with expected mutation are safe as they will go through the same process during
- * read repair and will be skipped and eventually cleaned up by the read repair. We can skip the delete markers
- * safely too as they are placed to clean up these unverified mutations. When the data table rows are rebuilt,
- * the rebuild process generates the delete family markers. The timestamp of delete markers are the timestamp of
- * the data table mutation for which the delete marker is added. Thus, the timestamp of these delete markers will be
- * higher than the timestamp of index row to be deleted.
+ * For both verification types, after the expected list of index mutations is constructed for a given data table row,
+ * another list called the actual list of index mutations is constructed by reading the index table row using an HBase
+ * raw scan so that all versions of the cells of the row are retrieved.
+ *
+ * As in the construction of the expected list, the cells are grouped into a put set and a delete set. The put and
+ * delete sets for a given row are further grouped based on their timestamps into put and delete mutations such that
+ * all the cells in a mutation have the same timestamp. The put and delete mutations are then sorted into a single
+ * list in ascending order of their timestamps. This list is the actual list.
+ *
+ * For the without-repair verification, unverified mutations and family version delete markers are removed from
Review comment:
Will it remove the innocent delete markers, as in my previous comment?
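For context on the cleanup step being discussed, here is a minimal sketch modeled on cleanUpActualMutationList in the hunk above; the Mut record, the Type enum and the sample timestamps are assumptions for the example, not Phoenix or HBase classes. In this simplified model the pass drops unverified puts, delete family version markers, and same-timestamp duplicates of the same type, and leaves the remaining delete family markers in place.

// Illustrative sketch only; not the Phoenix implementation.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class CleanupSketch {
    enum Type { PUT, DELETE_FAMILY, DELETE_FAMILY_VERSION }

    // Simplified stand-in for an index mutation: timestamp, cell type, verified flag for puts.
    record Mut(long ts, Type type, boolean verified) {}

    // Newest-first ordering, mirroring MUTATION_TS_DESC_COMPARATOR in the patch.
    static final Comparator<Mut> TS_DESC = (a, b) -> Long.compare(b.ts(), a.ts());

    static void cleanUp(List<Mut> actual) {
        actual.sort(TS_DESC);
        Iterator<Mut> it = actual.iterator();
        Mut previous = null;
        while (it.hasNext()) {
            Mut m = it.next();
            boolean unverifiedPut = m.type() == Type.PUT && !m.verified();
            if (unverifiedPut || m.type() == Type.DELETE_FAMILY_VERSION) {
                it.remove();                       // extra unverified rows and their version markers
            } else if (previous != null && previous.ts() == m.ts() && previous.type() == m.type()) {
                it.remove();                       // duplicate of the same type at the same timestamp
            } else {
                previous = m;                      // verified put or delete family marker survives
            }
        }
    }

    public static void main(String[] args) {
        List<Mut> actual = new ArrayList<>(List.of(
                new Mut(40, Type.PUT, false),                   // unverified put from a concurrent update
                new Mut(40, Type.DELETE_FAMILY_VERSION, false), // read-repair marker for that put
                new Mut(30, Type.DELETE_FAMILY, false),         // row delete: kept
                new Mut(20, Type.PUT, true),                    // verified put: kept
                new Mut(10, Type.PUT, true)));                  // verified put: kept
        cleanUp(actual);
        // Prints three lines: ts=30 DELETE_FAMILY, ts=20 PUT, ts=10 PUT
        actual.forEach(m -> System.out.println("ts=" + m.ts() + " " + m.type()));
    }
}

The sketch mirrors the condition in the patch: only deletes for which isDeleteFamilyVersion holds and unverified puts are removed unconditionally.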
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services