kadirozde commented on a change in pull request #592: PHOENIX-5505 Index read
repair does not repair unverified rows with h…
URL: https://github.com/apache/phoenix/pull/592#discussion_r330936826
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/index/GlobalIndexChecker.java
##########
@@ -266,25 +277,55 @@ private void repairIndexRows(byte[] indexRowKey, long
ts, List<Cell> row) throws
if (row.isEmpty()) {
return;
}
- // Check if the corresponding data table row exist
- if (Bytes.compareTo(row.get(0).getRowArray(),
row.get(0).getRowOffset(), row.get(0).getRowLength(),
+ boolean indexRowExists = false;
+ // Check if the index row still exist after rebuild
+ while (Bytes.compareTo(row.get(0).getRowArray(),
row.get(0).getRowOffset(), row.get(0).getRowLength(),
indexRowKey, 0, indexRowKey.length) == 0) {
- if (!verifyRowAndRemoveEmptyColumn(row)) {
- // The corresponding row does not exist in the data table.
- // Need to delete the row from index if it is old enough
- deleteRowIfAgedEnough(indexRowKey, ts);
- row.clear();
+ indexRowExists = true;
+
+ if (verifyRowAndRemoveEmptyColumn(row)) {
+ // The index row status is "verified". This row is good to
return to the client. We are done here.
+ return;
}
- return;
+ // The index row is still "unverified" after rebuild. This
means either that the data table row timestamp is
+ // lower than than the timestamp of the unverified index row
(ts) and the index row that is built from
+ // the data table row is masked by this unverified row, or
that the corresponding data table row does
+ // exist
+ // First delete the unverified row from index if it is old
enough
+ deleteRowIfAgedEnough(indexRowKey, row, ts);
+ // Now we will do a single row scan to retrieve the verified
index row build from the data table row
Review comment:
It does not seem it is easy to get all versions in one scan. Phoenix tables
are set with one row version currently. For them, readAllVersions() returns
only one version. The readAllVersions() with setRaw(true) combination seems
return all versions on disk but it also returns delete markers too. (Also I am
not sure this is a consistent behavior for all HBase versions as I only checked
it for the master branch.) This makes the code complex. Given that we are
dealing with a specific failure case (data table write failure when overriding
an existing row) , I do not think it is a good idea to optimize the code
handling it
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services