gjacoby126 commented on a change in pull request #592: PHOENIX-5505 Index read 
repair does not repair unverified rows with h…
URL: https://github.com/apache/phoenix/pull/592#discussion_r330815036
 
 

 ##########
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/index/GlobalIndexChecker.java
 ##########
 @@ -266,25 +277,55 @@ private void repairIndexRows(byte[] indexRowKey, long 
ts, List<Cell> row) throws
             if (row.isEmpty()) {
                 return;
             }
-            // Check if the corresponding data table row exist
-            if (Bytes.compareTo(row.get(0).getRowArray(), 
row.get(0).getRowOffset(), row.get(0).getRowLength(),
+            boolean indexRowExists = false;
+            // Check if the index row still exist after rebuild
+            while (Bytes.compareTo(row.get(0).getRowArray(), 
row.get(0).getRowOffset(), row.get(0).getRowLength(),
                     indexRowKey, 0, indexRowKey.length) == 0) {
-                if (!verifyRowAndRemoveEmptyColumn(row)) {
-                    // The corresponding row does not exist in the data table.
-                    // Need to delete the row from index if it is old enough
-                    deleteRowIfAgedEnough(indexRowKey, ts);
-                    row.clear();
+                indexRowExists = true;
+
+                if (verifyRowAndRemoveEmptyColumn(row)) {
+                    // The index row status is "verified". This row is good to 
return to the client. We are done here.
+                    return;
                 }
-                return;
+                // The index row is still "unverified" after rebuild. This 
means either that the data table row timestamp is
+                // lower than than the timestamp of the unverified index row 
(ts) and the index row that is built from
+                // the data table row is masked by this unverified row, or 
that the corresponding data table row does
+                // exist
+                // First delete the unverified row from index if it is old 
enough
+                deleteRowIfAgedEnough(indexRowKey, row, ts);
+                // Now we will do a single row scan to retrieve the verified 
index row build from the data table row
+                // if such an index row exists
+                singleRowIndexScan.withStartRow(indexRowKey, true);
+                singleRowIndexScan.withStopRow(indexRowKey, true);
+                // ts - 1 is to exclude the current unverified index row
+                singleRowIndexScan.setTimeRange(minTimestamp, ts - 1);
+                RegionScanner singleRowScanner = 
region.getScanner(singleRowIndexScan);
+                row.clear();
+                singleRowScanner.next(row);
+                singleRowScanner.close();
+                if (row.isEmpty()) {
+                    // This means that the data row did not exist, so we need 
to skip this unverified row (i.e., do not
+                    // return it to the client). Just retuning empty row is 
sufficient to do that
+                    return;
+                }
+                ts = getMaxTimestamp(row);
             }
-            // This means the current index row is deleted by the rebuild 
process and we got the next row.
-            // If it is verified then we are good to go. If not, then we need 
to repair the new row
-            if (!verifyRowAndRemoveEmptyColumn(row)) {
-                // Rewind the scanner and let the row be scanned again so that 
it can be repaired
-                scanner.close();
-                scanner = region.getScanner(indexScan);
+            if (indexRowExists) {
+                // This means there does not exist a data row for the 
unverified index row. Skip this row. To do that
 
 Review comment:
   if we found a previous, verified version up in singleRowIndexScan don't we 
need to return that instead? I assume "scanner"'s not raw so it would miss it 
if we just go next() elsewhere, right? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to