wangweiming800 commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398332277
########## File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java ##########
@@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         return getMutationsWithSameTS(put, del);
     }
 
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // find the first (latest) actual unverified put mutation
+        Mutation actual = null;
+        for (Mutation mutation : actualMutationList) {
+            if (mutation instanceof Put && !isVerified((Put) mutation)) {
+                actual = mutation;
+                break;
+            }
+        }
+        if (actual == null) {
+            return;
+        }
+        long ts = getTimestamp(actual);
+        int expectedIndex;
+        int expectedListSize = expectedMutationList.size();
+        for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+            if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                if (expectedIndex > 0) {
+                    expectedIndex--;
+                }
+                break;
+            }
+        }
+        if (expectedIndex == expectedListSize) {
+            return;
+        }
+        for (; expectedIndex < expectedListSize; expectedIndex++) {
+            Mutation mutation = expectedMutationList.get(expectedIndex);
+            if (mutation instanceof Put) {
+                mutation = new Put((Put) mutation);
+            } else {
+                mutation = new Delete((Delete) mutation);
+            }
+            actualMutationList.add(mutation);
+        }
+        Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+    }
+
+    private void cleanUpActualMutationList(List<Mutation> actualMutationList)

Review comment:
@kadirozde, I am wondering whether we could replay the actual mutations, ignoring the unverified ones, and compare the result with the expected mutations. If I understand correctly, a raw scan of the index table can return only four types of mutations:

- unverified put: introduced by a failure in the 2nd or 3rd phase of a data mutation, by a successful data row deletion, or by an ongoing data mutation.
- verified put: introduced by a successful data put mutation or by read repair.
- deleteFamily: introduced by a successful data mutation that changes or removes the index row key.
- deleteFamilyVersion: introduced by read repair.

A verified put carries the full information of the index mutation, even when it is generated by read repair. So if we replay the remaining three kinds of mutations (everything except unverified puts) in timestamp order, we should get the expected mutation list. Am I misunderstanding something here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
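Here is a minimal sketch of the replay check proposed in the review comment. It uses a hypothetical, simplified model instead of Phoenix's or HBase's real classes. The names `ReplaySketch`, `Kind`, `IndexMutation`, `replay` and `matchesExpected` are illustrative only. Each mutation is reduced to a timestamp and a type, so cell contents are not compared. The sketch also assumes the HBase rule that a deleteFamilyVersion marker masks cells with exactly its timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the proposed replay check; not Phoenix code.
class ReplaySketch {
    // Simplified model of the four mutation types a raw index-table scan can return.
    enum Kind { UNVERIFIED_PUT, VERIFIED_PUT, DELETE_FAMILY, DELETE_FAMILY_VERSION }

    // Stand-in for an HBase Mutation, reduced to its timestamp and type.
    static final class IndexMutation {
        final long ts;
        final Kind kind;

        IndexMutation(long ts, Kind kind) {
            this.ts = ts;
            this.kind = kind;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof IndexMutation)) {
                return false;
            }
            IndexMutation other = (IndexMutation) o;
            return ts == other.ts && kind == other.kind;
        }

        @Override
        public int hashCode() {
            return Objects.hash(ts, kind);
        }

        @Override
        public String toString() {
            return kind + "@" + ts;
        }
    }

    // Latest first (in the spirit of MUTATION_TS_DESC_COMPARATOR); ties broken by type.
    static final Comparator<IndexMutation> TS_DESC =
            Comparator.comparingLong((IndexMutation m) -> m.ts).reversed()
                    .thenComparing(m -> m.kind);

    // Replays the actual mutations, ignoring unverified puts.
    static List<IndexMutation> replay(List<IndexMutation> actual) {
        // Assumption: a deleteFamilyVersion masks every cell with exactly its timestamp,
        // so collect those timestamps first; arrival order then does not matter.
        Set<Long> maskedTs = new HashSet<>();
        for (IndexMutation m : actual) {
            if (m.kind == Kind.DELETE_FAMILY_VERSION) {
                maskedTs.add(m.ts);
            }
        }
        // Keep deleteFamily markers and unmasked verified puts; drop unverified puts
        // and the deleteFamilyVersion markers themselves.
        List<IndexMutation> replayed = new ArrayList<>();
        for (IndexMutation m : actual) {
            boolean keep = m.kind == Kind.DELETE_FAMILY
                    || (m.kind == Kind.VERIFIED_PUT && !maskedTs.contains(m.ts));
            if (keep) {
                replayed.add(m);
            }
        }
        replayed.sort(TS_DESC);
        return replayed;
    }

    // True when the replayed actual mutations line up one-to-one with the expected ones.
    static boolean matchesExpected(List<IndexMutation> actual, List<IndexMutation> expected) {
        List<IndexMutation> sortedExpected = new ArrayList<>(expected);
        sortedExpected.sort(TS_DESC);
        return replay(actual).equals(sortedExpected);
    }
}
```

For example, take the actual mutations {unverified put @40, deleteFamilyVersion @30, unverified put @30, deleteFamily @20, verified put @10}. `replay` keeps only deleteFamily @20 and verified put @10, so they match an expected list that contains exactly that put and that delete. The masked timestamps are collected before filtering so the result does not depend on the order in which the raw scan returns cells.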