github-actions[bot] commented on code in PR #63894:
URL: https://github.com/apache/doris/pull/63894#discussion_r3335204642


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/Partition.java:
##########
@@ -258,6 +265,43 @@ public List<MaterializedIndex> getMaterializedIndices(IndexExtState extState) {
         return indices;
     }
 
+    public String getMetaChecksum() {
+        MessageDigest digest = DigestUtils.getSha256Digest();
+        List<MaterializedIndex> indexes = getMaterializedIndices(IndexExtState.VISIBLE);
+        indexes.sort(Comparator.comparingLong(MaterializedIndex::getId));
+        for (MaterializedIndex index : indexes) {
+            updateMetaChecksum(digest, (byte) 1, index.getId());
+            List<Tablet> tablets = Lists.newArrayList(index.getTablets());
+            tablets.sort(Comparator.comparingLong(Tablet::getId));
+            for (Tablet tablet : tablets) {
+                updateMetaChecksum(digest, (byte) 2, tablet.getId());
+                List<Replica> replicas = Lists.newArrayList(tablet.getReplicas());
+                replicas.sort(Comparator.comparingLong(Replica::getId)
+                        .thenComparingLong(Replica::getBackendIdWithoutException));
+                for (Replica replica : replicas) {

Review Comment:
   The checksum still omits several replica fields that directly affect remote
   query planning, so a cached remote partition can stay stale when those fields
   change without a visible version/time change.
   `Tablet.getQueryableReplicas()` filters on `replica.isBad()`,
   `checkVersionCatchUp(...)`/`getLastFailedVersion()`, `isUserDrop()`,
   `getPathHash()`, and `getState()`, but this digest only includes the replica
   id and backend id.

   A concrete failure path: the remote catalog caches a replica as normal. Later
   the source FE marks that same replica bad, or sets `lastFailedVersion` after
   a BE reports a missing tablet/version, while the partition visible
   version/time and the replica backend stay unchanged. The checksum remains
   equal, `collectPartitionChanges()` removes the partition from
   `updatedPartitionIds`, and the remote planner can keep selecting the stale
   replica and hit the same missing-tablet/query failure this PR is meant to
   avoid.

   Please include all query-affecting replica metadata in the checksum, and add
   a test where changing one of these fields forces a different checksum.
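
A minimal sketch of the requested direction, under stated assumptions: `ReplicaMeta` is a hypothetical stand-in for the fields read through `Replica`'s getters, the tag bytes 3 through 9 and the helper names `putLong`/`putString` are invented for illustration, and the JDK's `MessageDigest` is used directly instead of `DigestUtils`. It is not the PR's `updateMetaChecksum` implementation; it only shows that feeding each query-affecting field into the digest makes a change such as `isBad()` flipping visible in the checksum.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReplicaChecksumSketch {

    /** Hypothetical snapshot of the replica fields that affect query planning. */
    static final class ReplicaMeta {
        final long id;
        final long backendId;
        final boolean bad;
        final boolean userDrop;
        final long lastFailedVersion;
        final long pathHash;
        final String state; // stand-in for the value returned by getState()

        ReplicaMeta(long id, long backendId, boolean bad, boolean userDrop,
                    long lastFailedVersion, long pathHash, String state) {
            this.id = id;
            this.backendId = backendId;
            this.bad = bad;
            this.userDrop = userDrop;
            this.lastFailedVersion = lastFailedVersion;
            this.pathHash = pathHash;
            this.state = state;
        }
    }

    // Feed a tag byte followed by a fixed-width long, so fields cannot blur together.
    static void putLong(MessageDigest digest, byte tag, long value) {
        digest.update(tag);
        digest.update(ByteBuffer.allocate(Long.BYTES).putLong(value).array());
    }

    // Feed a tag byte, the UTF-8 length, then the bytes of the string.
    static void putString(MessageDigest digest, byte tag, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        digest.update(tag);
        digest.update(ByteBuffer.allocate(Integer.BYTES).putInt(bytes.length).array());
        digest.update(bytes);
    }

    /** Returns a hex SHA-256 over all listed fields, independent of input order. */
    static String checksum(List<ReplicaMeta> replicas) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // Sort a copy so the caller's list is not mutated.
        List<ReplicaMeta> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingLong((ReplicaMeta r) -> r.id)
                .thenComparingLong(r -> r.backendId));
        for (ReplicaMeta r : sorted) {
            putLong(digest, (byte) 3, r.id);
            putLong(digest, (byte) 4, r.backendId);
            putLong(digest, (byte) 5, r.bad ? 1L : 0L);
            putLong(digest, (byte) 6, r.userDrop ? 1L : 0L);
            putLong(digest, (byte) 7, r.lastFailedVersion);
            putLong(digest, (byte) 8, r.pathHash);
            putString(digest, (byte) 9, r.state);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        ReplicaMeta healthy = new ReplicaMeta(1L, 10L, false, false, -1L, 42L, "NORMAL");
        ReplicaMeta markedBad = new ReplicaMeta(1L, 10L, true, false, -1L, 42L, "NORMAL");
        String before = checksum(Arrays.asList(healthy));
        String after = checksum(Arrays.asList(markedBad));
        // Only the bad flag changed, yet the two checksums are no longer equal.
        System.out.println(before.equals(after));
    }
}
```

A test along the lines requested above could build two snapshots that differ in exactly one field (bad flag, `lastFailedVersion`, user-drop flag, path hash, or state) and assert that the checksums differ, with one case per field.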



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


