github-actions[bot] commented on code in PR #63894:
URL: https://github.com/apache/doris/pull/63894#discussion_r3335204642
##########
fe/fe-core/src/main/java/org/apache/doris/catalog/Partition.java:
##########
@@ -258,6 +265,43 @@ public List<MaterializedIndex> getMaterializedIndices(IndexExtState extState) {
return indices;
}
+ public String getMetaChecksum() {
+ MessageDigest digest = DigestUtils.getSha256Digest();
+ List<MaterializedIndex> indexes = getMaterializedIndices(IndexExtState.VISIBLE);
+ indexes.sort(Comparator.comparingLong(MaterializedIndex::getId));
+ for (MaterializedIndex index : indexes) {
+ updateMetaChecksum(digest, (byte) 1, index.getId());
+ List<Tablet> tablets = Lists.newArrayList(index.getTablets());
+ tablets.sort(Comparator.comparingLong(Tablet::getId));
+ for (Tablet tablet : tablets) {
+ updateMetaChecksum(digest, (byte) 2, tablet.getId());
+ List<Replica> replicas = Lists.newArrayList(tablet.getReplicas());
+ replicas.sort(Comparator.comparingLong(Replica::getId)
+ .thenComparingLong(Replica::getBackendIdWithoutException));
+ for (Replica replica : replicas) {
Review Comment:
The checksum still misses several replica fields that directly affect remote
query planning, so a cached remote partition can remain stale after those
fields change without a visible version/time change.
`Tablet.getQueryableReplicas()` filters on `replica.isBad()`,
`checkVersionCatchUp(...)`/`getLastFailedVersion()`, `isUserDrop()`,
`getPathHash()`, and `getState()`, but this digest only includes the replica id
and backend id.

A concrete failure path: the remote catalog caches a replica as normal. Later
the source FE marks that same replica bad, or sets `lastFailedVersion` after a
BE reports a missing tablet/version, while the partition visible version/time
and the replica's backend stay unchanged. The checksum therefore stays equal,
`collectPartitionChanges()` removes the partition from `updatedPartitionIds`,
and the remote planner can keep selecting the stale replica and hit the same
missing-tablet/query failure this PR is meant to avoid.

Please include all query-affecting replica metadata in the checksum, and add a
test where changing one of these fields forces a different checksum.
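
To illustrate the request, here is a minimal, self-contained sketch of a digest
that folds in the replica fields listed above. It is not the Doris
implementation: `ReplicaChecksumSketch`, `ReplicaMeta`, `putLong`, and
`checksum` are hypothetical names, the fields are plain stand-ins for the real
`Replica` accessors, and it uses the JDK's `MessageDigest` rather than the
`DigestUtils` helper from the diff. The version-catch-up inputs are not
modeled here and would need to be added in the same way.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReplicaChecksumSketch {

    /** Hypothetical stand-in for the replica fields named in the review; not the Doris Replica class. */
    static final class ReplicaMeta {
        final long id;
        final long backendId;
        final boolean bad;
        final long lastFailedVersion;
        final boolean userDrop;
        final long pathHash;
        final String state;

        ReplicaMeta(long id, long backendId, boolean bad, long lastFailedVersion,
                    boolean userDrop, long pathHash, String state) {
            this.id = id;
            this.backendId = backendId;
            this.bad = bad;
            this.lastFailedVersion = lastFailedVersion;
            this.userDrop = userDrop;
            this.pathHash = pathHash;
            this.state = state;
        }
    }

    /** Feeds a long into the digest as 8 big-endian bytes. */
    private static void putLong(MessageDigest digest, long value) {
        digest.update(ByteBuffer.allocate(Long.BYTES).putLong(value).array());
    }

    /** Returns a hex SHA-256 over every query-affecting field, independent of input order. */
    static String checksum(List<ReplicaMeta> replicas) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // Sort a copy so the caller's list is not mutated and ordering is stable.
        List<ReplicaMeta> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingLong((ReplicaMeta r) -> r.id)
                .thenComparingLong(r -> r.backendId));
        for (ReplicaMeta r : sorted) {
            putLong(digest, r.id);
            putLong(digest, r.backendId);
            digest.update((byte) (r.bad ? 1 : 0));
            putLong(digest, r.lastFailedVersion);
            digest.update((byte) (r.userDrop ? 1 : 0));
            putLong(digest, r.pathHash);
            // Length-prefix the state name so adjacent fields cannot run together.
            byte[] state = r.state.getBytes(StandardCharsets.UTF_8);
            putLong(digest, state.length);
            digest.update(state);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Same replica id and backend id; only the "bad" flag differs.
        ReplicaMeta healthy = new ReplicaMeta(1L, 10L, false, -1L, false, 42L, "NORMAL");
        ReplicaMeta markedBad = new ReplicaMeta(1L, 10L, true, -1L, false, 42L, "NORMAL");
        String before = checksum(Arrays.asList(healthy));
        String after = checksum(Arrays.asList(markedBad));
        System.out.println("checksum changed: " + !before.equals(after));
    }
}
```

The `main` method mirrors the kind of test asked for: two replicas that agree
on id and backend id but differ in one query-affecting field should produce
different checksums.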
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]