Copilot commented on code in PR #60705:
URL: https://github.com/apache/doris/pull/60705#discussion_r2797003852
##########
fe/fe-core/src/main/java/org/apache/doris/master/Checkpoint.java:
##########
@@ -146,6 +154,7 @@ public synchronized void doCheckpoint() throws
CheckpointException {
checkPointVersion,
env.getReplayedJournalId()));
}
env.postProcessAfterMetadataReplayed(false);
+ postProcessCloudMetadata();
latestImageFilePath = env.saveImage();
Review Comment:
This introduces new cloud-mode checkpoint behavior
(`postProcessCloudMetadata`) that mutates the replayed catalog before
`saveImage()`. There are existing tests that exercise checkpoint
serialization/deserialization, but nothing here asserts that table cached
versions / partition versions / replica stats are actually persisted and
restored correctly in cloud mode. Please add a unit/integration test that
creates an OlapTable + replicas, sets these fields, runs checkpoint image
serialize/deserialize, and verifies the values survive.
##########
fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java:
##########
@@ -240,6 +240,7 @@ public enum OlapTableState {
// Cache for table version in cloud mode
// This value is set when get the table version from meta-service, 0 means
version is not cached yet
private volatile long lastTableVersionCachedTimeMs = 0;
+ @SerializedName(value = "cv")
private volatile long cachedTableVersion = -1;
Review Comment:
`cachedTableVersion` is now serialized, but `lastTableVersionCachedTimeMs`
is still not serialized (and will reset to 0 when loading from image). That
makes the cache look immediately expired after restart, so
`getVisibleVersion()` will still go to meta-service and the persisted
`cachedTableVersion` likely won’t be used. Consider either persisting
`lastTableVersionCachedTimeMs` as well, or (preferred) initializing
`lastTableVersionCachedTimeMs` during gson post-process when a non-(-1) cached
version is loaded so the cache is treated as fresh at image load time.
##########
fe/fe-core/src/main/java/org/apache/doris/master/Checkpoint.java:
##########
@@ -395,4 +404,99 @@ private long getMemoryUsedPercent() {
public ReentrantReadWriteLock getLock() {
return lock;
}
+
+ private void postProcessCloudMetadata() {
+ if (Config.isNotCloudMode()) {
+ return;
+ }
+ Env servingEnv = Env.getServingEnv();
+ if (servingEnv == null) {
+ LOG.warn("serving env is null, skip process cloud metadata for
checkpoint");
+ return;
+ }
+ long start = System.currentTimeMillis();
+ for (Database db : env.getInternalCatalog().getDbs()) {
+ Database servingDb =
servingEnv.getInternalCatalog().getDbNullable(db.getId());
+ if (servingDb == null) {
+ LOG.warn("serving db is null. dbId: {}, dbName: {}",
db.getId(), db.getFullName());
+ continue;
+ }
Review Comment:
The checkpoint env is replayed only up to `checkPointVersion`, so it’s
expected that some db/table/partition/index objects may not exist (or may
differ) in `servingEnv` due to concurrent DDLs after the checkpoint snapshot
point. Logging each mismatch at WARN can create noisy logs during normal
operation; consider downgrading these to DEBUG or aggregating counts and
logging a single WARN/INFO summary per checkpoint.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]