924060929 commented on code in PR #63612:
URL: https://github.com/apache/doris/pull/63612#discussion_r3315193467


##########
fe/fe-core/src/main/java/org/apache/doris/persist/OperationType.java:
##########
@@ -433,6 +433,11 @@ public class OperationType {
     public static final short OP_BEGIN_SNAPSHOT = 1100;
     public static final short OP_META_SYNC_POINT = 1101;
 
+    // Generic "an operation modified this table's metadata" signal broadcast 
from
+    // master to followers so that every FE-local cache keyed by table can be
+    // invalidated (NereidsSqlCacheManager, 
NereidsSortedPartitionsCacheManager, …).
+    public static final short OP_TABLE_META_CHANGE = 1102;

Review Comment:
   No existing `OperationType` matches the semantics of `OP_TABLE_META_CHANGE`. 
The closest candidates and why they don't fit:
   
   - `OP_REFRESH_EXTERNAL_TABLE` / `OP_REFRESH_CATALOG` / 
`OP_REFRESH_EXTERNAL_DB` — scoped to external catalogs by name. Repurposing 
them for internal OLAP tables would conflate "go re-fetch from upstream 
metadata source" with "your local table-keyed cache is stale", and existing 
handlers would need to branch on table type, which is worse than a new op.
   - `OP_MODIFY_TABLE_PROPERTIES` / `OP_MODIFY_TABLE_LIGHT_SCHEMA_CHANGE` / 
other `OP_MODIFY_TABLE_*` — each carries an operation-specific payload (the 
property bag, the schema change, etc.). Folding a generic "this table changed, 
invalidate downstream caches" into any one of them would require expanding its 
payload schema and would make replay handlers ambiguous about whether a real 
DDL happened.
   
   `OP_TABLE_META_CHANGE` is intentionally a single generic broadcast that any 
DDL on any table type can fan into, with a minimal payload (catalog/db/table 
ids+names). Subscribers (`NereidsSqlCacheManager`, 
`NereidsSortedPartitionsCacheManager`, and future per-table caches) all key off 
the same triplet, so one op carries them all. Keeping it separate avoids 
overloading existing ops.



##########
fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java:
##########
@@ -7131,7 +7132,9 @@ public void setTableStatusInternal(String dbName, String 
tableName, OlapTableSta
                 LOG.warn("ignore set same state {} for table {}. is replay: 
{}.",
                             olapTable.getState(), tableName, isReplay);
             }
-            
Env.getCurrentEnv().getSqlCacheManager().invalidateAboutTable(olapTable);
+            if (!isReplay) {
+                notifyTableMetaChange(olapTable);

Review Comment:
   SQL cache invalidation is best-effort by design — every cache entry already 
goes through `tablesOrDataChanged` on lookup, which compares `visibleVersion` 
(Layer 1) and now `baseSchemaVersion` (Layer 2) against the live table. These 
two layers cover the correctness-critical surface (data writes + schema DDLs) 
without any dependency on `OP_TABLE_META_CHANGE` and without any crash / 
replay-format / version-skew window.
   
   Layer 3 (`OP_TABLE_META_CHANGE`) only adds coverage for DDLs that do not 
bump either version — alter properties, alter partition status, etc. The 
rolling-upgrade case you describe affects exactly this Layer-3-only set: during 
the upgrade, the new follower replays an old master's journal that lacks the 
companion record, and entries for those rarer DDLs may remain in cache until 
the next lookup naturally evicts them or until the master finishes upgrading 
and resumes emitting `OP_TABLE_META_CHANGE`. The window is bounded by the 
upgrade duration, and Layer 1 + Layer 2 still hold throughout, so we accept 
this gap rather than keeping the unconditional replay-side invalidation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to