github-actions[bot] commented on code in PR #63182:
URL: https://github.com/apache/doris/pull/63182#discussion_r3237587292


##########
fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadJob.java:
##########
@@ -2017,6 +2022,9 @@ public void gsonPostProcess() throws IOException {
             this.state = JobState.CANCELLED;
             LOG.warn("error happens when parsing create routine load stmt: " + 
origStmt.originStmt, e);
         }
+        if (persistedColumnDescs != null) {

Review Comment:
   This only restores the persisted `columnDescs` after reparsing `origStmt`; 
the rest of the descriptor fields loaded from the image can still be 
overwritten by the original CREATE descriptor at line 2017. For example, create 
a routine load with `WHERE a > 0`, ALTER it to `WHERE a > 10`, then 
checkpoint/restart: the image initially contains the altered `whereExpr`, but 
`setRoutineLoadDesc(createRoutineLoadInfo.getRoutineLoadDesc())` reparses the 
original statement and replaces it with `a > 0`. The same applies to 
separators, preceding filters, partitions, delete condition, merge type, and 
sequence column when the original statement had those fields. Please 
preserve/reapply the full persisted descriptor state, not only `columnDescs`, 
and cover image replay after ALTERing a non-column load descriptor.



##########
fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadJob.java:
##########
@@ -182,6 +182,7 @@ public boolean isFinalState() {
     protected long authCode;
     //    protected RoutineLoadDesc routineLoadDesc; // optional
     protected PartitionNamesInfo partitionNamesInfo; // optional
+    @SerializedName("cd")

Review Comment:
   Adding a short `@SerializedName` without an alternate name breaks image 
compatibility with existing checkpoints. Older images serialized this 
unannotated field as `columnDescs`; after this change Gson only reads `cd`, so 
`persistedColumnDescs` stays null and `gsonPostProcess()` falls back to the 
`COLUMNS` clause from the original CREATE statement. On upgrade, any ALTERed 
column mapping stored in the image can be lost before edit-log replay starts. 
Please use `@SerializedName(value = "cd", alternate = "columnDescs")` or keep 
the old field name, and add a legacy-image read test.



##########
be/src/exec/common/variant_util.cpp:
##########
@@ -2141,6 +2146,659 @@ phmap::flat_hash_map<std::string_view, 
ColumnVariant::Subcolumn> materialize_doc
     return subcolumns;
 }
 
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_MASK = 1ULL << 63;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_CLASS_SHIFT = 62;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_UID_BITS = 31;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_INDEX_BITS = 11;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_POS_BITS = 12;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_BYTE_BITS = 8;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_POS_SHIFT = 
VARIANT_PATCH_PATH_MARKER_BYTE_BITS;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_INDEX_SHIFT =
+        VARIANT_PATCH_PATH_MARKER_POS_SHIFT + 
VARIANT_PATCH_PATH_MARKER_POS_BITS;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_UID_SHIFT =
+        VARIANT_PATCH_PATH_MARKER_INDEX_SHIFT + 
VARIANT_PATCH_PATH_MARKER_INDEX_BITS;
+static_assert(VARIANT_PATCH_PATH_MARKER_UID_SHIFT + 
VARIANT_PATCH_PATH_MARKER_UID_BITS ==
+              VARIANT_PATCH_PATH_MARKER_CLASS_SHIFT);
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_UID_MASK =
+        (1ULL << VARIANT_PATCH_PATH_MARKER_UID_BITS) - 1;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_INDEX_MASK =
+        (1ULL << VARIANT_PATCH_PATH_MARKER_INDEX_BITS) - 1;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_POS_MASK =
+        (1ULL << VARIANT_PATCH_PATH_MARKER_POS_BITS) - 1;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_BYTE_MASK =
+        (1ULL << VARIANT_PATCH_PATH_MARKER_BYTE_BITS) - 1;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_MAX_COUNT = 1ULL
+                                                         << 
VARIANT_PATCH_PATH_MARKER_INDEX_BITS;
+// Flexible VARIANT partial update keeps exact patch paths in skip bitmap 
markers.
+// The byte position field is the feature-level encoded-path limit.
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_MAX_BYTES = 1ULL << 
VARIANT_PATCH_PATH_MARKER_POS_BITS;
+constexpr uint64_t VARIANT_PATCH_PATH_MAX_COUNT = 256;
+constexpr uint64_t VARIANT_PATCH_PATH_MAX_TOTAL_BYTES = 64 * 1024;
+
+// The hidden skip bitmap stores top-level column unique ids, so VARIANT patch 
metadata uses

Review Comment:
   These new high-bit values change the persisted meaning of the existing 
skip-bitmap column, but there is no rowset/protocol version gate or BE 
capability check before writing them. During a rolling upgrade, a new BE can 
flush a flexible VARIANT rowset with these markers, while an older BE that 
later publishes/compacts/reads the same partial-update data will iterate the 
bitmap as plain column unique ids; old `FlexibleReadPlan::prepare_to_read()` 
did not skip marker values and can build read-plan entries for invalid column 
ids. Please gate VARIANT flexible partial update on all participating BEs 
understanding the marker format, or store the path metadata in a 
versioned/compatible field so older BEs reject it instead of misinterpreting it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to