github-actions[bot] commented on code in PR #63182:
URL: https://github.com/apache/doris/pull/63182#discussion_r3237587292
##########
fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadJob.java:
##########
@@ -2017,6 +2022,9 @@ public void gsonPostProcess() throws IOException {
this.state = JobState.CANCELLED;
LOG.warn("error happens when parsing create routine load stmt: " +
origStmt.originStmt, e);
}
+ if (persistedColumnDescs != null) {
Review Comment:
This only restores the persisted `columnDescs` after reparsing `origStmt`;
the rest of the descriptor fields loaded from the image can still be
overwritten by the original CREATE descriptor at line 2017. For example, create
a routine load with `WHERE a > 0`, ALTER it to `WHERE a > 10`, then
checkpoint/restart: the image initially contains the altered `whereExpr`, but
`setRoutineLoadDesc(createRoutineLoadInfo.getRoutineLoadDesc())` reparses the
original statement and replaces it with `a > 0`. The same applies to
separators, preceding filters, partitions, delete condition, merge type, and
sequence column when the original statement had those fields. Please
preserve/reapply the full persisted descriptor state, not only `columnDescs`,
and cover image replay after ALTERing a non-column load descriptor.
##########
fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadJob.java:
##########
@@ -182,6 +182,7 @@ public boolean isFinalState() {
protected long authCode;
// protected RoutineLoadDesc routineLoadDesc; // optional
protected PartitionNamesInfo partitionNamesInfo; // optional
+ @SerializedName("cd")
Review Comment:
Adding a short `@SerializedName` without an alternate name breaks image
compatibility with existing checkpoints. Older images serialized this
unannotated field as `columnDescs`; after this change Gson only reads `cd`, so
`persistedColumnDescs` stays null and `gsonPostProcess()` falls back to the
`COLUMNS` clause from the original CREATE statement. On upgrade, any ALTERed
column mapping stored in the image can be lost before edit-log replay starts.
Please use `@SerializedName(value = "cd", alternate = "columnDescs")` or keep
the old field name, and add a legacy-image read test.
##########
be/src/exec/common/variant_util.cpp:
##########
@@ -2141,6 +2146,659 @@ phmap::flat_hash_map<std::string_view,
ColumnVariant::Subcolumn> materialize_doc
return subcolumns;
}
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_MASK = 1ULL << 63;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_CLASS_SHIFT = 62;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_UID_BITS = 31;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_INDEX_BITS = 11;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_POS_BITS = 12;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_BYTE_BITS = 8;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_POS_SHIFT =
VARIANT_PATCH_PATH_MARKER_BYTE_BITS;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_INDEX_SHIFT =
+ VARIANT_PATCH_PATH_MARKER_POS_SHIFT +
VARIANT_PATCH_PATH_MARKER_POS_BITS;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_UID_SHIFT =
+ VARIANT_PATCH_PATH_MARKER_INDEX_SHIFT +
VARIANT_PATCH_PATH_MARKER_INDEX_BITS;
+static_assert(VARIANT_PATCH_PATH_MARKER_UID_SHIFT +
VARIANT_PATCH_PATH_MARKER_UID_BITS ==
+ VARIANT_PATCH_PATH_MARKER_CLASS_SHIFT);
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_UID_MASK =
+ (1ULL << VARIANT_PATCH_PATH_MARKER_UID_BITS) - 1;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_INDEX_MASK =
+ (1ULL << VARIANT_PATCH_PATH_MARKER_INDEX_BITS) - 1;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_POS_MASK =
+ (1ULL << VARIANT_PATCH_PATH_MARKER_POS_BITS) - 1;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_BYTE_MASK =
+ (1ULL << VARIANT_PATCH_PATH_MARKER_BYTE_BITS) - 1;
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_MAX_COUNT = 1ULL
+ <<
VARIANT_PATCH_PATH_MARKER_INDEX_BITS;
+// Flexible VARIANT partial update keeps exact patch paths in skip bitmap
markers.
+// The byte position field is the feature-level encoded-path limit.
+constexpr uint64_t VARIANT_PATCH_PATH_MARKER_MAX_BYTES = 1ULL <<
VARIANT_PATCH_PATH_MARKER_POS_BITS;
+constexpr uint64_t VARIANT_PATCH_PATH_MAX_COUNT = 256;
+constexpr uint64_t VARIANT_PATCH_PATH_MAX_TOTAL_BYTES = 64 * 1024;
+
+// The hidden skip bitmap stores top-level column unique ids, so VARIANT patch
metadata uses
Review Comment:
These new high-bit values change the persisted meaning of the existing
skip-bitmap column, but there is no rowset/protocol version gate or BE
capability check before writing them. During a rolling upgrade, a new BE can
flush a flexible VARIANT rowset with these markers, while an older BE that
later publishes/compacts/reads the same partial-update data will iterate the
bitmap as plain column unique ids; old `FlexibleReadPlan::prepare_to_read()`
did not skip marker values and can build read-plan entries for invalid column
ids. Please gate VARIANT flexible partial update on all participating BEs
understanding the marker format, or store the path metadata in a
versioned/compatible field so older BEs reject it instead of misinterpreting it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]