JingsongLi commented on code in PR #8246:
URL: https://github.com/apache/paimon/pull/8246#discussion_r3419366461
##########
paimon-spark/paimon-spark-common/src/main/java/org/apache/paimon/spark/SparkCatalog.java:
##########
@@ -362,6 +369,36 @@ public org.apache.spark.sql.connector.catalog.Table
alterTable(
}
}
+ /**
+ * Detects whether the given changes originate from an {@code ALTER TABLE
... REPLACE COLUMNS}
+ * statement.
+ *
+ * <p>Spark translates {@code REPLACE COLUMNS} into a batch that drops
every existing column and
+ * re-adds the new set, i.e. a combination of {@link
TableChange.DeleteColumn} and {@link
+ * TableChange.AddColumn} only. Other column changes such as rename or
type update are never
+ * produced by {@code REPLACE COLUMNS}, so we match exclusively on these
two types to avoid
+ * mistaking a legitimate mixed batch (e.g. a programmatic DROP + RENAME)
for a replace.
+ *
+ * <p>This operation must be rejected because re-adding columns assigns
brand-new field ids
+ * while existing data files keep the old ids; same-named columns would
then be treated as new
+ * columns and read back as null, silently corrupting data.
+ */
+ private boolean isReplaceColumns(TableChange[] changes) {
+ boolean hasDeleteColumn = false;
+ boolean hasAddColumn = false;
+ for (TableChange change : changes) {
+ if (change instanceof TableChange.DeleteColumn) {
+ hasDeleteColumn = true;
+ } else if (change instanceof TableChange.AddColumn) {
+ hasAddColumn = true;
+ } else {
+ return false;
+ }
+ }
+
+ return hasDeleteColumn && hasAddColumn;
Review Comment:
This heuristic also rejects any programmatic `TableCatalog.alterTable` call
that batches a supported drop and add together, for example `deleteColumn("b")`
plus `addColumn("d", ...)`. That is not necessarily `ALTER TABLE ... REPLACE
COLUMNS` and it used to be a valid combination of existing schema changes. Can
we make the detection narrower, e.g. only reject Spark\s replace pattern where
all current top-level columns are deleted before the new columns are added, or
otherwise avoid blocking ordinary drop+add batches?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]