Re: [PR] [lake] Support schema evolution for lake-enabled tables with AddColum… [fluss]

via GitHub Tue, 16 Dec 2025 22:33:37 -0800


buvb commented on code in PR #2189:
URL: https://github.com/apache/fluss/pull/2189#discussion_r2625775464



##########
fluss-lake/fluss-lake-paimon/src/main/java/org/apache/fluss/lake/paimon/tiering/FlussRecordAsPaimonRow.java:
##########
@@ -27,35 +27,38 @@
 
 import static org.apache.fluss.lake.paimon.PaimonLakeCatalog.SYSTEM_COLUMNS;
 import static org.apache.fluss.lake.paimon.utils.PaimonConversions.toRowKind;
-import static org.apache.fluss.utils.Preconditions.checkState;
 
 /** To wrap Fluss {@link LogRecord} as paimon {@link InternalRow}. */
 public class FlussRecordAsPaimonRow extends FlussRowAsPaimonRow {
 
     private final int bucket;
     private LogRecord logRecord;
     private int originRowFieldCount;
+    private final int businessFieldCount;
+    private final int bucketFieldIndex;
+    private final int offsetFieldIndex;
+    private final int timestampFieldIndex;
 
     public FlussRecordAsPaimonRow(int bucket, RowType tableTowType) {
         super(tableTowType);
         this.bucket = bucket;
+        this.businessFieldCount = tableRowType.getFieldCount() - 
SYSTEM_COLUMNS.size();
+        this.bucketFieldIndex = businessFieldCount;
+        this.offsetFieldIndex = businessFieldCount + 1;
+        this.timestampFieldIndex = businessFieldCount + 2;
     }
 
     public void setFlussRecord(LogRecord logRecord) {
         this.logRecord = logRecord;
         this.internalRow = logRecord.getRow();
-        this.originRowFieldCount = internalRow.getFieldCount();
-        checkState(
-                originRowFieldCount == tableRowType.getFieldCount() - 
SYSTEM_COLUMNS.size(),
-                "The paimon table fields count must equals to LogRecord's 
fields count.");
+        this.originRowFieldCount = Math.min(internalRow.getFieldCount(), 
businessFieldCount);

Review Comment:
    About [businessFieldCount < internalRow.getFieldCount()]
    I understand your concern, but this scenario CAN happen during the 
transition period:
    **Timeline:**
   1. User executes `ALTER TABLE ADD COLUMN c3`
   2. Fluss schema updated (now has c3)
   3. New Fluss records are written with c3
   4. Tiering starts consuming these new records
   5. But [syncSchemaChangesToLake()] hasn't completed yet → Paimon schema 
still doesn't have c3
   
   In this case, [internalRow.getFieldCount() > businessFieldCount]
   If we throw an exception here, tiering will crash during the transition.
   Using `Math.min()` to ignore the extra field is safe because:
   - The data is not "lost" - it's still in Fluss
   - Once Paimon schema syncs, subsequent tiering will include the new column
   - This is a temporary transition state, not a permanent data loss



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [lake] Support schema evolution for lake-enabled tables with AddColum… [fluss]

Reply via email to