jacklong319 commented on PR #7491:
URL: https://github.com/apache/paimon/pull/7491#issuecomment-4101381624

   I also hit this issue in a real cross-partition upsert job.
   
   Scenario:
   - existing dynamic bucket table
   - `bucket = -1`
   - `dynamic-bucket.initial-buckets` was changed after the table already 
contained data
   - after restarting the Flink job, the checkpoint failed during bootstrap of 
`cross-partition-bucket-assigner`
   
   It seems that changing `dynamic-bucket.initial-buckets` on an existing table 
may break the dynamic bucket assigner and its bootstrap semantics, and 
ultimately cause a duplicate-key failure during bulk load.
   
   The stack trace looks like this:
   
   `java.io.IOException: Could not perform checkpoint 3 for operator cross-partition-bucket-assigner (23/128)#0.
   ...
   Caused by: java.lang.RuntimeException: Exception in bulkLoad, the most suspicious reason is that your data contains duplicates, please check your sink table. (The likelihood of duplication is that you used multiple jobs to write the same dynamic bucket table, it only supports single write)
       at org.apache.paimon.crosspartition.GlobalIndexAssigner.endBoostrapWithoutEmit(GlobalIndexAssigner.java:225)
       at org.apache.paimon.crosspartition.GlobalIndexAssigner.endBoostrap(GlobalIndexAssigner.java:200)
       at org.apache.paimon.flink.sink.index.GlobalIndexAssignerOperator.endBootstrap(GlobalIndexAssignerOperator.java:95)
       at org.apache.paimon.flink.sink.index.GlobalIndexAssignerOperator.prepareSnapshotPreBarrier(GlobalIndexAssignerOperator.java:85)
   ...
   Caused by: org.rocksdb.RocksDBException: Keys must be added in strict ascending order.
       at org.rocksdb.SstFileWriter.put(Native Method)
       at org.rocksdb.SstFileWriter.put(SstFileWriter.java:150)
       at org.apache.paimon.lookup.rocksdb.RocksDBBulkLoader.write(RocksDBBulkLoader.java:83)
       at org.apache.paimon.crosspartition.GlobalIndexAssigner.endBoostrapWithoutEmit(GlobalIndexAssigner.java:217)`
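
To show why duplicate keys end in the innermost `RocksDBException`, here is a small standalone sketch of a strict-ascending check. It does not use RocksDB at all. `StrictAscendingCheck`, `compareBytes` and `firstViolation` are hypothetical names, and the sketch assumes unsigned bytewise key ordering. The point is only that two equal keys in an otherwise sorted input are not strictly ascending, so the second one is rejected.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not RocksDB or Paimon code.
final class StrictAscendingCheck {

    // Unsigned lexicographic comparison of two byte arrays
    // (assumed here to match the ordering used for the bulk-loaded keys).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return a.length - b.length;
    }

    // Returns the index of the first key that is not strictly greater than
    // its predecessor, or -1 if the whole sequence is strictly ascending.
    static int firstViolation(List<byte[]> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (compareBytes(keys.get(i - 1), keys.get(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<byte[]> keys = Arrays.asList(
                "a".getBytes(StandardCharsets.UTF_8),
                "b".getBytes(StandardCharsets.UTF_8),
                "b".getBytes(StandardCharsets.UTF_8), // duplicate key
                "c".getBytes(StandardCharsets.UTF_8));
        System.out.println("first violation at index " + firstViolation(keys));
    }
}
```

In this sketch the duplicate `"b"` is reported at index 2, which mirrors how a sorted bootstrap input with one repeated key would be rejected by a writer that demands strictly ascending keys.
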
   So I think `dynamic-bucket.initial-buckets` should be treated as an 
immutable option for existing tables, rather than allowing the change and 
failing only later at runtime.
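
A minimal sketch of the kind of up-front check I have in mind. This is not Paimon's actual schema-validation code: `ImmutableOptionCheck`, `IMMUTABLE_KEYS` and `validate` are hypothetical names, and the table options are modeled as plain string maps. The idea is just to compare the old and new option maps when the table is altered and reject the change immediately.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper for illustration; names are not taken from Paimon.
final class ImmutableOptionCheck {

    // Assumption: the options that must not change once the table exists.
    static final Set<String> IMMUTABLE_KEYS =
            Collections.singleton("dynamic-bucket.initial-buckets");

    // Throws if any immutable option differs between the old and new maps.
    // Note: going from "unset" to an explicit value also counts as a change here.
    static void validate(Map<String, String> oldOptions, Map<String, String> newOptions) {
        for (String key : IMMUTABLE_KEYS) {
            String before = oldOptions.get(key);
            String after = newOptions.get(key);
            if (!Objects.equals(before, after)) {
                throw new UnsupportedOperationException(
                        "Changing '" + key + "' is not supported on an existing table: "
                                + before + " -> " + after);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> oldOptions = new HashMap<>();
        oldOptions.put("bucket", "-1");
        oldOptions.put("dynamic-bucket.initial-buckets", "4");

        Map<String, String> newOptions = new HashMap<>(oldOptions);
        newOptions.put("dynamic-bucket.initial-buckets", "16");

        try {
            validate(oldOptions, newOptions);
        } catch (UnsupportedOperationException e) {
            // The change is rejected at alter time instead of at checkpoint time.
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this the user gets a clear error when altering the table, instead of a `bulkLoad` failure several steps later during bootstrap.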


