xiearthur commented on issue #12523:
URL: https://github.com/apache/hudi/issues/12523#issuecomment-2558824552

   Title: Flink-Hudi COW Table Write Fails with Bucket Index While MOR Works 
Fine
   Description:
   We encountered an issue with Flink-Hudi when writing to COW tables using 
bucket index. The write operation fails during checkpoint while the same 
configuration works perfectly with MOR tables.
   Issue Details:
   
   - The Flink job generates parquet files in buckets but fails to commit them
   - Files under the .hoodie directory show rollback operations
   - The data written by Flink cannot be read
   - Generated parquet files disappear after job restart
   
   Error Log:
   IOException: Could not perform checkpoint 2 for operator Bucket_write: 
default_database.irce_credit_info_mor_test -> Sink: clean_commits (1/1)
   Stack Trace:
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
   at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:650)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
   at java.lang.Thread.run(Thread.java:748)
   Configuration Used:
   options.put("index.type", "BUCKET");
   options.put("hoodie.bucket.index.num.buckets", "10");
   options.put("hoodie.index.bucket.engine", "SIMPLE");
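
For comparison, here is a minimal sketch of how the COW and MOR runs could be set up so that only the table type differs. The `table.type` key and its `COPY_ON_WRITE` / `MERGE_ON_READ` values are an assumption, since the snippet above does not show how the table type was set. The class and method names are hypothetical and only there for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BucketIndexOptions {
    // Builds the bucket-index options from the report plus an explicit table type.
    static Map<String, String> build(String tableType) {
        Map<String, String> options = new LinkedHashMap<>();
        // Assumption: the table type is set through "table.type" (not shown in the report).
        options.put("table.type", tableType);
        // The three options below are copied from the report unchanged.
        options.put("index.type", "BUCKET");
        options.put("hoodie.bucket.index.num.buckets", "10");
        options.put("hoodie.index.bucket.engine", "SIMPLE");
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> cow = build("COPY_ON_WRITE");
        Map<String, String> mor = build("MERGE_ON_READ");
        // Print every key whose value differs between the two runs.
        for (String key : cow.keySet()) {
            if (!cow.get(key).equals(mor.get(key))) {
                System.out.println(key + ": " + cow.get(key) + " vs " + mor.get(key));
            }
        }
    }
}
```

Running `main` should list only `table.type` as different, which makes it easier to confirm that the failing and working jobs share every other setting.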
   Observed Behavior:
   
   1. The job starts and creates parquet files in the configured buckets
   2. The operation fails during a checkpoint (specifically checkpoint 2)
   3. A rollback operation is triggered
   4. All previously written data becomes inaccessible
   5. After job restart, the generated parquet files are removed
   
   Expected Behavior:
   The Flink job should successfully write and commit data to COW table using 
bucket index, similar to how it works with MOR tables.
   Key Points:
   
   - This issue only occurs with COW tables
   - The same configuration works correctly with MOR tables
   - The failure happens consistently during checkpoint operations
   
   Additional Information:
   
   - No data loss is observed in MOR tables with identical configuration
   - The issue appears to be specific to the interaction between COW tables and the bucket index during the checkpoint phase
   
   Questions:
   
   - Is this a known limitation of COW tables with bucket index?
   - Are there any workarounds or alternative configurations recommended for COW tables?
   - Are there specific checkpoint configurations that might help resolve this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
