xiearthur commented on issue #12523:
URL: https://github.com/apache/hudi/issues/12523#issuecomment-2558824552
Title: Flink-Hudi COW Table Write Fails with Bucket Index While MOR Works
Fine
Description:
We encountered an issue with Flink-Hudi when writing to COW tables using
bucket index. The write operation fails during checkpoint while the same
configuration works perfectly with MOR tables.
Issue Details:
The Flink job generates parquet files in buckets but fails to commit them
Files under .hoodie directory show rollback operations
Unable to read the data written by Flink
Generated parquet files disappear after job restart
Error Log:
IOException: Could not perform checkpoint 2 for operator Bucket_write:
default_database.irce_credit_info_mor_test -> Sink: clean_commits (1/1)
Stack Trace:
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:650)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
at java.lang.Thread.run(Thread.java:748)
Configuration Used:
options.put("index.type", "BUCKET");
options.put("hoodie.bucket.index.num.buckets", "10");
options.put("hoodie.index.bucket.engine", "SIMPLE");
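For anyone trying to reproduce this, the following is a minimal sketch of how the two configurations being compared might be built so that only the table type differs. The "table.type" key and its COPY_ON_WRITE / MERGE_ON_READ values are assumptions, since the snippet above does not show how the table type was set, and the class and method names are hypothetical. The sketch only builds and compares plain maps; it does not call any Hudi or Flink API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hudi: builds the option maps under comparison.
public class BucketIndexOptions {

    // Returns the three options from the report plus an explicit table type.
    static Map<String, String> build(String tableType) {
        Map<String, String> options = new LinkedHashMap<>();
        // Assumption: the table type is selected with the "table.type" option.
        options.put("table.type", tableType);
        // The three entries below are copied from the report.
        options.put("index.type", "BUCKET");
        options.put("hoodie.bucket.index.num.buckets", "10");
        options.put("hoodie.index.bucket.engine", "SIMPLE");
        return options;
    }

    // Lists the keys whose values differ between two option maps.
    static Set<String> differingKeys(Map<String, String> a, Map<String, String> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Set<String> diff = new TreeSet<>();
        for (String key : keys) {
            if (!Objects.equals(a.get(key), b.get(key))) {
                diff.add(key);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, String> cow = build("COPY_ON_WRITE");
        Map<String, String> mor = build("MERGE_ON_READ");
        // Confirms that the failing and working setups differ in one key only.
        System.out.println("Differing keys: " + differingKeys(cow, mor));
    }
}
```

Checking the differing keys before running both jobs helps rule out an unrelated option as the cause of the COW-only failure.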
Observed Behavior:
The job initiates and creates parquet files in the configured buckets
During checkpoint (specifically checkpoint 2), the operation fails
A rollback operation is triggered
All previously written data becomes inaccessible
After job restart, the generated parquet files are removed
Expected Behavior:
The Flink job should successfully write and commit data to COW table using
bucket index, similar to how it works with MOR tables.
Key Points:
This issue only occurs with COW tables
The same configuration works correctly with MOR tables
The failure happens consistently during checkpoint operations
Additional Information:
No data loss is observed in MOR tables with identical configuration
The issue appears to be specific to the interaction between COW tables and
bucket index during checkpoint phase
Questions:
Is this a known limitation of COW tables with bucket index?
Are there any workarounds or alternative configurations recommended for COW
tables?
Are there specific checkpoint configurations that might help resolve this
issue?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]