ankit0811 opened a new issue, #11466:
URL: https://github.com/apache/hudi/issues/11466

   We are trying to create a COW table using Kafka as our source and S3 as our sink. The source comprises a list of Kafka topics.
   Checkpoints run every 2 minutes, and when a checkpoint completes (i.e., when the Hudi table file commit is triggered), the Flink job throws the following exception:
   
   ```
   2024-06-17 23:43:03
   org.apache.flink.util.FlinkException: Global failure triggered by 
OperatorCoordinator for 'hoodie_append_write: <database_name>.<table_name>' 
(operator fa7c267a06b83c5e2dc3af13367ebe76).
        at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:556)
        at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$start$0(StreamWriteOperatorCoordinator.java:196)
        at 
org.apache.hudi.sink.utils.NonThrownExecutor.handleException(NonThrownExecutor.java:142)
        at 
org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:133)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: org.apache.hudi.exception.HoodieException: Executor executes 
action [commits the instant 20240618064120870] error
        ... 6 more
   Caused by: org.apache.hudi.exception.HoodieException: Commit instant 
[20240618064120870] failed and rolled back !
        at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.doCommit(StreamWriteOperatorCoordinator.java:587)
        at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.commitInstant(StreamWriteOperatorCoordinator.java:542)
        at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2(StreamWriteOperatorCoordinator.java:258)
        at 
org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
        ... 3 more
   
   ```
   
   This pipeline is the only one writing to the table, so we don't have multiple writers.
   
   Below is the config used for this:
   
   ```
   options.put(FlinkOptions.PATH.key(), hudiBasePath);
   options.put(FlinkOptions.TABLE_NAME.key(), targetTable);
   options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.COPY_ON_WRITE.name());
   options.put(FlinkOptions.PRECOMBINE_FIELD.key(), "timestamp"); // TODO: fix this, as last_Update is null
   options.put(FlinkOptions.IGNORE_FAILED.key(), "false");
   options.put(HoodieIndexConfig.INDEX_TYPE.key(), HoodieIndex.IndexType.GLOBAL_BLOOM.name());
   options.put(FlinkOptions.OPERATION.key(), WriteOperationType.INSERT.value());
   options.put(FlinkOptions.PARTITION_PATH_FIELD.key(), "ts_date:string");
   options.put("hoodie.parquet.small.file.limit", "104857600");  // 100 MiB
   options.put("hoodie.parquet.max.file.size", "536870912");     // 512 MiB
   options.put("clustering.schedule.enabled", "true");
   options.put("clustering.async.enabled", "true");
   options.put("hoodie.clustering.plan.strategy.max.bytes.per.group", "107374182400"); // 100 GiB
   //options.put("hoodie.clustering.plan.strategy.max.num.groups", "1");
   options.put("write.tasks", "1");
   ```
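   For context, the options map above is wired into the job through Hudi's Flink `HoodiePipeline` builder, roughly as sketched below. This is only an illustrative sketch, not our exact code: the schema columns, record key, the `kafkaSourceStream` helper, and the checkpoint setup are placeholders.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   import org.apache.flink.table.data.RowData;
   import org.apache.hudi.util.HoodiePipeline;

   StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
   env.enableCheckpointing(2 * 60 * 1000); // checkpoint every 2 minutes, as described above

   Map<String, String> options = new HashMap<>();
   // ... the options.put(...) calls shown in the config block above ...

   // Placeholder: a RowData stream built from the list of Kafka topics.
   DataStream<RowData> source = kafkaSourceStream(env);

   HoodiePipeline.Builder builder = HoodiePipeline.builder(targetTable)
       .column("uuid VARCHAR(36)")      // placeholder schema, not our real columns
       .column("`timestamp` BIGINT")
       .column("ts_date VARCHAR(10)")
       .pk("uuid")                      // placeholder record key
       .partition("ts_date")
       .options(options);

   builder.sink(source, false);         // bounded = false for a streaming job
   env.execute("hudi-cow-writer");
   ```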
   
   **Environment Description**
   
   * Hudi version : 0.14.1
   
   * Flink version : 1.15.2
   
   * Storage (HDFS/S3/GCS..) : s3
   
   
   Based on some GitHub issue history, we did try deleting the `.aux/ckp_meta/` directory, but still no luck.
   
   Any pointers on how we go about fixing this would be much appreciated.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
