ankit0811 opened a new issue, #11466:
URL: https://github.com/apache/hudi/issues/11466
We are trying to create a COW table using Kafka as our source and S3 as our sink. The source comprises a list of Kafka topics.
Checkpoints run every 2 minutes, and when a checkpoint starts (i.e., when the Hudi table files are committed), the Flink job throws this exception:
```
2024-06-17 23:43:03
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'hoodie_append_write: <database_name>.<table_name>' (operator fa7c267a06b83c5e2dc3af13367ebe76).
    at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:556)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$start$0(StreamWriteOperatorCoordinator.java:196)
    at org.apache.hudi.sink.utils.NonThrownExecutor.handleException(NonThrownExecutor.java:142)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:133)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.hudi.exception.HoodieException: Executor executes action [commits the instant 20240618064120870] error
    ... 6 more
Caused by: org.apache.hudi.exception.HoodieException: Commit instant [20240618064120870] failed and rolled back !
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.doCommit(StreamWriteOperatorCoordinator.java:587)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.commitInstant(StreamWriteOperatorCoordinator.java:542)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$notifyCheckpointComplete$2(StreamWriteOperatorCoordinator.java:258)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
    ... 3 more
```
This pipeline is the only one writing to the table, so we don't have multiple writers.
Below is the configuration used:
```java
options.put(FlinkOptions.PATH.key(), hudiBasePath);
options.put(FlinkOptions.TABLE_NAME.key(), targetTable);
options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.COPY_ON_WRITE.name());
options.put(FlinkOptions.PRECOMBINE_FIELD.key(), "timestamp"); // need to fix this as last_Update is null
options.put(FlinkOptions.IGNORE_FAILED.key(), "false");
options.put(HoodieIndexConfig.INDEX_TYPE.key(), HoodieIndex.IndexType.GLOBAL_BLOOM.name());
options.put(FlinkOptions.OPERATION.key(), WriteOperationType.INSERT.value());
options.put(FlinkOptions.PARTITION_PATH_FIELD.key(), "ts_date:string");
options.put("hoodie.parquet.small.file.limit", "104857600");
options.put("hoodie.parquet.max.file.size", "536870912");
options.put("clustering.schedule.enabled", "true");
options.put("clustering.async.enabled", "true");
options.put("hoodie.clustering.plan.strategy.max.bytes.per.group", "107374182400");
// options.put("hoodie.clustering.plan.strategy.max.num.groups", "1");
options.put("write.tasks", "1");
```
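For reference, the raw byte values in the size-related options above decode to round MiB/GiB figures. The sketch below just makes that arithmetic explicit (the class name `HudiConfigSizes` is ours, purely for illustration, not a Hudi API):

```java
public class HudiConfigSizes {
    // hoodie.parquet.small.file.limit = 104857600 bytes = 100 MiB
    static final long SMALL_FILE_LIMIT = 100L * 1024 * 1024;
    // hoodie.parquet.max.file.size = 536870912 bytes = 512 MiB
    static final long MAX_FILE_SIZE = 512L * 1024 * 1024;
    // hoodie.clustering.plan.strategy.max.bytes.per.group = 107374182400 bytes = 100 GiB
    static final long MAX_BYTES_PER_GROUP = 100L * 1024 * 1024 * 1024;

    public static void main(String[] args) {
        System.out.println(SMALL_FILE_LIMIT);      // 104857600
        System.out.println(MAX_FILE_SIZE);         // 536870912
        System.out.println(MAX_BYTES_PER_GROUP);   // 107374182400
    }
}
```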
**Environment Description**
* Hudi version: 0.14.1
* Flink version: 1.15.2
* Storage (HDFS/S3/GCS..): S3
Based on some GitHub issue history, we did try deleting the `.aux/ckp_meta/` dir, but still no luck.
Any pointers on how to go about fixing this would be much appreciated.