pravin1406 opened a new issue, #13539:
URL: https://github.com/apache/hudi/issues/13539
When we keep ingesting updated records into Hudi and the metadata table's deltacommits exceed 1000, we get:
```
Caused by: org.apache.hudi.exception.HoodieMetadataException: Metadata table's deltacommits exceeded 1000: this is likely caused by a pending instant in the data table. Resolve the pending instant or adjust `hoodie.metadata.max.deltacommits.when_pending`, then restart the pipeline.
```
If we increase this limit to 2000, the error reappears after 2000 delta commits; raising the limit again just repeats the same cycle.
This is due to a bug in `checkNumDeltaCommits` in `HoodieBackedTableMetadataWriter`: it looks for the last compaction using `org.apache.hudi.common.table.timeline.HoodieTimeline.COMPACTION_ACTION` (`"compaction"`) as the filter, whereas completed compactions are stored on the timeline with the `"commit"` action.
Sample from our Hudi table:

```
-rw-rw-rw- 1 xxx  xxx 510170 2025-07-09 21:00 s3a://chnls-jarvis/datalake_staging/user_installed_apps/.hoodie/20250709205440267.commit
-rw-rw-rw- 1 xxxx xxx      0 2025-07-09 20:55 s3a://chnls-jarvis/datalake_staging/user_installed_apps/.hoodie/20250709205440267.compaction.inflight
-rw-rw-rw- 1 xxx  xxx 195649 2025-07-09 20:55 s3a://chnls-jarvis/datalake_staging/user_installed_apps/.hoodie/20250709205440267.compaction.requested
```
Hence the check never detects a completed compaction, and after 1000 deltacommits it throws the error above.
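The faulty lookup can be sketched as follows. This is a minimal, self-contained illustration using hypothetical stand-in types, not Hudi's actual timeline classes: filtering completed instants by the `"compaction"` action finds nothing, because the compaction above completed as `20250709205440267.commit`.

```java
import java.util.List;
import java.util.Optional;

public class CompactionFilterSketch {
    // Hypothetical stand-in for a completed timeline instant: (timestamp, action).
    public record Instant(String timestamp, String action) {}

    // Mirrors the suspected bug: look for the last compaction by filtering
    // completed instants on the "compaction" action string.
    public static Optional<Instant> lastCompactionBuggy(List<Instant> completed) {
        return completed.stream()
                .filter(i -> i.action().equals("compaction"))
                .reduce((a, b) -> b); // keep the latest match
    }

    // A completed compaction is written to the timeline as a "commit",
    // so this filter actually finds it.
    public static Optional<Instant> lastCompactionFixed(List<Instant> completed) {
        return completed.stream()
                .filter(i -> i.action().equals("commit"))
                .reduce((a, b) -> b);
    }

    public static void main(String[] args) {
        // Mimics the listing above: the compaction finished as a .commit file.
        List<Instant> completed =
                List.of(new Instant("20250709205440267", "commit"));
        System.out.println(lastCompactionBuggy(completed).isPresent()); // false
        System.out.println(lastCompactionFixed(completed).isPresent()); // true
    }
}
```

Because the buggy filter never matches, the deltacommit counter is never reset by an observed compaction, which matches the repeating-cycle behavior described above.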
We have inline compaction and the cleaner enabled, but they run with a large period/delay.
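For completeness, the stop-gap we tried (which only defers the error, as noted above) was raising the limit named in the exception message, e.g. in the writer config:

```
hoodie.metadata.max.deltacommits.when_pending=2000
```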
* Hudi version : 0.14.1
* Spark version : 3.3.2
* Hive version : 3.3.1
* Hadoop version : 3.1.3
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : Kubernetes