pravin1406 opened a new issue, #13539:
URL: https://github.com/apache/hudi/issues/13539

   When we keep ingesting updated records into Hudi and the metadata table's deltacommits exceed 1000, we get:

   `Caused by: org.apache.hudi.exception.HoodieMetadataException: Metadata table's deltacommits exceeded 1000: this is likely caused by a pending instant in the data table. Resolve the pending instant or adjust hoodie.metadata.max.deltacommits.when_pending, then restart the pipeline.`
   
   If we increase this limit to 2000, the error reappears once 2000 delta commits accumulate, and the same cycle repeats after every increase.
   This is due to a bug in `checkNumDeltaCommits` in `HoodieBackedTableMetadataWriter`. It looks for the last compaction using `org.apache.hudi.common.table.timeline.HoodieTimeline.COMPACTION_ACTION` ("compaction") as the filter, whereas completed compactions are stored on the timeline under the "commit" action.
   
   Sample from our Hudi table's timeline:
   
   -rw-rw-rw-   1 xxx xxx     510170 2025-07-09 21:00 s3a://chnls-jarvis/datalake_staging/user_installed_apps/.hoodie/20250709205440267.commit
   -rw-rw-rw-   1 xxxx xxx          0 2025-07-09 20:55 s3a://chnls-jarvis/datalake_staging/user_installed_apps/.hoodie/20250709205440267.compaction.inflight
   -rw-rw-rw-   1 xxx xxx     195649 2025-07-09 20:55 s3a://chnls-jarvis/datalake_staging/user_installed_apps/.hoodie/20250709205440267.compaction.requested
    
   Hence it never detects a completed compaction, and once 1000 deltacommits accumulate it throws the error above.
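
   The mismatch can be illustrated with a small, self-contained sketch. This is plain Python, not Hudi's actual classes; the `Instant` model and both function names are hypothetical stand-ins for the timeline filtering described above:

   ```python
   from collections import namedtuple

   # Hypothetical model of a timeline instant: (timestamp, action, state).
   Instant = namedtuple("Instant", ["timestamp", "action", "state"])

   # Timeline mirroring the listing above: the compaction at 20250709205440267
   # went requested -> inflight, and its completed instant carries the
   # "commit" action, not "compaction".
   timeline = [
       Instant("20250709205440267", "compaction", "REQUESTED"),
       Instant("20250709205440267", "compaction", "INFLIGHT"),
       Instant("20250709205440267", "commit", "COMPLETED"),
   ]

   def last_completed_compaction_buggy(timeline):
       # Filters like the reported checkNumDeltaCommits: COMPLETED + "compaction".
       matches = [i for i in timeline
                  if i.state == "COMPLETED" and i.action == "compaction"]
       return matches[-1] if matches else None

   def last_completed_compaction_fixed(timeline):
       # Completed compactions show up under the "commit" action, so that is
       # what the filter would need to match (simplified; a real fix would
       # also have to distinguish compaction commits from ordinary commits).
       matches = [i for i in timeline
                  if i.state == "COMPLETED" and i.action == "commit"]
       return matches[-1] if matches else None

   print(last_completed_compaction_buggy(timeline))  # None: never detected
   print(last_completed_compaction_fixed(timeline))
   ```

   Because the buggy filter returns nothing, the deltacommit counter is never reset by an observed compaction, which matches the endless 1000 -> 2000 -> ... cycle described above.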
   
   We have inline compaction and the cleaner enabled, but their interval/delay is large.
   
   
   * Hudi version : 0.14.1
   
   * Spark version : 3.3.2
   
   * Hive version : 3.3.1
   
   * Hadoop version : 3.1.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : Kubernetes
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
