bk-mz opened a new issue, #10930:
URL: https://github.com/apache/hudi/issues/10930

   **Describe the problem you faced**
   
   There is no way to control the size of the `archived/` folder and no way to trigger its cleanup.
   
   We have a long-running table that has accumulated a large amount of archived timeline data (~100 GB), which now degrades cleaner performance and the overall performance of the ingestion process.
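   
   For context, the archival-related options we did find only control when completed instants are moved from the active timeline into `archived/` (and whether small archive files get merged); as far as we can tell, none of them bounds the folder's total size or deletes old archive files. A minimal Scala sketch of those settings (the table path, record key, and DataFrame are placeholders for our actual pipeline):
   
   ```scala
   import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
   
   val spark = SparkSession.builder().appName("hudi-archival-settings").getOrCreate()
   
   // Placeholders for the real ingestion DataFrame and table location.
   val df: DataFrame = spark.read.parquet("s3://bucket/staging/batch")
   val basePath = "s3://bucket/warehouse/long_running_table"
   
   df.write.format("hudi")
     .option("hoodie.table.name", "long_running_table")
     .option("hoodie.datasource.write.recordkey.field", "id")
     .option("hoodie.datasource.write.precombine.field", "ts")
     // Archival trigger: instants beyond this window are moved from the active
     // timeline into .hoodie/archived/. This bounds the active timeline only,
     // not how large archived/ may grow over the table's lifetime.
     .option("hoodie.keep.min.commits", "20")
     .option("hoodie.keep.max.commits", "30")
     // Merges small archive files together, but still never deletes them.
     .option("hoodie.archive.merge.enable", "true")
     .mode(SaveMode.Append)
     .save(basePath)
   ```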
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Go to the All Configurations page on the Hudi site
   2. Look for settings that control the `archived/` folder
   3. Observe that there are none for bounding or cleaning it
   
   **Expected behavior**
   
   The documentation should describe how to maintain the `archived/` folder.
   
   Maintenance of the `archived/` folder should be delegated to the cleaner.
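   
   Until something like that exists, the only option we see is pruning `archived/` out of band. A rough Scala sketch of such a workaround (hypothetical and not a Hudi API; the table path, the ~90-day cutoff, and the assumption that the old archive files are no longer needed for auditing or debugging are all ours, so verify before running against production data):
   
   ```scala
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}
   
   // Manually delete archived timeline files older than a cutoff.
   // Not a supported Hudi feature; use at your own risk.
   val basePath = "s3://bucket/warehouse/long_running_table"        // placeholder table path
   val archivedDir = new Path(s"$basePath/.hoodie/archived")
   val cutoffMillis = System.currentTimeMillis() - 90L * 24 * 60 * 60 * 1000  // ~90 days
   
   val fs = FileSystem.get(archivedDir.toUri, new Configuration())
   fs.listStatus(archivedDir)
     .filter(s => s.isFile && s.getModificationTime < cutoffMillis)
     .foreach { s =>
       println(s"deleting ${s.getPath}")
       fs.delete(s.getPath, false)
     }
   ```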
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Spark version : 3.4.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   Related Slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1711531654297129
   
