BruceKellan opened a new issue, #11709:
URL: https://github.com/apache/hudi/issues/11709

   **Describe the problem you faced**
   
   In our production environment, there are hundreds of Flink streaming 
applications running against Hudi. Due to the nature of streaming, each 
application generates checkpoints at minute-level intervals, e.g. a commit 
every 3 minutes.
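   At that cadence the timeline growth is easy to estimate; a rough 
back-of-envelope sketch, assuming one Hudi commit per checkpoint (the 
3-minute interval is the example figure above, not a measured value):

   ```python
   # Back-of-envelope: commits accumulated per year, assuming one
   # Hudi commit per Flink checkpoint at a fixed interval.
   MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

   def commits_per_year(checkpoint_interval_minutes: int) -> int:
       return MINUTES_PER_YEAR // checkpoint_interval_minutes

   print(commits_per_year(3))  # → 175200 commits in one year
   ```

   So a single table running for a year accumulates on the order of 
175,000 commits' worth of archived metadata, which is consistent with the 
directory size below.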
   
   Some of these applications have now been running for a year. Even though 
the archived metadata is merged and archived, the `/.hoodie/archived` 
directory has grown quite large, close to 20 GB.
   
   To avoid unnecessary risks, I want to take some measures in advance. I 
have a few questions:
   
   1. What will happen if I directly delete archived files that have not 
changed for a long time?
   
   2. Is there any way for me to clean up these archived files?
   
   <img width="1085" alt="image" 
src="https://github.com/user-attachments/assets/880ed9f8-62ff-4173-8491-0caafdd15944">
   
   **Environment Description**
   
   * Hudi version : 0.13.1
   
   * Spark version : 3.2.0
   
   * Storage (HDFS/S3/GCS..) : Aliyun-OSS
   
   * Running on Docker? (yes/no) : no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]