bvaradar commented on issue #2392:
URL: https://github.com/apache/hudi/issues/2392#issuecomment-752345860


   Please note that because S3 does not support append operations, Hudi is 
forced to create a new archive file every time it archives. Going forward, you 
can increase hoodie.commits.archival.batch to raise the number of commits 
archived per archive file. In addition, you can widen the gap between the two 
watermark configurations: hoodie.keep.max.commits (default: 30) and 
hoodie.keep.min.commits (default: 20). This reduces the number of archive files 
created and, at the same time, increases the amount of metadata stored per 
archive file. Note that after the 0.7.0 release, in which we are adding 
consolidated Hudi metadata (RFC-15), the follow-up work will involve 
reorganizing the archival metadata so that we can run periodic compactions to 
control the size of these archive files.
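
   As a rough illustration of the tuning described above, here is a minimal 
Python sketch that builds the three options as a dict. The helper name 
archival_options and the values 20/60/40 are hypothetical and chosen only for 
illustration; only the three config keys come from this comment. Passing the 
dict to a Spark writer is likewise an assumption about your setup.

```python
def archival_options(min_commits: int, max_commits: int, batch_size: int) -> dict:
    """Build Hudi archival write options (hypothetical helper, not a Hudi API).

    A wider gap between the two watermarks means more commits are moved
    per archival run, so fewer archive files are created on S3.
    """
    if min_commits >= max_commits:
        raise ValueError("hoodie.keep.min.commits must be below hoodie.keep.max.commits")
    if batch_size <= 0:
        raise ValueError("hoodie.commits.archival.batch must be positive")
    # Hudi options are passed as strings.
    return {
        "hoodie.keep.min.commits": str(min_commits),
        "hoodie.keep.max.commits": str(max_commits),
        "hoodie.commits.archival.batch": str(batch_size),
    }


# Illustrative values only: widen the watermark gap from the default 10 (30 - 20) to 40.
opts = archival_options(min_commits=20, max_commits=60, batch_size=40)
watermark_gap = int(opts["hoodie.keep.max.commits"]) - int(opts["hoodie.keep.min.commits"])

# Assumed usage with a Spark DataFrame writer (not executed here):
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

   These options would be merged with whatever other write options the job 
already sets. Check any values you pick against the constraints enforced by 
your Hudi version before relying on them.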
   
   Added FAQ: 
https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-Iamseeinglotofarchivefiles.HowdoIcontrolthenumberofarchivecommitfilesgenerated?
   



