bvaradar commented on issue #2392: URL: https://github.com/apache/hudi/issues/2392#issuecomment-752345860
Please note that because S3 does not support append operations, Hudi is forced to create a new archive file every time it archives. Going forward, you can increase hoodie.commits.archival.batch to raise the number of commits archived per archive file. In addition, you can widen the gap between the two watermark configurations, hoodie.keep.max.commits (default: 30) and hoodie.keep.min.commits (default: 20). This reduces the number of archive files created and, at the same time, increases the amount of metadata archived per archive file.

Note that after the 0.7.0 release, in which we are adding consolidated Hudi metadata (RFC-15), the follow-up work would involve reorganizing archival metadata so that periodic compactions can control the size of these archive files.

Added FAQ: https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-Iamseeinglotofarchivefiles.HowdoIcontrolthenumberofarchivecommitfilesgenerated?
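To make the tuning advice concrete, here is a minimal sketch of how the three settings might be assembled into writer options. The helper name `archival_options` and the values 20, 60, and 40 are illustrative assumptions, not recommendations from this thread. Only the three configuration keys come from the comment above.

```python
def archival_options(min_commits, max_commits, batch_size):
    """Build a dict of Hudi archival settings (hypothetical helper).

    min_commits / max_commits are the two watermarks; widening the gap
    between them means each archival run has more commits to archive.
    """
    # Sanity check (assumption of this sketch): the upper watermark must
    # sit above the lower one, otherwise there is no gap to widen.
    if min_commits >= max_commits:
        raise ValueError("max_commits must be greater than min_commits")
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # Option values are passed as strings, as is usual for writer options.
    return {
        "hoodie.keep.min.commits": str(min_commits),
        "hoodie.keep.max.commits": str(max_commits),
        "hoodie.commits.archival.batch": str(batch_size),
    }


# Illustrative values only: keep the default lower watermark of 20,
# raise the upper watermark from 30 to 60, and use a larger batch size.
opts = archival_options(min_commits=20, max_commits=60, batch_size=40)

# With PySpark, the dict could be passed alongside the table's other
# Hudi options, for example:
#   df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

With a wider gap, archival is triggered less often and each run covers more commits, which is the effect described above for stores such as S3 where every archival run produces a new file.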
