nsivabalan commented on issue #2586: URL: https://github.com/apache/hudi/issues/2586#issuecomment-785510794
Few options/questions:
- Does your incremental ingestion contain updates, or only inserts? If they are just inserts but Hudi's file sizing optimization is merging them into existing files, we can try turning off file sizing.
- In general, you can set the file versions retained based on the max time a read query can take and the max number of ingestions that could happen within that time frame. For example, if your read query could take a max of 2 hours and you ingest into Hudi once every 10 minutes, you can set the file versions retained to 12.
- Another option is to try a MERGE_ON_READ table. Here, Hudi will just do delta commits, which may not incur as much write amplification as COW. You can set file versions retained to just 3. Delta commits don't get in the way of the min file versions retained.
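To make the options above concrete, here is a rough sketch of the retention arithmetic and the write options it would feed into. The helper names are hypothetical. Mapping "turning off file sizing" to `hoodie.parquet.small.file.limit=0` is my assumption about which setting is meant, so please check it (and the other keys) against the config docs for your Hudi version.

```python
import math


def file_versions_to_retain(max_query_minutes: float, ingest_interval_minutes: float) -> int:
    """Hypothetical helper: how many ingestions can land while the
    longest-running read query is still in flight."""
    if max_query_minutes <= 0 or ingest_interval_minutes <= 0:
        raise ValueError("both durations must be positive")
    # Round up so a partially elapsed interval still counts as one ingestion.
    return math.ceil(max_query_minutes / ingest_interval_minutes)


def cleaner_options(versions: int) -> dict:
    """Hypothetical helper: cleaner settings that keep `versions` file versions."""
    return {
        "hoodie.cleaner.policy": "KEEP_LATEST_FILE_VERSIONS",
        "hoodie.cleaner.fileversions.retained": str(versions),
    }


# Option 2: a 2 hour max read query with one ingestion every 10 minutes.
versions = file_versions_to_retain(max_query_minutes=120, ingest_interval_minutes=10)
cow_options = cleaner_options(versions)

# Option 1 (assumption): a small-file limit of 0 is taken here to be the
# switch that stops inserts from being merged into existing files.
insert_only_options = {"hoodie.parquet.small.file.limit": "0"}

# Option 3: MERGE_ON_READ table with only 3 file versions retained.
mor_options = {
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    **cleaner_options(3),
}

print(versions)
print(cow_options)
print(mor_options)
```

These dicts would typically be passed along with the rest of your write config, e.g. `df.write.format("hudi").options(**cow_options)` in a Spark job, together with the table name, record key and other required options.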
