nsivabalan commented on issue #7910: URL: https://github.com/apache/hudi/issues/7910#issuecomment-1454046187
Is it a COW or MOR table? For COW: if you look at S3 directly, you might find older files too. After rewriting a base file to a newer version, Hudi does not delete the older file immediately; the cleaner takes care of that later. Your queries/readers will only read the latest version of each data file. With a MOR table, it's more nuanced. By default, only one file group (without any log files) is considered for small-file bin packing. If you want more file groups to be picked up, you can try tweaking https://hudi.apache.org/docs/configurations/#hoodiemergesmallfilegroupcandidateslimit
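As a rough sketch of the tweak mentioned above, the option behind that link (`hoodie.merge.small.file.group.candidates.limit`) can be passed along with the other write options. The helper name `hudi_mor_write_options`, the table name, and the value `4` are only illustrative assumptions, not recommendations; tune the limit for your own workload.

```python
def hudi_mor_write_options(table_name: str, candidates_limit: int = 4) -> dict:
    """Build Hudi write options for a MOR table (hypothetical helper).

    candidates_limit is how many small file groups may be considered for
    bin packing per write; 4 is an arbitrary example value.
    """
    if candidates_limit < 1:
        raise ValueError("candidates_limit must be at least 1")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        # The config from the comment above; it applies to MOR tables.
        "hoodie.merge.small.file.group.candidates.limit": str(candidates_limit),
    }


# "my_table" is a placeholder table name.
opts = hudi_mor_write_options("my_table")

# With an existing Spark DataFrame `df` and a base path (both assumed),
# the options would be applied roughly like this:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

Other write options your job already sets (record key, precombine field, partition path, and so on) stay as they are; this only adds the one extra key.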
