Rap70r commented on issue #2586: URL: https://github.com/apache/hudi/issues/2586#issuecomment-789169448
Hi @vinothchandar, thank you for your detailed answer.

Yes, we are going to increase the retention policy to a higher number, such as 15 or more, and we will also work on improving reader performance. We would not want a retention period that exceeds a few hours, for exactly the reason you mentioned. I did try increasing the number of partitions to a few thousand, but beyond a certain point performance drops because of the time it takes to iterate over all the files in our cluster's setup.

I also want to clarify that we are not using Hive in our setup; the Hudi tables are all written to S3 directly by Spark. Thank you.
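For readers following along, the retention change discussed here is typically expressed through Hudi's cleaner configuration. Below is a minimal sketch of what the writer options might look like; the table name, path, and archival values are placeholders and not from this issue, and the exact numbers should be tuned to the workload. `hoodie.cleaner.commits.retained` controls how many commits (and hence file versions) the cleaner keeps around for in-flight readers.

```python
# Hypothetical Hudi writer options illustrating the retention change
# discussed above (raising retained commits to 15). Names/paths are
# placeholders, not values from the actual deployment.
hudi_options = {
    "hoodie.table.name": "example_table",        # placeholder table name
    # Keep 15 commits so slower readers do not lose files mid-query:
    "hoodie.cleaner.commits.retained": "15",
    # Archival bounds must retain more commits than the cleaner does:
    "hoodie.keep.min.commits": "20",
    "hoodie.keep.max.commits": "30",
}

# In a Spark job these options would be applied on write, e.g.:
# df.write.format("hudi").options(**hudi_options) \
#   .mode("append").save("s3://bucket/path/example_table")
```

Note the ordering constraint sketched in the comments: the archival minimum (`hoodie.keep.min.commits`) should exceed the cleaner's retained count, otherwise the timeline can be archived out from under the cleaner.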
