Rap70r commented on issue #2586:
URL: https://github.com/apache/hudi/issues/2586#issuecomment-789169448


   Hi @vinothchandar,
   
   Thank you for your detailed answer.
Yes, we are going to increase the retention policy to a higher number, such as 
15 retained commits or more, and we will also work on improving the performance 
of our readers. We wouldn't want a retention period that exceeds a few hours, 
for the exact reason you mentioned.
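   For reference, a minimal sketch of what raising the retained-commit count could look like on the Spark write path; the table name, key fields, S3 path, and the value 15 here are illustrative assumptions, not our actual job config:
   
   ```python
   # Hedged sketch: raising Hudi's cleaner retention when writing to S3 via Spark.
   # All names and paths below are placeholders, not our production values.
   hudi_options = {
       "hoodie.table.name": "example_table",                 # placeholder table name
       "hoodie.datasource.write.recordkey.field": "id",      # placeholder record key
       "hoodie.datasource.write.precombine.field": "ts",     # placeholder ordering field
       "hoodie.cleaner.commits.retained": "15",              # keep ~15 commits around for readers
   }
   
   (df.write.format("hudi")
      .options(**hudi_options)
      .mode("append")
      .save("s3://example-bucket/example_table"))            # placeholder S3 path
   ```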
   
   I did try increasing the number of partitions to a few thousand, but past a 
certain point performance drops due to the time it takes to iterate over all 
the files in our cluster's setup.
   
   I want to clarify that we are not using Hive in our setup. Hudi tables are 
all written to S3 directly by Spark.
   
   Thank you


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
