rohit-m-99 commented on issue #3821: URL: https://github.com/apache/hudi/issues/3821#issuecomment-946395852
I see. As of now, the main problem is that, intuitively, we'd partition by `run`, but each `run` is only about 2,000-4,000 records, so it is not obvious which field we should partition by. Any advice here would be appreciated. We chose not to partition by the `run` id for query performance (to avoid having too many partitions), but we are not sure about the alternatives: our use case has pretty high variability between time periods, so we have moved away from time-based partitioning.
