[GitHub] [hudi] rohit-m-99 commented on issue #3821: [SUPPORT] Ingestion taking very long time getting small files from partitions/

GitBox Mon, 18 Oct 2021 23:09:08 -0700


rohit-m-99 commented on issue #3821:
URL: https://github.com/apache/hudi/issues/3821#issuecomment-946395852



   I see. As of now, the main problem is that intuitively we'd partition by 
`run`, but each `run` is only about 2000-4000k records, so it is not immediately 
obvious which field we should partition by. 
   
   Any advice here would be appreciated. We chose not to partition by the `run` 
id for query performance (to avoid having too many partitions), but we are not 
sure about alternatives: our use case has pretty high variability between time 
periods, so we have moved away from time-based partitioning. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
