nsivabalan commented on issue #4873:
URL: https://github.com/apache/hudi/issues/4873#issuecomment-1301568436

   By the way, I'm not sure if I have called this out before: I see you are
partitioning by hour. That results in very high partition cardinality, more
than 25k partitions for a few years of data (24 x 365 = 8,760 per year).
Generally it's advisable to keep the total number of partitions at 10k or
fewer; otherwise, we have to spend a lot of time on performance tuning.
Alternatively, you can use clustering to cluster your data by hour and reap
similar benefits from column stats pruning with the metadata table.
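
A rough sketch of the arithmetic and of what the clustering alternative could look like as writer and reader options. This is only an illustration under assumptions: the column names `event_date` and `event_hour` are hypothetical, the day-level partitioning is one possible coarser choice, and the `max.commits` value is arbitrary. Please check the config keys against the docs for your Hudi version before relying on them.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # leap days ignored; this is only an estimate


def partition_count(years: int, granularity: str) -> int:
    """Estimate the number of partitions for time-based partitioning."""
    per_day = {"hour": HOURS_PER_DAY, "day": 1}[granularity]
    return years * DAYS_PER_YEAR * per_day


# Writer options: partition by day (hypothetical column `event_date`) and let
# clustering sort data by hour (hypothetical column `event_hour`) instead of
# creating one partition per hour.
write_options = {
    "hoodie.datasource.write.partitionpath.field": "event_date",
    "hoodie.clustering.inline": "true",
    "hoodie.clustering.inline.max.commits": "4",  # arbitrary example value
    "hoodie.clustering.plan.strategy.sort.columns": "event_hour",
    "hoodie.metadata.enable": "true",
    "hoodie.metadata.index.column.stats.enable": "true",
}

# Reader options: use the metadata table's column stats to skip files.
read_options = {
    "hoodie.metadata.enable": "true",
    "hoodie.enable.data.skipping": "true",
}

if __name__ == "__main__":
    for granularity in ("hour", "day"):
        print(granularity, partition_count(3, granularity))
```

With PySpark these dicts would typically be passed along the lines of `df.write.format("hudi").options(**write_options)` and `spark.read.format("hudi").options(**read_options)`. Three years of hourly partitions comes out above 25k, while daily partitions for the same period stay well under the 10k guideline.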
   

