[GitHub] [hudi] nsivabalan commented on issue #4873: Processing time very Slow Updating records into Hudi Dataset(MOR) using AWS Glue

GitBox Wed, 02 Nov 2022 19:15:24 -0700


nsivabalan commented on issue #4873:
URL: https://github.com/apache/hudi/issues/4873#issuecomment-1301568436


   By the way, I'm not sure if I have called this out before: I see you are 
partitioning by hour. This results in very high partition cardinality (more 
than 25k partitions for a few years of data). Generally it's advisable to keep 
the total number of partitions at 10k or less; otherwise, we have to spend a 
lot of time on perf tuning. Alternatively, you can employ clustering to 
cluster your data by hour and reap similar benefits from column stats pruning 
with the metadata table. 
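
   To make the numbers concrete, here is a rough sketch of the partition-count 
arithmetic, plus a set of writer options that could enable clustering by hour 
with column stats. The helper `partition_count` and the column name 
`event_hour` are hypothetical, and the option keys are given as I recall them 
from the Hudi configuration docs, so please verify them against the Hudi 
version you are running on Glue.

```python
# Rough estimate only: ignores leap days and assumes every hour has data.
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def partition_count(years: int, granularity: str) -> int:
    """Approximate number of partitions for `years` of data."""
    per_year = {
        "hour": HOURS_PER_DAY * DAYS_PER_YEAR,  # 8760 partitions per year
        "day": DAYS_PER_YEAR,                   # 365 partitions per year
    }
    return per_year[granularity] * years


# Assumed option keys; check each one against the docs for your Hudi version.
# "event_hour" is a hypothetical column holding the hour of the record.
clustering_opts = {
    "hoodie.clustering.inline": "true",
    "hoodie.clustering.inline.max.commits": "4",
    "hoodie.clustering.plan.strategy.sort.columns": "event_hour",
    "hoodie.metadata.enable": "true",
    "hoodie.metadata.index.column.stats.enable": "true",
    # Read-side option so queries can prune files using column stats.
    "hoodie.enable.data.skipping": "true",
}

if __name__ == "__main__":
    for years in (1, 3):
        print(years, "year(s):",
              partition_count(years, "hour"), "hourly partitions vs",
              partition_count(years, "day"), "daily partitions")
```

   With these assumptions, three years of hourly partitions already exceeds 
25k, while daily partitions stay close to 1k. The options dict would be passed 
to the Hudi writer along with your existing write options.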
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
