n3nash commented on issue #2806:
URL: https://github.com/apache/hudi/issues/2806#issuecomment-820155041


   @tmac2100 Thanks for those details; they are very helpful for understanding 
your job. I need one more piece of information before I can pin down the root 
cause: can you please post a screenshot of the "Stages" tab of the Spark 
application that took 16 hours? I want to see which stages took this much time. 
My hunch is that they are the Spark stages related to BloomIndex, in which case 
either a) your bloom index is not configured correctly, leading to lots of false 
positives and hence longer stage times, or b) your 20% updates span 100s of 
partitions, leading to a lot of bloom index lookups.
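
   To illustrate case (a), here is a rough sketch of the textbook bloom filter 
false-positive estimate. It is not Hudi's implementation; the function names 
and the numbers used are assumptions chosen for illustration only. It shows how 
a filter sized for a given number of keys degrades when many more keys than 
expected are written into it.

```python
import math


def optimal_bits(expected_entries: int, target_fpp: float) -> int:
    """Bits needed so a filter holding `expected_entries` keys hits `target_fpp`.

    Standard formula: m = -n * ln(p) / (ln 2)^2
    """
    return math.ceil(-expected_entries * math.log(target_fpp) / (math.log(2) ** 2))


def optimal_hashes(bits: int, expected_entries: int) -> int:
    """Hash-function count that minimises false positives: k = (m / n) * ln 2."""
    return max(1, round(bits / expected_entries * math.log(2)))


def estimated_fpp(bits: int, hashes: int, actual_entries: int) -> float:
    """Approximate false-positive probability: (1 - e^(-k * n / m))^k."""
    return (1.0 - math.exp(-hashes * actual_entries / bits)) ** hashes


if __name__ == "__main__":
    # Illustrative values only, not Hudi defaults.
    sized_for = 60_000
    target = 0.01
    bits = optimal_bits(sized_for, target)
    hashes = optimal_hashes(bits, sized_for)

    # Compare the filter at its intended load and at heavier loads.
    for actual in (sized_for, 5 * sized_for, 10 * sized_for):
        fpp = estimated_fpp(bits, hashes, actual)
        print(f"keys={actual:>7}  estimated fpp={fpp:.4f}")
```

   If the files hold far more record keys than the filter was sized for, almost 
every lookup becomes a "maybe present" answer, so the job ends up opening and 
checking many more files than necessary.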


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

