n3nash commented on issue #2806: URL: https://github.com/apache/hudi/issues/2806#issuecomment-820155041
@tmac2100 Thanks for those details, they are very helpful for understanding your job. I need one more piece of information before I can pin down the root cause: can you please post a screenshot of the "Stages" tab of the Spark application that took 16 hours? I want to see which stages took this much time. My hunch is that it is the Spark stages related to BloomIndex, in which case either a) your bloom index is not configured correctly, leading to lots of false positives and hence longer stage times, or b) your 20% updates span 100s of partitions, leading to a lot of bloom index lookups.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
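To make hunch (a) concrete, here is a rough sketch of why an undersized bloom filter produces many false positives. It uses the standard textbook approximation for a generic bloom filter, not Hudi's actual implementation, and the function names and the numbers (60,000 expected entries, 1e-9 target rate) are illustrative assumptions rather than values taken from this job.

```python
import math


def optimal_bits(num_entries, target_fpp):
    """Bits needed for a target false positive rate: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-num_entries * math.log(target_fpp) / (math.log(2) ** 2))


def optimal_hashes(num_bits, num_entries):
    """Number of hash functions that minimises false positives: k = (m / n) * ln 2."""
    return max(1, round(num_bits / num_entries * math.log(2)))


def false_positive_rate(num_bits, num_hashes, num_keys):
    """Textbook approximation: p = (1 - e^(-k * n / m))^k."""
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


# Hypothetical sizing: filter built for 60,000 keys per file at a 1e-9 target.
sized_for = 60_000
bits = optimal_bits(sized_for, 1e-9)
hashes = optimal_hashes(bits, sized_for)

# Same filter, but the file actually holds 1x, 10x and 100x the expected keys.
for actual_keys in (60_000, 600_000, 6_000_000):
    rate = false_positive_rate(bits, hashes, actual_keys)
    print(f"{actual_keys:>9} keys -> false positive rate ~ {rate:.3g}")
```

Under these assumptions, a filter sized for far fewer keys than a file really holds reports "maybe present" for most lookups, so nearly every candidate file has to be opened and checked, which would show up as long-running index lookup stages.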
