vinothchandar commented on issue #1328: Hudi upsert hangs
URL: https://github.com/apache/incubator-hudi/issues/1328#issuecomment-589152895

@lamber-ken is right. I am looking into why the DiskBasedMap is so slow (there was a recent change, so I am wondering if it is a regression). I will raise a JIRA nonetheless.

A bit more explanation: the big difference is that all 4M entries go to one file, and it is a degenerate workload (i.e. single-field records) where the metadata-to-data overhead is high. We have a spilling mechanism to handle a large number of keys merging into a single file (like the spill map you see in Spark shuffle), and that seems to be performing poorly.
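To make the spilling idea concrete, here is a minimal sketch of a map that keeps values in memory up to a byte budget and writes the rest to a single file on disk. It is an illustration only and is not Hudi's actual DiskBasedMap. The class name `SpillableMapSketch`, its methods, and the byte-budget parameter are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a spillable map; NOT Hudi's DiskBasedMap.
final class SpillableMapSketch implements AutoCloseable {
    private final long maxInMemoryBytes;   // assumed budget for in-memory values
    private long inMemoryBytes = 0;
    private final Map<String, byte[]> inMemory = new HashMap<>();
    // For spilled entries, only {offset, length} stays in memory.
    private final Map<String, long[]> spilledIndex = new HashMap<>();
    private final File spillFile;
    private final RandomAccessFile spill;

    SpillableMapSketch(long maxInMemoryBytes) {
        try {
            this.maxInMemoryBytes = maxInMemoryBytes;
            this.spillFile = File.createTempFile("spill-sketch", ".bin");
            this.spillFile.deleteOnExit();
            this.spill = new RandomAccessFile(spillFile, "rw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(String key, byte[] value) {
        // Drop any previous copy of the key; stale bytes may remain in the file.
        byte[] old = inMemory.remove(key);
        if (old != null) {
            inMemoryBytes -= old.length;
        }
        spilledIndex.remove(key);

        if (inMemoryBytes + value.length <= maxInMemoryBytes) {
            inMemory.put(key, value);
            inMemoryBytes += value.length;
            return;
        }
        try {
            // Over budget: append the value to the spill file and index it.
            long offset = spill.length();
            spill.seek(offset);
            spill.write(value);
            spilledIndex.put(key, new long[] {offset, value.length});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] get(String key) {
        byte[] value = inMemory.get(key);
        if (value != null) {
            return value;
        }
        long[] location = spilledIndex.get(key);
        if (location == null) {
            return null;
        }
        try {
            // One seek plus one read per spilled lookup.
            byte[] out = new byte[(int) location[1]];
            spill.seek(location[0]);
            spill.readFully(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int inMemoryCount() { return inMemory.size(); }

    int spilledCount() { return spilledIndex.size(); }

    @Override
    public void close() {
        try {
            spill.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spillFile.delete();
    }
}
```

In this sketch every spilled entry costs a key, an offset and a length in memory, plus a seek on each lookup. When each record is a single small field, that bookkeeping can be comparable to or larger than the payload itself, which is one way to picture the high metadata-to-data overhead described above.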