vinothchandar commented on issue #1328: Hudi upsert hangs
URL: https://github.com/apache/incubator-hudi/issues/1328#issuecomment-589152895
 
 
   @lamber-ken is right. I am looking into why the DiskBasedMap is so slow 
(there was a recent change, so I'm wondering if it's a regression). Will raise a 
JIRA nonetheless.  
   
   
   To explain a bit more: the big difference is that all 4M entries go to one 
file, and it's a degenerate workload (i.e. a single-field record) where the 
metadata-to-data overhead is large. We have a spilling mechanism to handle a large 
number of keys merging into a single file (like the spill map you will see in 
Spark shuffle), and that seems to be performing poorly. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
