tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during 
upsert 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610880742
 
 
   > .option("hoodie.write.buffer.limit.bytes", "131072")  // 128KB
   
   I tried that, but it didn't help. I tested in local mode with 3 threads and 8 GB of driver memory.
   
   Next, I split the data into a single file with 5M records and tried again, but that didn't help either: the upsert failed with a GC overhead limit exceeded error.
   
   I also tried decreasing the Parquet max file size, the merge memory fraction, and the write buffer limit bytes.
   
   I have attached the log and the 5M-record dataset:
   - CSV data with 5M records 
https://drive.google.com/open?id=1uwJ68_RrKMUTbEtsGl56_P5b_mNX3k2S
   - Log with GC limit exception 
https://drive.google.com/open?id=147Qz7Iau1RWyRlWYSq8WvtWFB0fAXLM8
   
   @lamber-ken, could you try a bulk insert followed by an upsert with this data and configuration?
   You can take the code and configuration from my reply above.
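   For reference, a minimal sketch of the bulk-insert-then-upsert reproduction, assuming Spark in local mode with the Hudi Spark bundle on the classpath. The record key and precombine field names (`id`, `ts`), the paths, and the tuned values are placeholders for illustration, not my exact setup:
   
   ```scala
   import org.apache.spark.sql.{SaveMode, SparkSession}
   
   val spark = SparkSession.builder()
     .appName("hudi-upsert-repro")
     .master("local[3]")            // 3 threads, as in the test above
     .getOrCreate()
   
   val df = spark.read
     .option("header", "true")
     .csv("/path/to/data_5m.csv")   // placeholder path for the attached CSV
   
   // One write per operation; "bulk_insert" for the initial load, "upsert" after.
   def write(operation: String, mode: SaveMode): Unit =
     df.write
       .format("org.apache.hudi")
       .option("hoodie.table.name", "repro_table")
       .option("hoodie.datasource.write.operation", operation)
       .option("hoodie.datasource.write.recordkey.field", "id")  // placeholder
       .option("hoodie.datasource.write.precombine.field", "ts") // placeholder
       .option("hoodie.write.buffer.limit.bytes", "131072")
       .option("hoodie.memory.merge.fraction", "0.75")
       .option("hoodie.parquet.max.file.size", String.valueOf(64 * 1024 * 1024))
       .mode(mode)
       .save("/tmp/hudi/repro_table")
   
   write("bulk_insert", SaveMode.Overwrite)  // initial load
   write("upsert", SaveMode.Append)          // the step that hits the GC limit
   ```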

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
