tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert of 53M records
URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610880742

> .option("hoodie.write.buffer.limit.bytes", "131072") // 128KB

I have tried this, but it doesn't help (local[3] with 8g of driver memory). Next, I split my data into a single file with 5M records and tried again, but that doesn't help either: the upsert failed with a GC limit exception. I also tried decreasing the parquet file size and the memory merge fraction.

I have attached the log and the data with 5M records:

- CSV data with 5M records: https://drive.google.com/open?id=1uwJ68_RrKMUTbEtsGl56_P5b_mNX3k2S
- Log with the GC limit exception: https://drive.google.com/open?id=147Qz7Iau1RWyRlWYSq8WvtWFB0fAXLM8

@lamber-ken, can you try a bulk insert followed by an upsert on this data and configuration? You may take part of the code and configuration from my reply above.
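For context, the bulk-insert-then-upsert flow being requested can be sketched roughly as below. This is a minimal Spark-shell sketch, not the reporter's actual code: the table name, paths, and the record-key/precombine field names (`id`, `ts`) are placeholders, and the tuning values shown are just the knobs mentioned in the comment (write buffer limit, merge fraction, parquet file size).

```scala
// Hypothetical sketch of a Hudi bulk_insert followed by an upsert of the
// same data. Paths and field names (id, ts) are placeholders; the
// reporter's real schema and configuration are in their earlier reply.
// Assumes a spark-shell session with the Hudi bundle on the classpath.
import org.apache.spark.sql.SaveMode

val basePath = "file:///tmp/hudi_table"

def writeHudi(operation: String, mode: SaveMode): Unit =
  spark.read.option("header", "true").csv("file:///tmp/data_5m.csv")
    .write.format("hudi")
    .option("hoodie.table.name", "hudi_table")
    .option("hoodie.datasource.write.operation", operation)
    .option("hoodie.datasource.write.recordkey.field", "id")  // placeholder
    .option("hoodie.datasource.write.precombine.field", "ts") // placeholder
    // Memory-related knobs discussed in the comment (illustrative values):
    .option("hoodie.write.buffer.limit.bytes", "131072")      // 128KB
    .option("hoodie.memory.merge.fraction", "0.6")
    .option("hoodie.parquet.max.file.size", String.valueOf(64 * 1024 * 1024))
    .mode(mode)
    .save(basePath)

writeHudi("bulk_insert", SaveMode.Overwrite) // initial load
writeHudi("upsert", SaveMode.Append)         // then upsert the same records
```

Note that with the merge-on-read/copy-on-write merge path, `hoodie.memory.merge.fraction` bounds how much of the executor heap the merge map may use, which is usually the first knob to look at for GC-overhead failures during upsert.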
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
