nsivabalan commented on issue #2888: URL: https://github.com/apache/hudi/issues/2888#issuecomment-830716667
@PavelPetukhov : I see that you have set the parallelism to only 2. Can you try increasing it, maybe to 50?

--hoodie-conf hoodie.upsert.shuffle.parallelism=50
--hoodie-conf hoodie.insert.shuffle.parallelism=50
--hoodie-conf hoodie.delete.shuffle.parallelism=50
--hoodie-conf hoodie.bulkinsert.shuffle.parallelism=50

Also, I see you are setting the operation to "BULK_INSERT". I wanted to clarify what this operation is for: it is intended only for the first-time load of data into Hudi. After that, you are expected to use "insert" or "upsert".

Also, can you try the latest release, 0.8.0? We did have some fixes related to releasing memory for RDDs after each batch of ingestion, which might help fix the issue.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
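
For anyone scripting the job submission, the four parallelism flags suggested in the comment can be generated instead of typed by hand. This is only a minimal sketch: the helper name `parallelism_flags` and the `PARALLELISM_KEYS` list are hypothetical, not part of Hudi. The four config keys and the value 50 are the ones given in the comment.

```python
from typing import List

# Config keys taken from the comment; the list name itself is hypothetical.
PARALLELISM_KEYS = [
    "hoodie.upsert.shuffle.parallelism",
    "hoodie.insert.shuffle.parallelism",
    "hoodie.delete.shuffle.parallelism",
    "hoodie.bulkinsert.shuffle.parallelism",
]


def parallelism_flags(parallelism: int) -> List[str]:
    """Build the --hoodie-conf arguments that set every shuffle parallelism
    key to the same value (hypothetical helper, not a Hudi API)."""
    if parallelism < 1:
        raise ValueError("parallelism must be a positive integer")
    flags: List[str] = []
    for key in PARALLELISM_KEYS:
        # Each setting is passed as its own "--hoodie-conf key=value" pair.
        flags += ["--hoodie-conf", f"{key}={parallelism}"]
    return flags


if __name__ == "__main__":
    # Print the flags as one line, ready to append to the existing command.
    print(" ".join(parallelism_flags(50)))
```

The returned list can be appended to whatever argument list the job is already launched with. 50 is only the starting point suggested in the comment, and the right value depends on the data volume and cluster size.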
