n3nash edited a comment on issue #2946: URL: https://github.com/apache/hudi/issues/2946#issuecomment-841985843
@AkshayChan From the message it is pretty clear that some of the executor nodes are running out of disk space:

```
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 77.0 failed 4 times, most recent failure: Lost task 0.3 in stage 77.0 (TID 22943, 172.34.88.19, executor 32): com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on device
```

I would recommend checking the EMR instances you are provisioning and logging into the boxes while the job is running to see when they run out of space.

To give you an idea of how this can happen: whenever Hudi performs an upsert, it shuffles some data around. A Spark shuffle has 2 phases: map and reduce. The map phase spills data to the local disk and uses the KryoSerializer to do so; that is where you are running into this exception. Not much I can do here. Let me know if you need anything.
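One way to follow the advice above is to watch the Spark spill directory on a worker while the job runs. This is a minimal sketch, not from the thread: `SPILL_DIR` is an assumption (on EMR, shuffle spill typically lands under `/mnt`, per `spark.local.dir` / `yarn.nodemanager.local-dirs`), so adjust it for your cluster.

```shell
# Assumed spill location -- on EMR this is often under /mnt; /tmp is the
# Spark default for spark.local.dir when nothing else is configured.
SPILL_DIR="${SPILL_DIR:-/tmp}"

# Free space on the volume holding the spill directory.
df -h "$SPILL_DIR"

# The five largest entries inside it -- growing shuffle/spill files show up here.
du -sh "$SPILL_DIR"/* 2>/dev/null | sort -h | tail -n 5
```

Running the `df` line in a loop (e.g. `watch -n 30 df -h /mnt`) during the upsert shows whether the map-side shuffle spill is what fills the disk.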
