I wanted to highlight an issue we are facing with dynamic partition overwrite.
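For context, here is a minimal PySpark sketch of the kind of write involved: dynamic partition overwrite enabled through `spark.sql.sources.partitionOverwriteMode`, plus speculative execution, which is one of the ways a task can get retried. The app name, the `part` column and the output path are hypothetical. Speculation only relaunches tasks it judges to be slow, so this is not guaranteed to trigger a retry on any given run.

```python
# Settings for the two features discussed in this mail.
REPRO_CONF = {
    # Overwrite only the partitions present in the written data.
    "spark.sql.sources.partitionOverwriteMode": "dynamic",
    # Launch speculative copies of slow tasks (one way a task gets re-run).
    "spark.speculation": "true",
}


def write_with_dynamic_overwrite(output_path: str) -> None:
    """Do a partitioned overwrite using the settings above (requires pyspark)."""
    # Imported here so that the settings above can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("dpo-retry-repro")  # hypothetical app name
    for key, value in REPRO_CONF.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # 1000 rows spread over 10 partitions of a hypothetical "part" column.
    df = spark.range(0, 1000).selectExpr("id", "id % 10 AS part")
    df.write.mode("overwrite").partitionBy("part").parquet(output_path)
    spark.stop()
```

Usage would be something like `write_with_dynamic_overwrite("/tmp/dpo_repro")` (a hypothetical path) on a cluster where tasks can be preempted or speculated.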
It seems that any task that writes to disk using this feature and needs to be retried fails upon retry, which causes the entire job to fail. We have seen this show up with preemption: the task gets killed by preemption, and when it is rescheduled it fails consistently. It can also show up if a hardware issue causes a task to fail, or if speculative execution is enabled.

The relevant JIRAs are SPARK-30320 and SPARK-29302.

This affects Spark 2.4.x and Spark 3.0.0-SNAPSHOT. Writing to Hive does not seem to be impacted.

Best,
Koert