spark lacks fault tolerance with dynamic partition overwrite

Koert Kuipers Thu, 02 Apr 2020 20:07:16 -0700

i wanted to highlight here the issue we are facing with dynamic partition
overwrite.


it seems that any task that writes to disk using this feature and that
needs to be retried fails upon retry, leading to a failure of the entire
job.

we have seen this issue show up with preemption (a task gets killed by
preemption, and when it gets rescheduled it fails consistently). it can
also show up if a hardware issue causes your task to fail, or if you have
speculative execution enabled.

relevant jiras are SPARK-30320 and SPARK-29302

this affects spark 2.4.x and spark 3.0.0-SNAPSHOT.
writing to hive does not seem to be impacted.
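
for reference, a minimal sketch of the kind of write i am talking about, assuming a file-based datasource (parquet here) with dynamic partition overwrite turned on via spark.sql.sources.partitionOverwriteMode. the object name, output path, and column names are made up for illustration. this shows the write path only: on a healthy local run it should simply succeed, since the failure described above needs a task attempt to be retried (preemption, hardware failure, or speculation).

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// hypothetical example object; not taken from our actual jobs
object DynamicOverwriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-overwrite-sketch")
      .master("local[2]")
      // only overwrite the partitions present in the data being written
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .getOrCreate()
    import spark.implicits._

    // made-up data with a made-up partition column "day"
    val df = Seq(("2020-04-01", 1), ("2020-04-02", 2)).toDF("day", "value")

    // overwrite + partitionBy on a file-based sink is the affected combination;
    // the output path is an assumption for this sketch
    df.write
      .mode(SaveMode.Overwrite)
      .partitionBy("day")
      .parquet("/tmp/dynamic_overwrite_sketch")

    spark.stop()
  }
}
```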

best,
koert
