Hi Team,

One of the biggest pain points we're facing is this: Spark reads upstream partitioned data, and while an Action is running, the upstream gets refreshed and the application fails with a 'File does not exist' error. By that point the job may already have spent a considerable amount of time, so re-running the entire application is undesirable.
I know the general solution is to change how the upstream manages its data, but is there a way to tackle this problem from the Spark application side? One approach I was thinking of is to save some state of the operations the Spark job has completed up to that point, and on a retry, resume from there rather than from scratch.

With Best Regards,
Dipayan Dev
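P.S. To make the idea concrete, here is a rough sketch of the kind of thing I mean, using Dataset.checkpoint() to eagerly materialize an intermediate result so that later actions no longer touch the upstream files. All paths, column names, and transformations below are made-up placeholders, and this is only a sketch of the idea, not a tested solution:

import org.apache.spark.sql.SparkSession

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("upstream-refresh-checkpoint-sketch")
      .getOrCreate()

    // Durable location that survives application restarts (path is illustrative).
    spark.sparkContext.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

    // Read the upstream partitioned data (path and schema are hypothetical).
    val upstream = spark.read.parquet("hdfs:///data/upstream/dt=2023-07-01")

    // Eagerly materialize the expensive intermediate result. checkpoint()
    // runs a job immediately and truncates the lineage, so subsequent
    // actions read the checkpointed copy instead of the upstream files
    // that may have been refreshed in the meantime.
    val intermediate = upstream
      .filter("amount > 0")
      .groupBy("customer_id")
      .count()
      .checkpoint() // eager by default

    // This action no longer depends on the upstream files.
    intermediate.write.mode("overwrite").parquet("hdfs:///data/output/counts")

    spark.stop()
  }
}

My understanding is that this only protects actions that run after the checkpoint job completes; a refresh during the checkpoint itself would still fail, and the checkpoint directory is not cleaned up automatically. I'm also aware of spark.sql.files.ignoreMissingFiles, but that silently skips missing files, which we probably can't afford.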