Hi Team,

One of the biggest pain points we're facing is this: Spark reads upstream partitioned data, and while an Action is running, the upstream gets refreshed and the application fails with a 'File does not exist' error. By that point the job may already have spent a considerable amount of time, so re-running the entire application is undesirable.
I know the general solution is to change how the upstream manages its data, but is there a way to tackle this problem from the Spark application side? One approach I was thinking of is to save some state of the operations the Spark job has completed up to that point, and on a retry, resume from there rather than from scratch.

With Best Regards,
Dipayan Dev
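P.S. To make the idea concrete, here is a rough sketch of the kind of thing I mean, using Dataset.checkpoint() to eagerly materialize an intermediate result so that later actions no longer touch the upstream files. All paths, column names, and transformations below are made-up placeholders, and this is only a sketch of the idea, not a tested solution:

import org.apache.spark.sql.SparkSession

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("upstream-refresh-checkpoint-sketch")
      .getOrCreate()

    // Durable location that survives application restarts (path is illustrative).
    spark.sparkContext.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

    // Read the upstream partitioned data (path and schema are hypothetical).
    val upstream = spark.read.parquet("hdfs:///data/upstream/dt=2023-07-01")

    // Eagerly materialize the expensive intermediate result. checkpoint()
    // runs a job immediately and truncates the lineage, so subsequent
    // actions read the checkpointed copy instead of the upstream files
    // that may have been refreshed in the meantime.
    val intermediate = upstream
      .filter("amount > 0")
      .groupBy("customer_id")
      .count()
      .checkpoint() // eager by default

    // This action no longer depends on the upstream files.
    intermediate.write.mode("overwrite").parquet("hdfs:///data/output/counts")

    spark.stop()
  }
}

My understanding is that this only protects actions that run after the checkpoint job completes; a refresh during the checkpoint itself would still fail, and the checkpoint directory is not cleaned up automatically. I'm also aware of spark.sql.files.ignoreMissingFiles, but that silently skips missing files, which we probably can't afford.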