Hey Jelez,

The recommended way to handle this is to make your tasks idempotent. T2
should overwrite the S3 file, not fail if it already exists.
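
For what it's worth, here's a minimal sketch of what I mean, assuming the
task is a Python callable and you can reach S3 with boto3; the bucket,
prefix, and the run_sqoop_import() helper are placeholders for however you
actually launch the import:

    import boto3

    def export_db_to_s3(bucket="my-bucket", prefix="exports/table_x/"):
        """Idempotent T2: clear the output prefix, then run the import."""
        s3 = boto3.resource("s3")
        # Drop whatever a previous (failed) attempt left behind, so a
        # retry never trips over an existing output directory.
        s3.Bucket(bucket).objects.filter(Prefix=prefix).delete()
        # Placeholder: invoke Sqoop however you do today, pointing it at
        # s3://<bucket>/<prefix> as the target directory.
        run_sqoop_import(target_dir="s3://%s/%s" % (bucket, prefix))

With the cleanup folded into T2 itself, a retry of T2 alone starts from a
clean slate and no longer depends on T1 having run first.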

Cheers,
Chris

On Sun, May 15, 2016 at 11:42 AM, Raditchkov, Jelez (ETW) <
[email protected]> wrote:

> I am running several dependent tasks:
> T1 - delete the S3 output folder
> T2 - Sqoop from the DB into that S3 folder
>
> The problem: if T2 fails in the middle, every retry then gets: Encountered
> IOException running import job:
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
> s3://...
>
> Is there a way to reattempt a group of tasks, not only T2? The way it is
> now, the DAG fails because the S3 folder exists (it was created by the
> failed T2 attempt), so the DAG can never succeed.
>
> Any suggestions?
>
> Thanks!
>
>
