I have observed the following.
The job commit is done on the driver, and the task commit is done on the executors.
With speculation enabled, this may cause data loss,
since the job commit calls listStatus, and the task commit deletes the output
file if it already exists and then renames the new file to the final output.
If listStatus is called after the delete and before the rename, the data will be lost!
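
The interleaving described above can be sketched as a toy model. This is plain Python, not Hadoop or Spark code: the set stands in for the output directory, and the names list_status, commit_task, between_steps and simulate_race are hypothetical, chosen only for illustration. It models just the delete-then-rename sequence as described here, not the real FileOutputCommitter.

```python
def list_status(output_dir):
    """Toy stand-in for listStatus: a snapshot of the visible file names."""
    return sorted(output_dir)


def commit_task(output_dir, final_name, between_steps=None):
    """Toy task commit: delete the final file if present, then 'rename' into place."""
    # Step 1: remove the output of an earlier attempt, if it exists.
    output_dir.discard(final_name)
    # Hook that lets another actor (the job commit) run between the two steps.
    if between_steps is not None:
        between_steps()
    # Step 2: the rename makes the new attempt's file visible under the final name.
    output_dir.add(final_name)


def simulate_race():
    # Assumption: a first attempt has already committed part-00000.
    output_dir = {"part-00000"}
    seen_by_job_commit = []

    def job_commit_listing():
        # The job commit lists the directory exactly between delete and rename.
        seen_by_job_commit.extend(list_status(output_dir))

    # A speculative second attempt commits the same file.
    commit_task(output_dir, "part-00000", between_steps=job_commit_listing)
    return seen_by_job_commit, list_status(output_dir)


if __name__ == "__main__":
    during, after = simulate_race()
    print("listing seen by job commit:", during)
    print("listing after task commit: ", after)
```

In this model the listing taken between the two steps does not contain the file, even though the file is present again once the task commit finishes.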

Am I right?
Thanks, Steve.

2018-04-04 4:44 GMT+08:00 Steve Loughran <ste...@hortonworks.com>:

>
>
> > On 3 Apr 2018, at 11:19, cane <zhoukang199...@gmail.com> wrote:
> >
> > Now, if we use saveAsNewAPIHadoopDataset with speculation enabled, it may
> > cause data loss.
> > I checked the comment on this API:
> >
> >  We should make sure our tasks are idempotent when speculation is enabled,
> > i.e. do
> >   * not use output committer that writes data directly.
> >   * There is an example in
> > https://issues.apache.org/jira/browse/SPARK-10063 to show the bad
> >   * result of using direct output committer with speculation enabled.
> >   */
> >
> > But is this a rule we must follow?
> > For example, for Parquet it will get ParquetOutputCommitter.
> > In this case, must speculation be disabled for Parquet?
> >
> > Does anyone know the history?
> > Thanks very much!
>
>
> If you are writing to HDFS or object stores other than S3, and you make
> sure that you are using the version 1 FileOutputFormat commit algorithm,
> you can use speculation without problems.
>
> spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 1
>
> If you use the version 2 algorithm, then you are vulnerable to a failure
> during task commit, but only during task commit, and then only if
> speculative/repeated tasks generate output files with different names.
>
> spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
>
> If you are using S3 as a direct destination of work, then, in the absence
> of a consistency layer (S3mper, EMR consistent S3, Hadoop 3.x + S3Guard) or
> an S3-specific committer, you are always at risk of data loss. Don't do that.
>
> Further reading
>
> https://github.com/steveloughran/zero-rename-committer/releases/download/tag_draft_003/a_zero_rename_committer.pdf
>
>


-- 
Best regards,
周康
