If we use saveAsNewAPIHadoopDataset with speculation enabled, it may cause data loss. I checked the comment on this API:
> We should make sure our tasks are idempotent when speculation is enabled,
> i.e. do not use an output committer that writes data directly.
> There is an example in https://issues.apache.org/jira/browse/SPARK-10063
> to show the bad result of using a direct output committer with speculation enabled.

But is this a rule we must always follow? For example, Parquet uses ParquetOutputCommitter. In that case, must speculation be disabled for Parquet? Does anyone know the history here? Thanks very much!
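As far as I understand, one way to stay on the safe side is to control this through configuration. A minimal sketch in spark-defaults.conf style (the property names are the standard Spark ones; whether this is sufficient for your committer is an assumption, not a guarantee):

```properties
# Disable speculative execution entirely, so two attempts of the same
# task can never race on the same output path with a direct committer.
spark.speculation  false

# Alternatively, keep speculation on but make sure a non-direct committer
# is used for Parquet. ParquetOutputCommitter extends FileOutputCommitter,
# which writes to a per-attempt temporary directory and commits on success,
# so duplicate attempts should not corrupt the final output.
spark.sql.parquet.output.committer.class  org.apache.parquet.hadoop.ParquetOutputCommitter
```

If my reading is right, the danger is specific to committers that write directly to the final location; a FileOutputCommitter-based committer like ParquetOutputCommitter should be compatible with speculation.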