If we use saveAsNewAPIHadoopDataset with speculation enabled, it may cause data loss. I checked the comment on this API:
> We should make sure our tasks are idempotent when speculation is enabled,
> i.e. do not use an output committer that writes data directly.
> There is an example in https://issues.apache.org/jira/browse/SPARK-10063
> to show the bad result of using a direct output committer with speculation enabled.

But is this a rule we must always follow? For example, Parquet uses ParquetOutputCommitter. In that case, must speculation be disabled for Parquet? Does anyone know the history here? Thanks very much!
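As far as I understand, one way to stay on the safe side is to control this through configuration. A minimal sketch in spark-defaults.conf style (the property names are the standard Spark ones; whether this is sufficient for your committer is an assumption, not a guarantee):

```properties
# Disable speculative execution entirely, so two attempts of the same
# task can never race on the same output path with a direct committer.
spark.speculation  false

# Alternatively, keep speculation on but make sure a non-direct committer
# is used for Parquet. ParquetOutputCommitter extends FileOutputCommitter,
# which writes to a per-attempt temporary directory and commits on success,
# so duplicate attempts should not corrupt the final output.
spark.sql.parquet.output.committer.class  org.apache.parquet.hadoop.ParquetOutputCommitter
```

If my reading is right, the danger is specific to committers that write directly to the final location; a FileOutputCommitter-based committer like ParquetOutputCommitter should be compatible with speculation.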