[
https://issues.apache.org/jira/browse/SPARK-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-8578:
-----------------------------------
Assignee: Apache Spark (was: Yin Huai)
> Should ignore user defined output committer when appending data
> ---------------------------------------------------------------
>
> Key: SPARK-8578
> URL: https://issues.apache.org/jira/browse/SPARK-8578
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Cheng Lian
> Assignee: Apache Spark
>
> When appending data to a file system via the Hadoop API, it's safer to ignore
> user-defined output committer classes like {{DirectParquetOutputCommitter}},
> because it's relatively hard to handle task failure in this case. For
> example, {{DirectParquetOutputCommitter}} writes directly to the output
> directory to boost write performance when working with S3. However, there's
> no general way to determine the output file path of a specific task in the
> Hadoop API, so we don't know how to revert a failed append job. (When
> overwriting, we can just remove the whole output directory.)
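
The rule proposed above can be sketched roughly as follows. This is only an illustration of the decision, not Spark's actual code: `SaveMode`, `choose_committer`, and `DEFAULT_COMMITTER` are hypothetical names, and falling back to Hadoop's `FileOutputCommitter` (which stages task output in a temporary location before commit) is an assumption about what "ignore" would mean in practice.

```python
from enum import Enum


class SaveMode(Enum):
    # Hypothetical enum for illustration; only the two modes discussed above.
    APPEND = "append"
    OVERWRITE = "overwrite"


# Assumption: the fallback is Hadoop's standard committer, which writes task
# output to a temporary location and moves it into place on commit.
DEFAULT_COMMITTER = "FileOutputCommitter"


def choose_committer(mode, user_committer=None):
    """Return the name of the committer class to use for a write job."""
    if mode is SaveMode.APPEND:
        # Appending: a direct-write committer leaves files from failed tasks
        # mixed in with existing data, so the user-defined class is ignored.
        return DEFAULT_COMMITTER
    # Overwriting: a failed job can be reverted by removing the whole output
    # directory, so a user-defined committer is acceptable.
    return user_committer or DEFAULT_COMMITTER


print(choose_committer(SaveMode.APPEND, "DirectParquetOutputCommitter"))
print(choose_committer(SaveMode.OVERWRITE, "DirectParquetOutputCommitter"))
```

With this sketch, an append job always gets the default committer, while an overwrite job keeps whatever committer the user configured.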
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)