[ https://issues.apache.org/jira/browse/SPARK-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-8578:
------------------------------
Target Version/s: 1.4.1, 1.5.0 (was: 1.5.0)
> Should ignore user defined output committer when appending data
> ---------------------------------------------------------------
>
> Key: SPARK-8578
> URL: https://issues.apache.org/jira/browse/SPARK-8578
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Cheng Lian
> Assignee: Yin Huai
>
> When appending data to a file system via the Hadoop API, it's safer to ignore
> user-defined output committer classes like {{DirectParquetOutputCommitter}},
> because task failures are relatively hard to handle in this case. For
> example, {{DirectParquetOutputCommitter}} writes directly to the output
> directory to boost write performance when working with S3. However, the
> Hadoop API offers no general way to determine the output file path of a
> specific task, so we don't know how to revert a failed append job. (When
> overwriting, we can simply remove the whole output directory.)
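The rule proposed above can be sketched as a small decision function. This is a minimal Python model, not Spark's implementation: committers are represented only by class-name strings, and `SaveMode` and `choose_committer` are hypothetical names introduced for illustration. "FileOutputCommitter" stands in for Hadoop's default committer, which stages task output before committing it.

```python
from enum import Enum
from typing import Optional


class SaveMode(Enum):
    """Hypothetical stand-in for the write modes discussed in the issue."""
    APPEND = "append"
    OVERWRITE = "overwrite"


# Illustrative name for Hadoop's default (staging) committer.
DEFAULT_COMMITTER = "FileOutputCommitter"


def choose_committer(mode: SaveMode, user_committer: Optional[str] = None) -> str:
    """Pick the committer class for a write job (hypothetical model).

    On append, any user-defined committer (e.g. a direct committer) is
    ignored: a failed append cannot be reverted without knowing which
    files each task wrote, so we fall back to the default committer.
    On overwrite, a failed job can be cleaned up by deleting the whole
    output directory, so the user's choice is honored.
    """
    if mode is SaveMode.APPEND or user_committer is None:
        return DEFAULT_COMMITTER
    return user_committer
```

For example, `choose_committer(SaveMode.APPEND, "DirectParquetOutputCommitter")` falls back to the default committer, while the same call with `SaveMode.OVERWRITE` keeps the direct committer.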
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)