Sahil Takiar created HIVE-16295:
-----------------------------------
Summary: Add support for using Hadoop's OutputCommitter
Key: HIVE-16295
URL: https://issues.apache.org/jira/browse/HIVE-16295
Project: Hive
Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar
Hive does not integrate with Hadoop's {{OutputCommitter}}. Instead, it uses a
{{NullOutputCommitter}} and implements its own commit logic, which is spread
across {{FileSinkOperator}}, {{MoveTask}}, and {{Hive}}.
The Hadoop community is building an {{OutputCommitter}} that integrates with
S3Guard and does a safe, coordinated commit of data on S3 inside individual
tasks. If Hive can integrate with this new {{OutputCommitter}}, there would be a
lot of benefits to Hive-on-S3:
* Data is only written once; directly committing data at a task level means no
renames are necessary
* The commit is done safely, in a coordinated manner; duplicate tasks (from
task retries or speculative execution) should not step on each other
* Data is written within each task, so everything is done in parallel
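
To make the task-level commit idea in the list above more concrete, here is a minimal in-memory sketch. It is only a toy model and not Hadoop's {{OutputCommitter}} API or the S3Guard committer: the class {{TaskCommitSketch}} and its methods {{write}}, {{commitTask}}, and {{visibleFiles}} are hypothetical names made up for illustration. It assumes each task attempt stages its output separately and that only the first attempt to commit for a given task is published, so a retried or speculative attempt cannot overwrite it.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy, in-memory model of a task-level commit; not Hadoop's or Hive's API. */
public class TaskCommitSketch {
    // Output staged by each task attempt, keyed by "taskId_attempt" (hypothetical layout).
    private final Map<String, Map<String, String>> pending = new TreeMap<>();
    // Files visible in the final destination.
    private final Map<String, String> committed = new TreeMap<>();
    // Tasks that already have a committed attempt.
    private final Set<String> committedTasks = new HashSet<>();

    /** Stage a file for one attempt; nothing becomes visible yet. */
    public synchronized void write(String taskId, int attempt, String file, String data) {
        pending.computeIfAbsent(taskId + "_" + attempt, k -> new TreeMap<>()).put(file, data);
    }

    /** Returns true if this attempt's output was published, false if it was discarded. */
    public synchronized boolean commitTask(String taskId, int attempt) {
        Map<String, String> staged = pending.remove(taskId + "_" + attempt);
        // Discard the attempt if it staged nothing or another attempt already committed.
        if (staged == null || !committedTasks.add(taskId)) {
            return false;
        }
        committed.putAll(staged); // publish directly; no rename step is modeled
        return true;
    }

    /** Snapshot of the files currently visible in the destination. */
    public synchronized Map<String, String> visibleFiles() {
        return new TreeMap<>(committed);
    }

    public static void main(String[] args) {
        TaskCommitSketch committer = new TaskCommitSketch();
        // Two attempts of the same task (e.g. speculative execution) plus one other task.
        committer.write("task0", 0, "part-00000", "rows from attempt 0");
        committer.write("task0", 1, "part-00000", "rows from attempt 1");
        committer.write("task1", 0, "part-00001", "rows from task 1");

        System.out.println(committer.commitTask("task0", 1));
        System.out.println(committer.commitTask("task0", 0));
        System.out.println(committer.commitTask("task1", 0));
        System.out.println(committer.visibleFiles());
    }
}
```

In this sketch the second attempt of {{task0}} is rejected and its staged data is dropped, which is the behavior the second bullet asks for. How the real committer coordinates attempts on S3 is outside the scope of this toy model.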