[ 
https://issues.apache.org/jira/browse/HIVE-16295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448922#comment-16448922
 ] 

Sahil Takiar commented on HIVE-16295:
-------------------------------------

Attaching initial patch to get a run of Hive QA in. Here is an update of my 
progress so far:

* Attached a WIP prototype that works for several basic use cases: CTAS, 
INSERT INTO, etc.
* Most of the {{hive-blobstore}} tests are passing when run against S3A; there 
are a bunch of explain diffs because a new stage is introduced, but the query 
outputs are the same
* Dynamic partitioning doesn't work yet
* Haven't really investigated bucketed tables yet, but the tests are passing 
locally
* There are a number of hacks that need to be cleaned up, but I think the 
overall design is mostly in place

At a high level the design is as follows:
* Introduce a new task that is run before a {{SparkTask}} or {{MapReduceTask}}; 
this {{Task}} will create the specified {{PathOutputCommitter}} and run the 
{{setupJob}} method
* The {{MoveTask}} will do the same thing, but run {{commitJob}} instead (a 
rough sketch of both steps follows this list)
* A bunch of other changes to {{MoveTask}} so that it doesn't run any of the 
{{fs}} operations to commit any data
* The {{S3ACommitOptimization}} is a new physical optimization that does some 
setup work so that the {{FileSinkOperator}} has access to the final output 
path; it also handles a number of other setup tasks to make sure the 
{{FileSinkOperator}} uses the specified {{PathOutputCommitter}} and sets the 
working and output paths correctly
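
To make the committer lifecycle concrete, here is a rough sketch of what the 
pre-execution task and the {{MoveTask}} boil down to. The wrapper class and 
method names below are hypothetical (the actual patch implements these as a new 
{{Task}} plus changes to {{MoveTask}}); the Hadoop-side calls are the 
{{PathOutputCommitter}} / {{PathOutputCommitterFactory}} API from HADOOP-13786:

{code:java}
// Sketch only: CommitterLifecycleSketch and its method names are hypothetical;
// the Hadoop calls are the PathOutputCommitter / PathOutputCommitterFactory
// API added by HADOOP-13786 (Hadoop 3.1+).
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.PathOutputCommitterFactory;
import org.apache.hadoop.mapreduce.task.JobContextImpl;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

public class CommitterLifecycleSketch {

  /** What the new pre-execution task does: create the committer, set up the job. */
  public static PathOutputCommitter setupJob(Configuration conf,
      Path finalOutputPath, JobID jobId) throws IOException {
    TaskAttemptContext attemptContext =
        new TaskAttemptContextImpl(conf, new TaskAttemptID());
    // The factory is resolved per filesystem scheme, so an s3a:// output path
    // picks up the S3A committer configured on the cluster.
    PathOutputCommitter committer =
        PathOutputCommitterFactory.createCommitter(finalOutputPath, attemptContext);
    committer.setupJob(new JobContextImpl(conf, jobId));
    return committer;
  }

  /** What MoveTask does instead of the rename-based commit: commit the whole job. */
  public static void commitJob(PathOutputCommitter committer, Configuration conf,
      JobID jobId) throws IOException {
    committer.commitJob(new JobContextImpl(conf, jobId));
  }
}
{code}

Roughly speaking, between those two calls the {{FileSinkOperator}} writes 
through the committer's task-side methods ({{setupTask}}/{{commitTask}}), which 
is the part the {{S3ACommitOptimization}} configures.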

The main caveat is that I don't think this will work when the Hive 
Merge-Small-Files job is triggered. The reason is that this job implicitly 
depends on the fact that renames are atomic operations, which is not the case 
on S3. Right now, I've disabled the job by default, but I need to come up with 
a cleaner solution; I'll probably need to short-circuit the optimizations 
whenever the Merge-Small-Files job is enabled. The only place it is turned on 
by default is when files are written by a Map-only MR job, but we should be 
able to detect that scenario and auto-disable the committer optimizations.
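
For reference, the merge stages are driven by the existing {{hive.merge.*}} 
flags, so detecting that case could be as simple as something along these lines 
(the helper below is just an illustration, not what's in the patch):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustration only: check the existing merge flags and skip the committer
// optimization when a Merge-Small-Files stage could be planned.
final class MergeFileGuard {
  static boolean mergeSmallFilesEnabled(Configuration conf) {
    // hive.merge.mapfiles defaults to true and covers the map-only MR case
    // mentioned above; the other flags default to false.
    return conf.getBoolean("hive.merge.mapfiles", true)
        || conf.getBoolean("hive.merge.mapredfiles", false)
        || conf.getBoolean("hive.merge.sparkfiles", false)
        || conf.getBoolean("hive.merge.tezfiles", false);
  }
}
{code}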

> Add support for using Hadoop's S3A OutputCommitter
> --------------------------------------------------
>
>                 Key: HIVE-16295
>                 URL: https://issues.apache.org/jira/browse/HIVE-16295
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-16295.1.WIP.patch
>
>
> Hive doesn't have integration with Hadoop's {{OutputCommitter}}; it uses a 
> {{NullOutputCommitter}} and its own commit logic spread across 
> {{FileSinkOperator}}, {{MoveTask}}, and {{Hive}}.
> The Hadoop community is building an {{OutputCommitter}} that integrates with 
> S3Guard and does a safe, coordinated commit of data on S3 inside individual 
> tasks (HADOOP-13786). If Hive can integrate with this new {{OutputCommitter}}, 
> there would be a lot of benefits for Hive-on-S3:
> * Data is only written once; directly committing data at a task level means 
> no renames are necessary
> * The commit is done safely, in a coordinated manner; duplicate tasks (from 
> task retries or speculative execution) should not step on each other


