[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

Lefty Leverenz (JIRA) Tue, 07 Mar 2017 03:05:57 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899249#comment-15899249
 ]


Lefty Leverenz commented on HIVE-15121:
---------------------------------------

Sergio Peña documented *hive.blobstore.optimizations.enabled* in a new 
Blobstore section of Hive Configuration Properties:

* [Configuration Properties -- Blobstore (i.e. Amazon S3) | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Blobstore(i.e.AmazonS3)]
* [Configuration Properties -- hive.blobstore.optimizations.enabled | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.blobstore.optimizations.enabled]

Removed the TODOC2.2 label.

> Last MR job in Hive should be able to write to a different scratch directory
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15121
>                 URL: https://issues.apache.org/jira/browse/HIVE-15121
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.patch, HIVE-15121.WIP.1.patch, 
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data; it just tells the blobstore to do all the work.
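
The scratch-directory split described above can be sketched as follows. This is only an illustration of the idea, not Hive's actual implementation: the function name plan_scratch_dirs, the job-N directory naming, and the example URIs are all hypothetical.

```python
from typing import List


def plan_scratch_dirs(num_jobs: int, hdfs_scratch: str, blob_scratch: str) -> List[str]:
    """Pick an output directory for each MR job of a multi-job query.

    Hypothetical helper: intermediate jobs write to the HDFS scratch
    directory, and only the last job writes to the blobstore scratch
    directory, so the final move can stay inside the blobstore.
    """
    if num_jobs < 1:
        raise ValueError("a query needs at least one job")
    # Jobs 0 .. num_jobs-2 are intermediate and stay on HDFS.
    dirs = [f"{hdfs_scratch}/job-{i}" for i in range(num_jobs - 1)]
    # The final job writes next to the table, on the blobstore.
    dirs.append(f"{blob_scratch}/job-{num_jobs - 1}")
    return dirs


if __name__ == "__main__":
    # Example URIs are made up for illustration.
    for path in plan_scratch_dirs(3, "hdfs://nn/tmp/hive", "s3a://bucket/tmp/hive"):
        print(path)
```

With this layout, only the last job's output has to be moved into the table location, and that move stays between two blobstore paths.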



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
