[ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lefty Leverenz updated HIVE-15121:
----------------------------------
Labels: (was: TODOC2.2)
> Last MR job in Hive should be able to write to a different scratch directory
> ----------------------------------------------------------------------------
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Fix For: 2.2.0
>
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch,
> HIVE-15121.3.patch, HIVE-15121.patch, HIVE-15121.WIP.1.patch,
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS
> and only the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the
> final job writes to S3. Writing to HDFS should be faster than writing to S3,
> so it makes more sense to write intermediate data to HDFS.
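A minimal sketch of the routing described above, assuming a query compiles into an ordered list of MR jobs: every job except the last gets an HDFS scratch directory, and the last gets a blobstore scratch directory. The `JobPlan` type, the `assign_scratch_dirs` function and the example paths are hypothetical illustrations, not Hive's internal API or configuration keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobPlan:
    name: str
    scratch_dir: str


def assign_scratch_dirs(job_names: List[str], hdfs_scratch: str,
                        blobstore_scratch: str) -> List[JobPlan]:
    """Hypothetical helper: intermediate jobs write to HDFS, the final job to the blobstore."""
    plans: List[JobPlan] = []
    last = len(job_names) - 1
    for i, name in enumerate(job_names):
        # Only the final job's output needs to end up in the blobstore.
        base = blobstore_scratch if i == last else hdfs_scratch
        plans.append(JobPlan(name=name, scratch_dir=f"{base.rstrip('/')}/{name}"))
    return plans


if __name__ == "__main__":
    # Example paths are made up for illustration.
    for plan in assign_scratch_dirs(["stage-1", "stage-2", "stage-3"],
                                    "hdfs://namenode:8020/tmp/hive",
                                    "s3a://example-bucket/tmp/hive"):
        print(plan.name, plan.scratch_dir)
```

A single-job query degenerates to "the last job is the only job", so it writes straight to the blobstore scratch directory.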
> The advantage is that any data that must be copied from the scratch
> directory to the final table directory can be copied server-side, within
> the blobstore. The MoveTask simply renames data from the scratch directory
> to the final table location; because S3 has no native rename, this should
> translate to a server-side COPY request. This way HiveServer2 doesn't have
> to copy any data itself; it just tells the blobstore to do the work.
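To illustrate why the final move is cheap for HiveServer2, here is a toy sketch assuming a hypothetical in-memory object store. On object stores without a rename primitive, a directory rename is emulated by copying each object server-side and then deleting the original. The `InMemoryBlobstore` class and `rename_prefix` helper are hypothetical; they are not the S3 or Hadoop API and only track how many bytes would have to travel to the client.

```python
from typing import Dict


class InMemoryBlobstore:
    """Hypothetical stand-in for an object store such as S3."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}  # data that lives "on the server"
        self.bytes_sent_to_client = 0

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        # Downloading is the expensive path the rename avoids.
        data = self.objects[key]
        self.bytes_sent_to_client += len(data)
        return data

    def copy(self, src: str, dst: str) -> None:
        # Server-side copy: the bytes never leave the store.
        self.objects[dst] = self.objects[src]

    def delete(self, key: str) -> None:
        del self.objects[key]


def rename_prefix(store: InMemoryBlobstore, src_prefix: str, dst_prefix: str) -> int:
    """Emulate a directory rename as per-object server-side copy + delete."""
    keys = [k for k in store.objects if k.startswith(src_prefix)]  # snapshot first
    for key in keys:
        store.copy(key, dst_prefix + key[len(src_prefix):])
        store.delete(key)
    return len(keys)


if __name__ == "__main__":
    store = InMemoryBlobstore()
    store.put("tmp/hive/stage-3/000000_0", b"a,1\n")
    moved = rename_prefix(store, "tmp/hive/stage-3/", "warehouse/t/")
    print(moved, sorted(store.objects), store.bytes_sent_to_client)
```

Snapshotting the matching keys before the loop avoids mutating the dictionary while iterating over it.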