Rob Leidle created HIVE-13321:
---------------------------------
Summary: Add support for different output strategies
Key: HIVE-13321
URL: https://issues.apache.org/jira/browse/HIVE-13321
Project: Hive
Issue Type: Improvement
Reporter: Rob Leidle
The Hadoop ecosystem has expanded to support a wider variety of data stores and
filesystems than HDFS alone. These FileSystems have different write atomicity
and read consistency guarantees. There are enhancements we can make to Hive so
that it works better across this wider variety of FileSystems. Work is already
under way in the Hadoop project to robustly support these FileSystems. One
example is HADOOP-9565, where the behavior of MapReduce output is enhanced to
do what is optimal for different FileSystems.
A common pattern in MapReduce and Hive is to write all output into a temporary
folder and then rename this temporary folder to match the final output
location. With some of the newer FileSystems, Hive performance can be improved
by writing output directly to the final location and avoiding the temporary
folder write and rename.
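To make the two patterns concrete, here is a minimal sketch that uses the local filesystem as a stand-in. It is not Hive or MapReduce code; the function names write_via_temp_dir and write_directly are hypothetical, and on a real Hadoop FileSystem the cost and atomicity of the rename step depend on the implementation.

```python
import os
import tempfile


def write_via_temp_dir(final_dir, files):
    """Write all files into a temporary sibling folder, then rename it into place.

    Assumes final_dir does not exist yet (illustrative simplification).
    """
    parent = os.path.dirname(os.path.abspath(final_dir))
    tmp_dir = tempfile.mkdtemp(prefix="_tmp.", dir=parent)
    for name, data in files.items():
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write(data)
    # On a local POSIX filesystem this rename is a single atomic step.
    # Other FileSystems may give it different cost and guarantees.
    os.rename(tmp_dir, final_dir)


def write_directly(final_dir, files):
    """Write all files straight into the final folder, with no rename step."""
    os.makedirs(final_dir, exist_ok=True)
    for name, data in files.items():
        with open(os.path.join(final_dir, name), "w") as f:
            f.write(data)
```

The first function never exposes a partially written output folder but pays for the rename; the second skips that step, so readers may observe the folder while files are still being added.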
The proposal is to enhance Hive to support different strategies for file
output. One such strategy would be a concept named “DirectWrite”. DirectWrite
would be optional, likely enabled on a per-FileSystem basis. When DirectWrite
is enabled, all Hive job output will be written directly to the output location.
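One possible shape for per-FileSystem selection is sketched below. Everything in it is an assumption made for illustration: the function name, the strategy labels, and the use of the URI scheme ("s3a" in the example) as the key are hypothetical and do not reflect an existing Hive setting or API.

```python
from urllib.parse import urlparse

# Hypothetical: URI schemes for which DirectWrite has been switched on.
DIRECT_WRITE_SCHEMES = frozenset({"s3a"})


def choose_output_strategy(output_uri, direct_write_schemes=DIRECT_WRITE_SCHEMES):
    """Pick an output strategy name from the scheme of the output location."""
    scheme = urlparse(output_uri).scheme
    if scheme in direct_write_schemes:
        return "direct_write"
    # Default: the existing temporary-folder write followed by a rename.
    return "temp_then_rename"
```

For example, choose_output_strategy("hdfs://namenode:8020/warehouse/t") keeps the default behavior, while an "s3a://" location would select DirectWrite under the assumed setting above.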
This is an umbrella JIRA for all the tasks related to this functionality.