Rob Leidle created HIVE-13321:
---------------------------------

             Summary: Add support for different output strategies
                 Key: HIVE-13321
                 URL: https://issues.apache.org/jira/browse/HIVE-13321
             Project: Hive
          Issue Type: Improvement
            Reporter: Rob Leidle


The Hadoop ecosystem has expanded to support a wider variety of data stores and
filesystems than just HDFS. These FileSystems have different write atomicity
and read consistency guarantees. There are enhancements we can make to Hive so
that it works better with this wider variety of FileSystems. Work is already
under way in the Hadoop project to robustly support these FileSystems. One
example is HADOOP-9565, where the behavior of MapReduce output is enhanced to
do what is optimal for different FileSystems.
 
A common pattern in MapReduce and Hive is to write all output into a temporary
folder and then rename this temporary folder to match the final output
location. When using some of the newer FileSystems with Hive, performance can
be improved by writing output directly to the final location and skipping the
temporary folder write and rename.
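
For readers less familiar with the temporary-folder pattern, here is a minimal
sketch of it. It uses java.nio.file on a local filesystem as a stand-in for the
Hadoop FileSystem API, and the class name, method name, and "_tmp." prefix are
all hypothetical; this is not how Hive itself is implemented.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative only: a local-filesystem stand-in, not Hive or Hadoop code.
final class StagedOutput {

    /**
     * Writes one file into a temporary sibling directory, then renames that
     * directory to finalDir. Assumes finalDir does not exist yet.
     */
    static Path writeViaTempAndRename(Path finalDir, String fileName, String content)
            throws IOException {
        // Stage next to the final location so the rename stays on one filesystem.
        // The "_tmp." prefix is a made-up naming convention for this sketch.
        Path tmpDir = finalDir.resolveSibling("_tmp." + finalDir.getFileName());
        Files.createDirectories(tmpDir);
        Files.write(tmpDir.resolve(fileName), content.getBytes(StandardCharsets.UTF_8));

        // Publish step: a single rename makes all staged output visible at once.
        Files.move(tmpDir, finalDir, StandardCopyOption.ATOMIC_MOVE);
        return finalDir.resolve(fileName);
    }
}
```

On a filesystem where rename is a cheap metadata operation, the publish step
costs almost nothing. On a store where rename is implemented as copy plus
delete, that same step rewrites every byte of output, which is the cost the
direct-write approach is meant to avoid.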
 
The proposal is to enhance Hive to support different strategies for file
output. One such strategy would be a concept named “DirectWrite”. DirectWrite
would be optional, likely enabled on a per-FileSystem basis. When DirectWrite
is enabled, all Hive job output will be written directly to the output location.
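
To make the per-FileSystem idea concrete, here is a rough sketch of how a
strategy might be selected from the output location's URI scheme. Everything
in it is an assumption made for illustration: the OutputStrategies class, the
Strategy enum, the "_staging" directory name, and the idea of configuring a
set of schemes are hypothetical and are not existing Hive APIs or settings.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of per-FileSystem strategy selection; not a Hive API.
final class OutputStrategies {

    enum Strategy { STAGED_RENAME, DIRECT_WRITE }

    // Made-up staging directory name, used only by this sketch.
    static final String STAGING_DIR = "_staging";

    /** Picks DIRECT_WRITE when the URI scheme is in the configured set. */
    static Strategy choose(URI outputLocation, Set<String> directWriteSchemes) {
        String scheme = outputLocation.getScheme();
        if (scheme != null
                && directWriteSchemes.contains(scheme.toLowerCase(Locale.ROOT))) {
            return Strategy.DIRECT_WRITE;
        }
        // Default to the existing temporary-folder-and-rename behavior.
        return Strategy.STAGED_RENAME;
    }

    /** Returns the directory that job output should be written into. */
    static String writeDirectory(URI outputLocation, Strategy strategy) {
        String base = outputLocation.toString();
        if (strategy == Strategy.DIRECT_WRITE) {
            return base; // write straight to the final location
        }
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return base + "/" + STAGING_DIR; // staged, to be renamed later
    }
}
```

For example, with a configured set containing only an object-store scheme
such as "s3a" (chosen here purely as an illustration), an hdfs:// output
location would keep the staged behavior while an s3a:// location would be
written in place.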
 
This is an umbrella JIRA for all the tasks related to this functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
