Archiving partitions
--------------------
Key: HIVE-1332
URL: https://issues.apache.org/jira/browse/HIVE-1332
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang

Partitions and tables in Hive typically consist of many files on HDFS. As the
number of files increases, so do the memory and load requirements on the
namenode. Partitions in bucketed tables are a particular problem because each
one consists of many files, one per bucket.
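
As a rough illustration of the scale involved, here is a small Python sketch.
It assumes every bucket is written as exactly one non-empty file, and that the
namenode tracks at least one object for the file plus one for its first block.
The table dimensions are hypothetical and not taken from any real deployment.

```python
def bucketed_table_files(num_partitions, buckets_per_partition):
    """Number of data files, assuming one file per bucket per partition."""
    return num_partitions * buckets_per_partition


def min_namenode_objects(num_files):
    """Lower bound on namenode objects: one per file plus at least one
    block per non-empty file (directories are ignored here)."""
    return 2 * num_files


# Hypothetical table: two years of daily partitions, 256 buckets each.
files = bucketed_table_files(num_partitions=730, buckets_per_partition=256)
print(files)                        # 186880
print(min_namenode_objects(files))  # 373760
```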

One way to drastically reduce the number of files is to use Hadoop archives
(HAR):
http://hadoop.apache.org/common/docs/current/hadoop_archives.html

This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION
<spec> statement that automatically puts the files for the partition into a
HAR file. There would also be an UNARCHIVE option to convert the archived
partition back to the original files. Archived partitions would be slower to
access, but they would offer the same functionality while drastically reducing
the number of files. Typically, only seldom-accessed partitions would be
archived.
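
For illustration, the proposed statements could be generated from a client
script along the lines below. This is only a sketch: the helper names, the
table name and the partition value are hypothetical. It assumes that <spec>
follows Hive's existing PARTITION (col='value', ...) form and that UNARCHIVE
mirrors the ARCHIVE syntax, neither of which is settled by this proposal.

```python
def partition_spec(spec):
    """Render a dict as PARTITION (col='value', ...).

    Assumption: <spec> uses the same form as other Hive partition clauses.
    """
    if not spec:
        raise ValueError("partition spec must name at least one column")
    parts = []
    for col, val in spec.items():
        text = str(val)
        if "'" in text:
            # Quoting rules are out of scope for this sketch, so refuse
            # such values instead of guessing how to escape them.
            raise ValueError("quotes in partition values are not supported")
        parts.append(f"{col}='{text}'")
    return "PARTITION (" + ", ".join(parts) + ")"


def archive_statement(table, spec, unarchive=False):
    """Build the proposed ALTER TABLE ... ARCHIVE/UNARCHIVE statement."""
    verb = "UNARCHIVE" if unarchive else "ARCHIVE"
    return f"ALTER TABLE {table} {verb} {partition_spec(spec)}"


# Hypothetical table and partition.
print(archive_statement("page_views", {"ds": "2010-04-01"}))
print(archive_statement("page_views", {"ds": "2010-04-01"}, unarchive=True))
```

The statements are only built as strings here; how they are submitted to Hive
is left to whatever client is already in use.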

Hadoop archives are still fairly new, so we'll only support the latest
released major version (0.20). Relevant bug fixes for Hadoop archives:
https://issues.apache.org/jira/browse/HADOOP-6591 (Important: could potentially cause data loss without this fix)
https://issues.apache.org/jira/browse/HADOOP-6645
https://issues.apache.org/jira/browse/MAPREDUCE-1585