Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "Hive/LanguageManual/Archiving" page has been changed by PaulYang.
http://wiki.apache.org/hadoop/Hive/LanguageManual/Archiving?action=diff&rev1=5&rev2=6

--------------------------------------------------

Due to the design of HDFS, the number of files in the filesystem directly affects memory consumption in the namenode. While this is normally not a problem for small clusters, memory usage may hit the limits of accessible memory on a single machine when there are more than 50-100 million files. In such situations, it is advantageous to have as few files as possible.

- The use of [[http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html | Hadoop Archives]] is one approach to reducing the number of files in partitions. Hive has built-in support that allows users to easily move files in existing partitions to a Hadoop Archive (HAR) so that a partition that may once have consisted of hundreds of files can occupy ~3 files (depending on settings). However, the trade-off is that queries may be slower due to the additional overhead of the indirection.

+ The use of [[http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html | Hadoop Archives]] is one approach to reducing the number of files in partitions. Hive has built-in support to convert files in existing partitions to a Hadoop Archive (HAR) so that a partition that may once have consisted of hundreds of files can occupy just ~3 files (depending on settings). However, the trade-off is that queries may be slower due to the additional overhead of reading from the HAR. Note that archiving does NOT compress the files - HAR is analogous to the unix tar command.
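As a sketch, the built-in support described above is exposed through ALTER TABLE statements. The table name, partition value, and the `hive.archive.enabled` setting here are illustrative assumptions, not taken from this notification:

```sql
-- Enable archiving operations (setting assumed to be required; check your Hive version)
SET hive.archive.enabled=true;

-- Move the files of one partition into a HAR (table/partition names are hypothetical)
ALTER TABLE page_views ARCHIVE PARTITION (ds='2011-01-01');

-- Reverse the operation, extracting the files back out of the HAR
ALTER TABLE page_views UNARCHIVE PARTITION (ds='2011-01-01');
```

After archiving, the partition remains queryable through the HAR, at the cost of the extra read indirection the page describes.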
