Attila Sasvari created OOZIE-2821:
-------------------------------------
Summary: Using Hadoop Archives for Oozie ShareLib
Key: OOZIE-2821
URL: https://issues.apache.org/jira/browse/OOZIE-2821
Project: Oozie
Issue Type: New Feature
Reporter: Attila Sasvari
Oozie ShareLib is a collection of many jar files that are required by Oozie
actions. Right now, these jars are uploaded one by one during Oozie ShareLib
installation. There can be hundreds of such jars, and many of them are quite
small, significantly smaller than an HDFS block. Storing a large number of
small files in HDFS is inefficient (for example, because the NameNode keeps an
object in memory for each file, and the blocks containing the small files might
be much bigger than the actual files). When an action is executed, these jar
files are copied to the distributed cache.
It would be worth investigating the possibility of using [Hadoop
archives|http://hadoop.apache.org/docs/r2.6.5/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html]
for handling Oozie ShareLib files, because it could result in better
utilisation of HDFS.
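
To make the small-files argument concrete, here is a rough back-of-the-envelope
sketch in Python. It is only an estimate under stated assumptions, not a
measurement: it assumes the NameNode tracks one object per file and one per
block, that an archive consists of two index files plus a single part file, and
it ignores directory entries. The function names and the 128 MB default block
size are illustrative choices, not anything defined by Oozie or Hadoop.

```python
def namenode_objects(file_sizes, block_size):
    """Count file objects plus block objects for files of the given sizes (bytes)."""
    # Integer ceiling division: a non-empty file occupies at least one block.
    blocks = sum(-(-size // block_size) for size in file_sizes)
    return len(file_sizes) + blocks


def estimate(jar_sizes, block_size=128 * 1024 * 1024):
    """Return (objects when jars are stored separately, objects when archived)."""
    # Separate jars: every jar is its own file with its own block(s).
    separate = namenode_objects(jar_sizes, block_size)
    # Archive (assumed layout): two small index files plus one part file that
    # holds the concatenated jar bytes. The index sizes are placeholders.
    archive_files = [1, 1, sum(jar_sizes)]
    archived = namenode_objects(archive_files, block_size)
    return separate, archived


if __name__ == "__main__":
    # Hypothetical ShareLib: 300 jars of 200 KB each.
    separate, archived = estimate([200 * 1024] * 300)
    print(separate, archived)
```

With these hypothetical numbers, the separately stored jars account for 600
NameNode objects, while the archived layout accounts for 6. The real saving
depends on the actual jar sizes and on how many part files the archive ends up
with.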
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)