Support for Manifest file inside Distributed Caches (Archives)
--------------------------------------------------------------

                 Key: HADOOP-4511
                 URL: https://issues.apache.org/jira/browse/HADOOP-4511
             Project: Hadoop Core
          Issue Type: Improvement
    Affects Versions: 0.17.2
            Reporter: Martin Eckert
            Priority: Minor


I'm in a situation where I'm using the DistributedCache API to add a library 
package to my Hadoop job. The library bundle consists of a JAR file, native 
library files and data files. Currently it is cumbersome to set up the job so 
that the library can be used from within the map/reduce tasks.

The best way I have found is to keep the <lib>.jar file outside the archive 
and use the -libjars argument to point to the external JAR file. The archive 
itself is submitted using DistributedCache.setCacheArchives() and 
DistributedCache.createSymlink().
To add the library path (with the native library files), I append 
-Djava.library.path=./symlink/lib to the mapred.child.java.opts JobConf option. 
To reference the config file inside the archive, the relative path (e.g. 
./symlink/conf/config.txt) is used.
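
The workaround above comes down to a few strings that the job driver has to 
assemble by hand. The Java sketch below shows only that string handling, using 
the standard library so it runs without Hadoop on the classpath. It relies on 
the DistributedCache convention that the URI fragment after '#' becomes the 
symlink name. The class and method names, the host namenode:9000 and the file 
mylib.zip are illustrative assumptions. In a real driver the URI would be 
passed to DistributedCache.setCacheArchives() and the options string to 
conf.set("mapred.child.java.opts", ...).

```java
import java.net.URI;

/**
 * Hypothetical helper showing the strings the current workaround builds by hand.
 * Stdlib only: a real driver would pass these values to DistributedCache and JobConf.
 */
public class ArchiveWorkaround {

    /** Appends "#symlinkName" to the archive path; DistributedCache uses the URI fragment as the symlink name. */
    static URI archiveWithSymlink(String archivePath, String symlinkName) {
        return URI.create(archivePath + "#" + symlinkName);
    }

    /** Appends -Djava.library.path=./<symlink>/lib to the existing child JVM options instead of replacing them. */
    static String withLibraryPath(String currentOpts, String symlinkName) {
        String flag = "-Djava.library.path=./" + symlinkName + "/lib";
        if (currentOpts == null || currentOpts.trim().isEmpty()) {
            return flag;
        }
        return currentOpts.trim() + " " + flag;
    }

    /** Path of a file inside the unpacked archive, relative to the task's working directory. */
    static String pathInArchive(String symlinkName, String relativePath) {
        return "./" + symlinkName + "/" + relativePath;
    }

    public static void main(String[] args) {
        URI archive = archiveWithSymlink("hdfs://namenode:9000/libs/mylib.zip", "symlink");
        System.out.println(archive);                                // value for setCacheArchives()
        System.out.println(withLibraryPath("-Xmx200m", "symlink")); // value for mapred.child.java.opts
        System.out.println(pathInArchive("symlink", "conf/config.txt"));
    }
}
```

Appending the flag rather than overwriting mapred.child.java.opts keeps any 
existing child JVM settings, such as a heap size, intact.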

It would be very helpful if these settings could largely be encapsulated inside 
the archive itself in the form of a manifest file. The manifest file could 
define the relative paths to the JAR file(s) and library path(s). Those would 
be read automatically and added to the job's class and library paths.

The config file could be referenced and assigned a name inside the manifest, so 
that the job code could look up its path through the JobConf.get() method and 
use it where needed.
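
The manifest idea could look something like the following sketch. The manifest 
format (a Java properties file with classpath, library.path and conf.<name> 
keys) is purely hypothetical, as are the class and method names; nothing like 
it exists in Hadoop 0.17. The sketch only resolves manifest entries against the 
archive's symlink directory. The framework would then add the JAR paths to the 
task classpath, append the library flag to mapred.child.java.opts, and set each 
conf.<name> entry as a job property so that conf.get("mylib.config") returns 
the file's path.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical archive manifest; Hadoop defines no such format. */
public class ArchiveManifest {
    static final String CLASSPATH_KEY = "classpath";
    static final String LIBRARY_PATH_KEY = "library.path";
    static final String CONF_PREFIX = "conf.";

    final List<String> classpath = new ArrayList<>();
    final List<String> libraryPaths = new ArrayList<>();
    final Map<String, String> configFiles = new TreeMap<>();

    /** Parses manifest text and resolves every entry relative to ./<symlinkName>/. */
    static ArchiveManifest parse(String manifestText, String symlinkName) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(manifestText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String root = "./" + symlinkName + "/";
        ArchiveManifest m = new ArchiveManifest();
        m.classpath.addAll(resolveList(props.getProperty(CLASSPATH_KEY), root));
        m.libraryPaths.addAll(resolveList(props.getProperty(LIBRARY_PATH_KEY), root));
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(CONF_PREFIX)) {
                // "conf.mylib.config=conf/config.txt" becomes job property "mylib.config".
                m.configFiles.put(key.substring(CONF_PREFIX.length()), root + props.getProperty(key).trim());
            }
        }
        return m;
    }

    /** Splits a comma-separated value and prefixes each non-empty entry with the archive root. */
    private static List<String> resolveList(String value, String root) {
        List<String> out = new ArrayList<>();
        if (value == null) {
            return out;
        }
        for (String part : value.split(",")) {
            String p = part.trim();
            if (!p.isEmpty()) {
                out.add(root + p);
            }
        }
        return out;
    }

    /** JVM flag a framework could append to mapred.child.java.opts; empty if no library path is declared. */
    String libraryPathFlag() {
        if (libraryPaths.isEmpty()) {
            return "";
        }
        return "-Djava.library.path=" + String.join(File.pathSeparator, libraryPaths);
    }

    public static void main(String[] args) {
        String manifest = String.join("\n",
                "classpath=lib/mylib.jar, lib/deps.jar",
                "library.path=native",
                "conf.mylib.config=conf/config.txt");
        ArchiveManifest m = parse(manifest, "symlink");
        System.out.println(m.classpath);
        System.out.println(m.libraryPathFlag());
        System.out.println(m.configFiles);
    }
}
```

Resolving every entry relative to ./<symlink>/ keeps the manifest independent 
of where the archive is unpacked on the task node.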

There would be other benefits to this approach, but mainly it would make 
deploying and distributing archived packages for Hadoop much easier.
