Support for Manifest file inside Distributed Caches (Archives)
--------------------------------------------------------------
Key: HADOOP-4511
URL: https://issues.apache.org/jira/browse/HADOOP-4511
Project: Hadoop Core
Issue Type: Improvement
Affects Versions: 0.17.2
Reporter: Martin Eckert
Priority: Minor
I'm using the DistributedCache API to add a library package to my Hadoop job.
The library bundle consists of a JAR file, native library files and data files.
Currently it is rather cumbersome to set up the job so that the library can be
used from within the map/reduce tasks.
The best way I could come up with was to keep the <lib>.jar file outside the
archive and use the -libjars argument to point to the external JAR file.
The archive itself is added with DistributedCache.setCacheArchives(), and
DistributedCache.createSymlink() makes it reachable through a symlink in the
task's working directory.
To add the library path (with the native library files), I append
-Djava.library.path=./symlink/lib to the mapred.child.java.opts JobConf option.
To reference the config file inside the archive, I use its relative path (e.g.
./symlink/conf/config.txt).
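For reference, the workaround boils down to a few string conventions: the URI fragment names the symlink, the library flag is appended to the existing child JVM options rather than replacing them, and files are addressed relative to the symlink. The minimal Java sketch below shows only that string handling, assuming a hypothetical archive at hdfs://namenode:9000/libs/mylib.zip. In an actual job the results would be passed to DistributedCache.setCacheArchives(), DistributedCache.createSymlink() and JobConf.set(); those calls are left out so the sketch runs without Hadoop on the classpath.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the string handling behind the current workaround. In a real job the
// results would go to DistributedCache.setCacheArchives(new URI[] { uri }, conf),
// DistributedCache.createSymlink(conf) and conf.set("mapred.child.java.opts", opts);
// those calls are omitted so this runs without Hadoop on the classpath.
public class CacheArchiveWorkaround {

    // The URI fragment ("#symlink") names the link created in the task's working directory.
    static URI archiveWithSymlink(String archive, String symlink) {
        try {
            URI base = new URI(archive);
            return new URI(base.getScheme(), base.getAuthority(), base.getPath(), null, symlink);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad archive URI: " + archive, e);
        }
    }

    // Append the native library directory while keeping existing child JVM options (e.g. -Xmx).
    static String withLibraryPath(String existingOpts, String symlink) {
        String flag = "-Djava.library.path=./" + symlink + "/lib";
        if (existingOpts == null || existingOpts.trim().isEmpty()) {
            return flag;
        }
        return existingOpts.trim() + " " + flag;
    }

    // Path of a file inside the unpacked archive, relative to the task's working directory.
    static String pathInArchive(String symlink, String relative) {
        return "./" + symlink + "/" + relative;
    }

    public static void main(String[] args) {
        // Hypothetical archive location, for illustration only.
        System.out.println(archiveWithSymlink("hdfs://namenode:9000/libs/mylib.zip", "symlink"));
        System.out.println(withLibraryPath("-Xmx200m", "symlink"));
        System.out.println(pathInArchive("symlink", "conf/config.txt"));
    }
}
```

Appending instead of overwriting matters because mapred.child.java.opts usually already carries heap settings that would otherwise be lost.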
It would be very helpful if these settings could largely be encapsulated inside
the archive itself in the form of a manifest file. The manifest file could define
the relative paths to the JAR file(s) and native library directories. These would
be read automatically and added to the job's classpath and library path.
Config files could be referenced and assigned names inside the manifest, so
that their paths would be available in the code through the JobConf.get() method
and could be used where needed.
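To make the proposal concrete, here is a rough sketch of how a task-side helper might read such a manifest with java.util.jar.Manifest. The attribute names Hadoop-Class-Path, Hadoop-Library-Path and Hadoop-Config, and the name=path syntax, are hypothetical and invented for illustration; nothing like them exists in Hadoop 0.17.2. Every path is resolved relative to the archive's symlink directory.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical sketch: these manifest attributes are NOT part of Hadoop;
// they only illustrate the proposed archive manifest.
public class ArchiveManifest {
    static final String CLASS_PATH = "Hadoop-Class-Path";     // space-separated JARs
    static final String LIBRARY_PATH = "Hadoop-Library-Path"; // space-separated directories
    static final String CONFIG = "Hadoop-Config";             // space-separated name=path pairs

    final List<String> classPath = new ArrayList<>();
    final List<String> libraryPath = new ArrayList<>();
    final Map<String, String> config = new LinkedHashMap<>();

    // Parse the manifest; all paths are resolved against the archive's symlink directory.
    static ArchiveManifest read(InputStream in, String symlink) throws IOException {
        Attributes attrs = new Manifest(in).getMainAttributes();
        String base = "./" + symlink + "/";
        ArchiveManifest m = new ArchiveManifest();
        for (String jar : split(attrs.getValue(CLASS_PATH))) {
            m.classPath.add(base + jar);
        }
        for (String dir : split(attrs.getValue(LIBRARY_PATH))) {
            m.libraryPath.add(base + dir);
        }
        for (String pair : split(attrs.getValue(CONFIG))) {
            int eq = pair.indexOf('=');
            if (eq > 0) { // skip malformed entries without a name
                m.config.put(pair.substring(0, eq), base + pair.substring(eq + 1));
            }
        }
        return m;
    }

    // Convenience wrapper for manifest text held in memory.
    static ArchiveManifest fromString(String text, String symlink) {
        try {
            return read(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)), symlink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Split a space-separated attribute value; a missing attribute yields no entries.
    static String[] split(String value) {
        if (value == null || value.trim().isEmpty()) {
            return new String[0];
        }
        return value.trim().split("\\s+");
    }

    // Child JVM flag for the native library directories.
    String libraryPathOption() {
        return "-Djava.library.path=" + String.join(File.pathSeparator, libraryPath);
    }

    public static void main(String[] args) {
        // Manifest lines must end with a newline, or the last one is not parsed.
        String text = "Manifest-Version: 1.0\n"
                + "Hadoop-Class-Path: lib/mylib.jar\n"
                + "Hadoop-Library-Path: lib\n"
                + "Hadoop-Config: mylib.config=conf/config.txt\n";
        ArchiveManifest m = fromString(text, "symlink");
        System.out.println(m.classPath);           // [./symlink/lib/mylib.jar]
        System.out.println(m.libraryPathOption()); // -Djava.library.path=./symlink/lib
        System.out.println(m.config);              // {mylib.config=./symlink/conf/config.txt}
    }
}
```

Under this proposal the framework, not the job author, would then add the classPath entries to the task classpath, pass libraryPathOption() to the child JVM, and set each config entry on the JobConf so that JobConf.get("mylib.config") returns the file's path.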
This approach would open up other possibilities as well, but mainly it would
make deployment and distribution of archived packages for Hadoop much easier.