Add support for transitive native libraries to DistributedCache
----------------------------------------------------------------
Key: HADOOP-3298
URL: https://issues.apache.org/jira/browse/HADOOP-3298
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Environment: Unix (different handling would be required for Windows)
Reporter: Subramaniam Krishnan
Assignee: Arun C Murthy
Fix For: 0.16.0
Currently, if an M/R job depends on a JNI-based component, the dynamic library
must be available on all the task nodes. This is not possible when you have no
control over the cluster machines and are simply using the cluster as a service.
It should be possible to use the DistributedCache to specify which native
libraries a job needs, for example via a new method 'public void
addLibrary(Path libraryPath, JobConf conf)'.
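A minimal sketch of the submission-side half of the proposal, assuming the method simply records library paths in the job configuration, the way cache files are accumulated. To stay self-contained it uses a plain `Map<String, String>` in place of `JobConf` and a `String` in place of `Path`; the class name `NativeLibraryRegistry` and the property key `mapred.native.libraries` are hypothetical, not existing Hadoop names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Map stands in for JobConf, and the property key
// below is an assumed name, not an existing Hadoop configuration property.
class NativeLibraryRegistry {
    static final String NATIVE_LIBS_KEY = "mapred.native.libraries";

    // Appends a library path to a comma-separated list in the job
    // configuration, so several libraries can be registered at submission time.
    static void addLibrary(String libraryPath, Map<String, String> conf) {
        String existing = conf.get(NATIVE_LIBS_KEY);
        String updated = (existing == null || existing.isEmpty())
                ? libraryPath
                : existing + "," + libraryPath;
        conf.put(NATIVE_LIBS_KEY, updated);
    }

    // Reads the registered libraries back, e.g. on the task side before localization.
    static List<String> getLibraries(Map<String, String> conf) {
        List<String> libs = new ArrayList<>();
        String value = conf.get(NATIVE_LIBS_KEY);
        if (value != null && !value.isEmpty()) {
            for (String path : value.split(",")) {
                libs.add(path);
            }
        }
        return libs;
    }
}
```

Keeping the list in the configuration means the task side needs nothing beyond the job configuration to know which files to localize.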
The added libraries would be copied to the local FS of the task nodes (the same
way as cached resources), but instead of being part of the classpath they would
be placed in a lib directory, and that lib directory would be added to the
LD_LIBRARY_PATH of the task JVM.
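A sketch of the task-side steps just described, assuming the libraries have already been localized to the task node's FS: copy them into one lib directory, then compute the LD_LIBRARY_PATH value for the task JVM with that directory prepended. `TaskLibDir` and its methods are hypothetical helpers built only on `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical task-side helper; the directory layout is an assumption.
class TaskLibDir {
    // Copies each already-localized library into libDir, creating it if needed.
    static void populate(List<Path> localizedLibs, Path libDir) throws IOException {
        Files.createDirectories(libDir);
        for (Path lib : localizedLibs) {
            Files.copy(lib, libDir.resolve(lib.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Prepends libDir to an existing LD_LIBRARY_PATH value (which may be null or empty).
    static String ldLibraryPath(Path libDir, String existing) {
        String dir = libDir.toAbsolutePath().toString();
        return (existing == null || existing.isEmpty()) ? dir : dir + ":" + existing;
    }
}
```

Prepending rather than appending means the job's lib directory is searched before the node's existing LD_LIBRARY_PATH entries.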
An alternative would be to set the '-Djava.library.path=' task JVM parameter to
the lib directory above. However, this would break for libraries that depend on
other libraries: the dependent library would not be in the LD_LIBRARY_PATH, and
the OS would fail to find it, because it is the OS dynamic linker, not the JVM,
that loads the dependent library.
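To make the distinction concrete, here is a hedged sketch of launching the task JVM with both settings. `-Djava.library.path` is what `System.loadLibrary` consults for the top-level JNI library, while `LD_LIBRARY_PATH` is read by the OS dynamic linker when it resolves that library's own dependencies, so it must be in the child's environment when the process starts; `ProcessBuilder.environment()` allows exactly that. `ChildJvmLauncher` and the main class name used below are hypothetical, and the builder is returned unstarted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical launcher sketch; it builds the process but does not start it.
class ChildJvmLauncher {
    static ProcessBuilder build(String javaBin, String libDir, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        // Lets System.loadLibrary() in the child find the top-level JNI library.
        cmd.add("-Djava.library.path=" + libDir);
        cmd.add(mainClass);

        ProcessBuilder pb = new ProcessBuilder(cmd);
        Map<String, String> env = pb.environment();
        // Lets the OS dynamic linker resolve libraries the JNI library depends on.
        String existing = env.get("LD_LIBRARY_PATH");
        env.put("LD_LIBRARY_PATH",
                (existing == null || existing.isEmpty()) ? libDir : libDir + ":" + existing);
        return pb;
    }
}
```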
For uncached usage of native libraries, a special directory in the JAR could be
used for native libraries. But I'd argue that the DistributedCache enhancement
would be enough: if somebody wants to use a native library, s/he should use
the DistributedCache, or a JobConf addLibrary method that uses the
DistributedCache under the hood at submission time.
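For the uncached alternative, a task could extract native libraries from a reserved directory inside the job JAR into its lib directory. The sketch below assumes a hypothetical `lib/native/` prefix and flattens all matching entries into one directory, so a single LD_LIBRARY_PATH entry covers them; `JarNativeExtractor` is likewise a hypothetical name, and only `java.util.jar` and `java.nio.file` are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JarNativeExtractor {
    // Hypothetical directory inside the job JAR reserved for native libraries.
    static final String NATIVE_PREFIX = "lib/native/";

    // Extracts every file under NATIVE_PREFIX into libDir and returns the extracted paths.
    static List<Path> extract(Path jarPath, Path libDir) throws IOException {
        List<Path> extracted = new ArrayList<>();
        Files.createDirectories(libDir);
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(NATIVE_PREFIX)) {
                    continue;
                }
                // Flatten to the bare file name; skip names that would escape libDir.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                if (fileName.isEmpty() || fileName.equals(".") || fileName.equals("..")) {
                    continue;
                }
                Path target = libDir.resolve(fileName);
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(target);
            }
        }
        return extracted;
    }
}
```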