[ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654877#action_12654877 ]
Ahad Rana commented on HADOOP-4577:
-----------------------------------

Hi Arun,

Sorry, I have been tied up with other stuff and have not been able to bring this issue to closure. Will try to submit a patch based on previous suggestions shortly.

As far as Distributed Cache vs. distributing via the Jar: In my experience, both methods have their valid uses. In our use case at CommonCrawl, we deploy a single jar that contains all of our map-reduce jobs to a master server, which then executes specific jobs on demand. We have various utility classes that are wrappers around native C/C++ libraries. We build these JNI wrappers and JNI libraries via the same build script that builds the jar. It is super convenient to be able to include and deploy the related JNI libraries within the jar (and thus have them available at each mapper/reducer node). This way, all of our various jobs can use these classes seamlessly without relying on any special JOB SPECIFIC setup (such as adding the appropriate JNI libraries to the Distributed Cache).

So, in conclusion, Distributed Cache is good for cases where library availability is determined by job config, mapred.child.java.opts is convenient for scenarios where a set of (relatively static) libraries are part of the standard cluster config, and the third method I am proposing, deployment via jar, is convenient for scenarios where a deployment jar contains more than one job, and library availability is desired across all jobs.

Sound right?

Ahad.

> Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI
> libraries to be deployed via JAR file
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4577
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4577
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.1
>         Environment: Hadoop 0.18.1 Cluster with custom JNI shared libraries
> deployed in lib directory of deployment JAR.
>            Reporter: Ahad Rana
>            Assignee: Ahad Rana
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4577-v1.patch
>
>
> It is extremely convenient to be able to deploy JNI libraries utilized in a
> custom map-reduce job via the job's JAR file. The TaskRunner already
> establishes a precedent by automatically adding any jar files contained in
> the "lib" directory of the job jar to the child map/reduce process's
> classpath. Following this convention, it should also be possible to deploy
> custom JNI libraries in the same lib directory. This involves adding the path
> to the job jar's lib directory to the VM's library.path setting (after the
> jar has been expanded in the job cache directory). This does not eliminate
> the need to add dependent shared libraries that may be referenced by the JNI
> libraries to the system's LD_LIBRARY_PATH variable. In our deployment
> configuration, we usually pre-install third-party shared libraries across the
> cluster and only deploy our custom JNI libraries via the job jar.
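
To make the proposal in the description concrete, here is a minimal sketch of how the child JVM's library path could be extended with the job jar's lib directory. It is not the code from HADOOP-4577-v1.patch and does not use TaskRunner's actual internals: the class name LibraryPathSketch, the method childLibraryPathArg, and the directory used in main are hypothetical names chosen for illustration only. The sketch assumes the job jar has already been expanded, so that its JNI libraries sit in a "lib" subdirectory.

```java
import java.io.File;

public class LibraryPathSketch {

    /**
     * Builds a -Djava.library.path argument for a child JVM that keeps the
     * existing library path and appends the "lib" directory of the expanded
     * job jar. Both parameters are assumptions made for this sketch.
     */
    public static String childLibraryPathArg(String existingPath, String expandedJarDir) {
        // Hypothetical layout: the job jar was unpacked into expandedJarDir,
        // so any JNI libraries shipped in the jar live in expandedJarDir/lib.
        String jarLibDir = new File(expandedJarDir, "lib").getPath();

        // Keep whatever was already configured (for example via
        // mapred.child.java.opts) and append the jar's lib directory.
        String combined = (existingPath == null || existingPath.isEmpty())
                ? jarLibDir
                : existingPath + File.pathSeparator + jarLibDir;

        return "-Djava.library.path=" + combined;
    }

    public static void main(String[] args) {
        // The directory below is a placeholder, not a real Hadoop path.
        String arg = childLibraryPathArg(
                System.getProperty("java.library.path"),
                "/path/to/expanded/job/jar");
        System.out.println(arg);
    }
}
```

With a path set up this way, a JNI wrapper class inside the jar could call System.loadLibrary with the library's base name from any job. As the description notes, shared libraries that the JNI library itself depends on would still have to be resolvable through LD_LIBRARY_PATH.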