[ https://issues.apache.org/jira/browse/HADOOP-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654877#action_12654877 ]

Ahad Rana commented on HADOOP-4577:
-----------------------------------

Hi Arun,

Sorry, I have been tied up with other stuff and have not been able to bring 
this issue to closure. Will try to submit a patch based on previous suggestions 
shortly. 

As for Distributed Cache vs. distributing via the jar: in my experience, both
methods have their valid uses. In our use case at CommonCrawl, we deploy a
single jar that contains all of our map-reduce jobs to a master server, which
then executes specific jobs on demand. We have various utility classes that are
wrappers around native C/C++ libraries. We build these JNI wrappers and JNI
libraries with the same build script that builds the jar. It is super convenient
to be able to include the related JNI libraries in the jar and deploy them with
it (and thus have them available on each mapper/reducer node). This way, all of
our jobs can use these classes seamlessly without relying on any special
JOB-SPECIFIC setup (such as adding the appropriate JNI libraries to the
Distributed Cache).
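For context on why the JNI libraries have to end up on the child's library
path: a JNI wrapper class usually loads its native half with
System.loadLibrary(name), which maps the name to a platform-specific file name
(System.mapLibraryName) and searches the directories listed in
java.library.path, throwing UnsatisfiedLinkError if nothing matches. The Java
sketch below performs that lookup by hand; the class name and the "ccwrapper"
library name are hypothetical and this is not CommonCrawl's actual code. It can
help when diagnosing an UnsatisfiedLinkError on a task node.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class NativeLibraryLookup {

    // Returns every file on searchPath named like the platform-specific
    // library for libName (e.g. "foo" -> "libfoo.so" on Linux), in path order.
    static List<File> findLibrary(String libName, String searchPath) {
        List<File> hits = new ArrayList<>();
        if (searchPath == null || searchPath.isEmpty()) {
            return hits;
        }
        String fileName = System.mapLibraryName(libName);
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as "a::b"
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "ccwrapper" is a hypothetical library name used only for illustration.
        String name = args.length > 0 ? args[0] : "ccwrapper";
        List<File> hits = findLibrary(name, System.getProperty("java.library.path"));
        if (hits.isEmpty()) {
            System.out.println(System.mapLibraryName(name) + " not found on java.library.path");
        } else {
            System.out.println("first match on java.library.path: " + hits.get(0));
        }
    }
}
```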

So, in conclusion: the Distributed Cache is good for cases where library
availability is determined by the job config; mapred.child.java.opts is
convenient for scenarios where a set of (relatively static) libraries is part
of the standard cluster config; and the third method I am proposing, deployment
via the jar, is convenient for scenarios where a deployment jar contains more
than one job and library availability is desired across all jobs. Sound right?
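The second option above amounts to putting a -Djava.library.path=... token
into mapred.child.java.opts. A minimal Java sketch of adding a directory to that
value without clobbering an existing setting, assuming whitespace-separated
options with no quoting; the class name and the /opt/hadoop/native directory
are hypothetical, and "-Xmx200m" is only an example option.

```java
import java.io.File;

public class ChildJavaOpts {

    // Prefix of the JVM argument that sets the native library search path.
    private static final String PROP = "-Djava.library.path=";

    // Returns a copy of opts in which dir has been added to java.library.path.
    // An existing -Djava.library.path=... token is extended; otherwise a new
    // token is appended. Assumes whitespace-separated options with no quoting.
    static String addLibraryDir(String opts, String dir) {
        String base = (opts == null) ? "" : opts.trim();
        if (base.isEmpty()) {
            return PROP + dir;
        }
        String[] tokens = base.split("\\s+");
        boolean found = false;
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].startsWith(PROP)) {
                String current = tokens[i].substring(PROP.length());
                // Avoid a leading separator when the existing value is empty.
                tokens[i] = current.isEmpty()
                        ? PROP + dir
                        : PROP + current + File.pathSeparator + dir;
                found = true;
            }
        }
        String result = String.join(" ", tokens);
        return found ? result : result + " " + PROP + dir;
    }

    public static void main(String[] args) {
        // Example values only; /opt/hadoop/native is a hypothetical directory.
        System.out.println(addLibraryDir("-Xmx200m", "/opt/hadoop/native"));
        System.out.println(addLibraryDir("-Xmx200m -Djava.library.path=/usr/local/lib", "/opt/hadoop/native"));
    }
}
```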


Ahad.

> Add Jar "lib" directory to TaskRunner's library.path setting to allow JNI 
> libraries to be deployed via JAR file  
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4577
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4577
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.1
>         Environment: Hadoop 18.1 Cluster with custom JNI shared libraries 
> deployed in lib directory of deployment JAR.
>            Reporter: Ahad Rana
>            Assignee: Ahad Rana
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4577-v1.patch
>
>
> It is extremely convenient to be able to deploy JNI libraries utilized in a 
> custom map-reduce job via the job's JAR file. The TaskRunner already 
> establishes a precedent by automatically adding any jar files contained in 
> the "lib" directory of the job jar to the child map/reduce process's 
> classpath. Following this convention, it should also be possible to deploy 
> custom JNI libraries in the same lib directory. This involves adding the path
> to the job jar's lib directory to the child VM's java.library.path setting
> (after the jar has been expanded in the job cache directory). This does not
> eliminate the need to add dependent shared libraries that may be referenced
> by the JNI libraries to the system's LD_LIBRARY_PATH variable. In our
> deployment configuration, we usually pre-install third-party shared libraries
> across the cluster and deploy only our custom JNI libraries via the job jar.
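The change described above could look roughly like the following Java sketch.
It is not the actual TaskRunner code or the attached HADOOP-4577-v1.patch; the
class name, the method names, and the jobcache/job_0001/jars location are
hypothetical. It builds the child JVM's java.library.path from the expanded
job jar's lib directory plus whatever path was already configured, and formats
it as a JVM argument.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class JobJarLibraryPath {

    // Builds the child JVM's java.library.path: the "lib" directory of the
    // expanded job jar (only if it exists) followed by any existing entries.
    static String buildLibraryPath(File expandedJarDir, String existingPath) {
        List<String> entries = new ArrayList<>();
        File jarLib = new File(expandedJarDir, "lib");
        if (jarLib.isDirectory()) {
            entries.add(jarLib.getAbsolutePath());
        }
        if (existingPath != null && !existingPath.isEmpty()) {
            entries.add(existingPath);
        }
        return String.join(File.pathSeparator, entries);
    }

    // Formats the path as a system-property argument for the child command line.
    static String toJvmArg(String libraryPath) {
        return "-Djava.library.path=" + libraryPath;
    }

    public static void main(String[] args) {
        // Hypothetical location of the unpacked job jar inside the job cache.
        File expanded = new File(args.length > 0 ? args[0] : "jobcache/job_0001/jars");
        String path = buildLibraryPath(expanded, System.getProperty("java.library.path"));
        System.out.println(toJvmArg(path));
    }
}
```

Putting the jar's lib directory first means a library shipped with the job
shadows a same-named library elsewhere on the path, because the first match
wins; appending it instead would favor the cluster-wide copy. Either way, as
the description notes, shared libraries that the JNI libraries themselves
depend on are resolved by the operating system's dynamic linker
(LD_LIBRARY_PATH on Linux), not by java.library.path.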

-- 
This message is automatically generated by JIRA.