[ 
https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469327
 ] 

Dennis Kubes commented on HADOOP-964:
-------------------------------------

The second patch, classpath2.path (sorry, the extension should be .patch), attacks 
the problem from the ReduceTaskRunner instead of the hadoop shell script.  The 
problem is that Writable classes are not found by the ReduceTaskRunner upon 
initialization.  It needs these Writable classes to perform sorting, etc., in the 
prepare stage.  The first solution was to change the hadoop script to load any 
jars in HADOOP_HOME.  The hadoop script sets the classpath for the TaskTracker, 
which is then passed to the ReduceTaskRunner, so by loading any jars in the home 
directory the necessary jars would be on the classpath and accessible.  There are 
a few issues with that fix.  First, it reverses HADOOP-700, which we don't want 
to do.  Second, if we went down the path of setting the classpath through the 
script for Writable classes, then any time new classes were added we would have 
to restart the TaskTracker nodes.  Again, not a good solution.
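
The failure can be reproduced in isolation, roughly like this (a sketch, not the 
actual ReduceTaskRunner code; the Nutch class name is only illustrative, since 
any class that lives solely in the job jar behaves the same way):

```java
// Sketch: resolving a user Writable through a class loader that cannot
// see the job jar throws ClassNotFoundException, which is what the
// ReduceTaskRunner hits when it builds its sorter.
public class MissingWritableDemo {
    public static void main(String[] args) {
        ClassLoader trackerLoader = ClassLoader.getSystemClassLoader();
        try {
            // Illustrative class name; assumed absent from the
            // TaskTracker's own classpath.
            Class.forName("org.apache.nutch.crawl.CrawlDatum",
                          true, trackerLoader);
            System.out.println("found");
        } catch (ClassNotFoundException e) {
            System.out.println("not found: " + e.getMessage());
        }
    }
}
```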

So instead, what I did with this patch is change the ReduceTaskRunner to 
dynamically configure its classpath from the local unjarred work directory.  It 
does this by creating a new URLClassLoader, adding the same elements that are 
added to the classpath of the TaskTracker$Child spawns, and keeping the old 
context class loader as its parent.  The new URLClassLoader is then set on the 
current JobConf as its classloader and is used for the sorting, etc.  This means 
we don't have to change the hadoop script, and it also allows new Writable 
classes to be dynamically added to the system without restarting TaskTracker 
nodes.
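
The loader construction can be sketched like this (an assumption-laden sketch, 
not the patch itself: the method name, the work-directory layout, and the "lib" 
subdirectory convention are illustrative; the real patch also sets the resulting 
loader on the JobConf via its setClassLoader method):

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class TaskClasspath {
    // Sketch: build a URLClassLoader over the task's unjarred work
    // directory and any jars under its lib/ subdirectory, keeping the
    // current context class loader as the parent so classes already
    // visible to the TaskTracker still resolve through delegation.
    public static ClassLoader buildTaskClassLoader(File workDir)
            throws Exception {
        List<URL> urls = new ArrayList<URL>();
        urls.add(workDir.toURI().toURL());        // unpacked classes
        File[] libs = new File(workDir, "lib").listFiles();
        if (libs != null) {
            for (File jar : libs) {
                if (jar.getName().endsWith(".jar")) {
                    urls.add(jar.toURI().toURL());
                }
            }
        }
        ClassLoader parent =
            Thread.currentThread().getContextClassLoader();
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = buildTaskClassLoader(new File("."));
        // Classes the parent can see still resolve via delegation.
        System.out.println(
            loader.loadClass("java.lang.String").getName());
    }
}
```

Because the parent loader is preserved, nothing already on the TaskTracker's 
classpath is shadowed; only lookups that the parent cannot satisfy fall through 
to the job's own jars.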

I have run this patch on a development system using the Nutch injector, and also 
ran the TestMapRed unit tests.  Both completed successfully.

> Hadoop Shell Script causes ClassNotFoundException for Nutch processes
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-964
>                 URL: https://issues.apache.org/jira/browse/HADOOP-964
>             Project: Hadoop
>          Issue Type: Bug
>          Components: scripts
>         Environment: windows xp and fedora core 6 linux, java 1.5.10...should 
> affect all systems
>            Reporter: Dennis Kubes
>            Priority: Critical
>             Fix For: 0.11.0
>
>         Attachments: classpath.patch, classpath2.path
>
>
> In the ReduceTaskRunner constructor, line 339, a sorter is created that 
> attempts to get the map output key and value classes from the configuration 
> object.  This is before the TaskTracker$Child process is spawned off into its 
> own separate JVM, so here the classpath for the configuration is the classpath 
> that started the TaskTracker.  The current hadoop script includes the hadoop 
> jars, meaning that any hadoop Writable type will be found, but it doesn't 
> include Nutch jars, so any Nutch Writable type, or any other Writable type, 
> will not be found and will throw a ClassNotFoundException.
> I don't think it is a good idea to have a dependency on specific Nutch jars in 
> the Hadoop script, but it is a good idea to allow jars to be included if they 
> are in specific locations, such as the HADOOP_HOME where the nutch jar 
> resides.  I have attached a patch that adds any jars in the HADOOP_HOME 
> directory to the hadoop classpath.  This fixes the issue of getting 
> ClassNotFoundExceptions inside of Nutch processes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.