-libjars with multiple jars broken when client and cluster reside on different OSs
-----------------------------------------------------------------------------------
                 Key: HADOOP-4864
                 URL: https://issues.apache.org/jira/browse/HADOOP-4864
             Project: Hadoop Core
          Issue Type: Bug
          Components: filecache
    Affects Versions: 0.19.0
         Environment: When your hadoop job spans OSs.
            Reporter: Stuart White
            Priority: Minor

When submitting a hadoop job from Windows (Cygwin) to a Linux hadoop cluster (or vice versa), and when you specify multiple additional jar files via the -libjars flag, hadoop throws a ClassNotFoundException for any classes located in the additional jars specified via -libjars.

This is caused by the fact that hadoop uses System.getProperty("path.separator") as the delimiter in the list of jar files passed via -libjars. If your job spans platforms, System.getProperty("path.separator") returns a different delimiter on each platform (";" on Windows, ":" on Linux), so a list built on one OS cannot be split correctly on the other.

My suggested solution is to use a comma as the delimiter rather than path.separator. I realize comma is perhaps a poor choice for a delimiter because it is valid in filenames on both Windows and Linux, but the -libjars flag already uses it as the delimiter when listing the additional required jars. So I figured that if it's already being used as a delimiter externally, it's reasonable to use it internally as well. (A sketch of the kind of change I mean follows the reproduction steps below.)

I have a patch that applies my suggested change, but I don't see anywhere to upload it, so I'll go ahead and create this JIRA and hope that I'll have the opportunity to attach the patch later. With this change, I can submit hadoop jobs (requiring multiple supporting jars) from my Windows laptop (via Cygwin) to my 10-node Linux hadoop cluster. Any chance this change could be applied to the hadoop codebase?

To recreate the problem I'm seeing, do the following:

- Set up a hadoop cluster on Linux.
- Perform the remaining steps on Cygwin, with a hadoop installation configured to point to the Linux cluster (set fs.default.name and mapred.job.tracker).
- Extract the tarball and change into the created directory:
      tar xvfz Example.tar.gz
      cd Example
- Edit build.properties, set your hadoop.home appropriately, then build the example:
      ant
- Load the file Example.in into your dfs:
      hadoop dfs -copyFromLocal Example.in Example.in
- Execute the provided shell script, passing it testID 1:
      ./Example.sh 1
  This test does not use -libjars, and it completes successfully.
- Next, execute testID 2:
      ./Example.sh 2
  This test uses -libjars with one jarfile (Foo.jar), and it completes successfully.
- Next, execute testID 3:
      ./Example.sh 3
  This test uses -libjars with one jarfile (Bar.jar), and it completes successfully.
- Next, execute testID 4:
      ./Example.sh 4
  This test uses -libjars with two jarfiles (Foo.jar and Bar.jar), and it fails with a ClassNotFoundException.

This behavior only occurs when calling from Cygwin to Linux or vice versa. If both the cluster and the client reside on either Linux or Cygwin, the problem does not occur.

I'm continuing to dig to see what I can figure out, but since I'm very new to hadoop (started using it this week), I thought I'd go ahead and throw this out there to see if anyone can help. Thanks!
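To make the delimiter problem concrete, here is a minimal, self-contained Java sketch. This is not the actual hadoop code or my patch; the class and method names are made up for illustration, and the real logic lives in the job submission / filecache code. It just shows why joining the -libjars list with the client's path.separator breaks across OSs, and why joining with a comma does not:

    import java.util.Arrays;

    // Illustrative sketch only; names are hypothetical, not Hadoop internals.
    public class LibJarsDelimiterSketch {

        // How the list is effectively built today: joined with the CLIENT's
        // path.separator (";" under a Windows/Cygwin JVM, ":" on Linux).
        static String joinWithPathSeparator(String[] jars) {
            return String.join(System.getProperty("path.separator"), jars);
        }

        // Suggested change: always join with a comma, the same delimiter
        // the -libjars flag itself already uses on the command line.
        static String joinWithComma(String[] jars) {
            return String.join(",", jars);
        }

        public static void main(String[] args) {
            String[] jars = { "Foo.jar", "Bar.jar" };

            // On a Windows client this prints "Foo.jar;Bar.jar". A Linux
            // node splitting that string on its own path.separator (":")
            // gets back a single bogus entry, so neither jar lands on the
            // task classpath -- hence the ClassNotFoundException.
            String joined = joinWithPathSeparator(jars);
            System.out.println(joined);
            System.out.println(Arrays.toString(joined.split(":")));

            // Comma-joined, the list splits the same way on every OS.
            String fixed = joinWithComma(jars);
            System.out.println(Arrays.toString(fixed.split(",")));
        }
    }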
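For reference, the comma form is exactly what the flag already expects on the command line, e.g. something like "hadoop jar Example.jar -libjars Foo.jar,Bar.jar ..." (the jar and class names here are just from my example). So the internal representation would simply match the external one. The trade-off is that a jar path containing a comma could not be expressed, but that limitation already exists for the -libjars argument itself.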