The path separator is a major issue for a number of items in the configuration data set that pack multiple values together using the path separator:
the class path
the distributed cache
the input path set

All suffer from the path.separator issue for two reasons:
1. the separator differs across JVMs, as indicated in the previous email (I had missed this!), and
2. separator characters that happen to be embedded in individual elements are not escaped before the element is added to the existing set.

For all of the pain we have with these packed items, it may be simpler to serialize a List<String> for multi-element items rather than packing them with the path.separator system property.
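A minimal sketch of the escaping hazard described above (illustrative only, not Hadoop's actual code; the class and method names are made up for the example): when elements are packed with the platform separator and one element legally contains that separator, the round trip produces the wrong element count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PackingDemo {
    // Pack elements with a separator, with no escaping -- mirrors the
    // hazard described above.
    static String pack(List<String> elements, String sep) {
        return String.join(sep, elements);
    }

    static String[] unpack(String packed, String sep) {
        return packed.split(Pattern.quote(sep));
    }

    public static void main(String[] args) {
        String sep = ":"; // path.separator on Linux; ";" on Windows
        // The second element legally contains ':' (e.g. a Windows drive path).
        List<String> elements = Arrays.asList("/opt/a.jar", "C:\\libs\\b.jar");
        String[] roundTripped = unpack(pack(elements, sep), sep);
        // 2 elements went in, but 3 come out: the embedded ':' split an element.
        System.out.println(roundTripped.length);
    }
}
```

Serializing the List<String> directly would avoid both the escaping problem and the cross-JVM separator difference.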



Aaron Kimball wrote:
Hi Stuart,

Good sleuthing out that problem :) The correct way to submit patches is to
file a ticket on JIRA (https://issues.apache.org/jira/browse/HADOOP). Create
an account, create a new issue describing the bug, and then attach the patch
file. There'll be a discussion there and others can review your patch and
include it in the codebase.

Cheers,
- Aaron

On Fri, Dec 12, 2008 at 12:14 PM, Stuart White <[email protected]> wrote:

Ok, I'll answer my own question.

This is caused by the fact that Hadoop uses
System.getProperty("path.separator") as the delimiter in the list of
jar files passed via -libjars.

If your job spans platforms, System.getProperty("path.separator")
returns a different delimiter on each platform (":" on Linux,
";" on Windows).

My solution is to use a comma as the delimiter, rather than the
path.separator.

I realize comma is, perhaps, a poor choice for a delimiter because it
is valid in filenames on both Windows and Linux, but -libjars already
uses it as the delimiter when listing the additional required jars.  So, I
figured if it's already being used as a delimiter, then it's
reasonable to use it internally as well.
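The idea can be sketched as follows (illustrative only; the actual change is in the attached patch against 0.19.0, and the class and method names here are invented for the example): joining and splitting on a fixed comma behaves identically on every platform, unlike the platform-dependent path.separator.

```java
public class CommaDelimited {
    // Join jar paths with a fixed comma, independent of the platform's
    // path.separator, so the packed string means the same thing on the
    // submitting machine and on the cluster.
    static String join(String[] jars) {
        return String.join(",", jars);
    }

    static String[] split(String packed) {
        return packed.split(",");
    }

    public static void main(String[] args) {
        String[] jars = {"lib/a.jar", "lib/b.jar"};
        String packed = join(jars);                // same result on any OS
        System.out.println(packed);
        System.out.println(split(packed).length);  // element count survives
    }
}
```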

I've attached a patch (against 0.19.0) that applies this change.

Now, with this change, I can submit hadoop jobs (requiring multiple
supporting jars) from my Windows laptop (via cygwin) to my 10-node
Linux hadoop cluster.

Any chance this change could be applied to the hadoop codebase?

