DistributedCache parses Paths with scheme or port components incorrectly
-------------------------------------------------------------------------

                 Key: HADOOP-3800
                 URL: https://issues.apache.org/jira/browse/HADOOP-3800
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.17.1, 0.17.0
         Environment: linux ("path.separator" is ":")
hdfs filesystem (not "local")
            Reporter: Andrew Gudkov


When a Path with a scheme or port component (such as 
"hdfs://localhost:9000/deploy/hello") is passed to 
DistributedCache.addFileToClassPath, it is appended to the configuration option 
"mapred.job.classpath.files" using the "path.separator" delimiter, which is ":" 
on Linux.
This confuses DistributedCache.getFileClassPaths: the same character is used 
both to delimit whole paths and to separate components (scheme, port) within a 
single Path.
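
The collision is easy to demonstrate with plain Java. This is only an 
illustrative sketch of the append/split behaviour described above, not the 
actual DistributedCache code:
{code}
import java.util.Arrays;

public class DelimiterCollision {
  public static void main(String[] args) {
    // ":" on Linux, per the environment above.
    String sep = System.getProperty("path.separator");

    // The value that ends up in "mapred.job.classpath.files":
    String value = "hdfs://localhost:9000/deploy/hello";

    // Splitting it back on the same separator (roughly what
    // getFileClassPaths has to do) breaks the single URI into fragments.
    System.out.println(Arrays.toString(value.split(sep)));
    // prints: [hdfs, //localhost, 9000/deploy/hello]
  }
}
{code}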


Example:
I have some jars and conf files in the HDFS directory "/deploy". The following 
code adds them to the job's classpath:
{code:title=Test.java}
// "job" is the JobConf being configured for the job.
Path deployPath = new Path("/deploy");
FileSystem fs = deployPath.getFileSystem(new Configuration());

// Add every file under /deploy to the job's classpath.
FileStatus[] jars = fs.listStatus(deployPath);
for (int i = 0; i < jars.length; i++) {
  System.out.println(jars[i].getPath());
  DistributedCache.addFileToClassPath(jars[i].getPath(), job);
}
{code}

Launching the task gives this stdout output:
{code}
hdfs://localhost:9000/deploy/hello
{code}
And "mapred.job.classpath.files" is set to "hdfs://localhost:9000/deploy/hello" 
by DistributedCache.
And DistributedCache.getFileClassPaths returns incorrect paths like 
"9000/deploy/hello/home/gudok/Work/test/bin/../conf".

For now, I have worked around the problem by submitting Paths without scheme 
and port components ("/deploy/hello").

Other DistributedCache methods need to be reviewed too.

