[ https://issues.apache.org/jira/browse/CRUNCH-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Blue updated CRUNCH-512:
-----------------------------
    Description:
I have a Crunch-based task that uses the LocalJobRunner when it is copying data to or from HDFS. When HDFS is the default FS and I'm using LocalJobRunner, I get a FileNotFoundException in the distributed cache code with a strange looking URI:
{code}
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/eb3b9643-1be3-400e-96d1-6e095fb3c5... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://quickstart.cloudera:8020/usr/lib/hive/lib/antlr-runtime-3.4.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:481)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
    at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
    at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
    at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
    at java.lang.Thread.run(Thread.java:745)
{code}
It looks similar to HDFS-7031, where file:/ paths were getting incorrectly qualified using HDFS information.

  was:
I have a Crunch-based task that uses the LocalJobRunner when it is copying data to or from HDFS. When HDFS is the default FS and I'm using LocalJobRunner, I get a FileNotFoundException in the distributed cache code with a strange looking URI:
{code}
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/eb3b9643-1be3-400e-96d1-6e095fb3c5... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://quickstart.cloudera:8020/usr/lib/hive/lib/antlr-ru ntime-3.4.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:481)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
    at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
    at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
    at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
    at java.lang.Thread.run(Thread.java:745)
{code}
It looks similar to HDFS-7031, where file:/ paths were getting incorrectly qualified using HDFS information.

> Distributed cache cannot find jar files when using LocalJobRunner.
> ------------------------------------------------------------------
>
>                 Key: CRUNCH-512
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-512
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.11.0
>            Reporter: Ryan Blue
>            Assignee: Josh Wills