Looking at the following error:

2010-04-15 17:26:09,746 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201004151709_0001_m_000001_3: java.io.IOException: Cannot open filename /srv/hadoop/data/hadoop/mapred/local/taskTracker/archive/hadoop-eventlog01.socialmedia.com/user/knuttycombe/socialmedia.mr_tool.serfile/c363f0f6-28ac-4365-ba93-fec6e5188741.ser/c363f0f6-28ac-4365-ba93-fec6e5188741.ser
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
        at socialmedia.common.hadoop.DistCacheResources$$anonfun$init$2$$anonfun$apply$3.apply(DistCacheResources.scala:54)
        at socialmedia.common.hadoop.DistCacheResources$$anonfun$init$2$$anonfun$apply$3.apply(DistCacheResources.scala:54)
        at socialmedia.common.util.Util$.using(Util.scala:20)
        at socialmedia.common.hadoop.DistCacheResources$$anonfun$init$2.apply(DistCacheResources.scala:53)
        at socialmedia.common.hadoop.DistCacheResources$$anonfun$init$2.apply(DistCacheResources.scala:52)
        at scala.Option.map(Option.scala:70)
        at socialmedia.common.hadoop.DistCacheResources$class.init(DistCacheResources.scala:51)
        at socialmedia.somegra.reporting.SeriesMetricsMapper.init(HDFSMetricsQuery.scala:185)
        at socialmedia.somegra.reporting.SeriesMetricsMapper.setup(HDFSMetricsQuery.scala:192)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Your map task is trying to read the file from DFS, but the file is actually in the local FileSystem. Your code below should create the FileSystem using FileSystem.getLocal(conf), or you can use java.io.File directly to access the file (see the sketch appended at the end of this thread):

    resPath => using(resPath.getFileSystem(conf)) { fs =>
      using(fs.open(resPath)) {

Hope this helps you.

Thanks
Amareshwari

On 4/15/10 11:36 PM, "Kris Nuttycombe" <kris.nuttyco...@gmail.com> wrote:

Hi, all,

I'm having problems with my Mapper instances accessing the DistributedCache. A bit of background: I'm running on a single-node cluster, just trying to get my first map/reduce job functioning. Both the job tracker and the primary namenode exist on the same host.

In the client I am able to successfully add a file to the distributed cache, but when my Mapper instance attempts to read the file it fails, despite the fact that the path it fails on exists on the system where the job is running. Here is a paste detailing the code where the error is occurring, related log output from the node where the job runs, and filesystem information from the same: http://paste.pocoo.org/show/202242/

The failure appears to originate from these lines in DFSClient.java:

    LocatedBlocks newInfo = callGetBlockLocations(namenode, src, 0, prefetchSize);
    if (newInfo == null) {
      throw new IOException("Cannot open filename " + src);
    }

I've attempted to trace back through the code to figure out why newInfo might be null, but I quickly got lost. Can someone please help me figure out why it can't find this file?

Thank you,

Kris
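A minimal sketch of what the suggested fix could look like in a Mapper's setup code, assuming the old org.apache.hadoop.filecache.DistributedCache API from that era; LocalCacheReadSketch and readFirstLine are illustrative names, not part of the code in this thread. The key point is that paths returned by getLocalCacheFiles live on the task tracker's local disk, so they are opened with FileSystem.getLocal(conf) rather than resPath.getFileSystem(conf):

    import java.io.{BufferedReader, InputStreamReader}

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.filecache.DistributedCache
    import org.apache.hadoop.fs.{FileSystem, Path}

    object LocalCacheReadSketch {

      // Reads the first line of the first localized cache file, if any.
      def readFirstLine(conf: Configuration): Option[String] = {
        // getLocalCacheFiles returns paths on the task tracker's local disk,
        // so open them with the *local* FileSystem. Calling
        // resPath.getFileSystem(conf) resolves to HDFS when fs.default.name
        // points at an hdfs:// URI, which produces the
        // "Cannot open filename ..." error shown above.
        val localPaths: Array[Path] =
          Option(DistributedCache.getLocalCacheFiles(conf)).getOrElse(Array.empty[Path])

        localPaths.headOption.map { resPath =>
          val localFs = FileSystem.getLocal(conf)
          val in = localFs.open(resPath)
          try new BufferedReader(new InputStreamReader(in)).readLine()
          finally in.close()
        }
      }
    }

The plain java.io route mentioned in the reply would be along the lines of new java.io.File(resPath.toUri.getPath), since the localized path is an ordinary file on disk.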