Hello all:

We've had an intermittent issue on our cluster when using the distributed cache:

11/01/25 13:46:19 INFO mapred.JobClient: Task Id : attempt_201101071032_13017_r_000030_2, Status : FAILED
java.io.FileNotFoundException: /hadoop.data.1/tmp/mapred/local/taskTracker/archive/hdfs/data/lookup/ipclass/classification_regex.txt/classification_regex.txt (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(Unknown Source)
        at java.io.FileInputStream.<init>(Unknown Source)
        at com.returnpath.tusko.IPClassifier.loadRules(IPClassifier.java:725)
        at com.returnpath.tusko.IPClassification$Reduce.loadClassifier(IPClassification.java:193)
        at com.returnpath.tusko.IPClassification$Reduce.setup(IPClassification.java:313)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

The failing code path calls "DistributedCache.getLocalCacheFiles(job)" and
eventually opens the returned path with "new FileInputStream(path.toString());"
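
To make the failure easier to diagnose, the open could be wrapped in a check like the one below. This is only a minimal sketch using plain java.io/java.nio. It assumes the paths returned by getLocalCacheFiles have already been converted to strings, and the class and method names (CacheFileCheck, firstReadable, open) are hypothetical, not part of Hadoop or of our job. It does not address the root cause; it only reports which localized paths were missing and whether their parent directories existed.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper (an assumption for illustration, not a Hadoop API).
final class CacheFileCheck {

    // Returns the first candidate that exists as a readable regular file.
    // If none qualifies, throws FileNotFoundException with a per-path report.
    static Path firstReadable(List<String> candidates) throws FileNotFoundException {
        StringBuilder report = new StringBuilder();
        for (String candidate : candidates) {
            Path p = Paths.get(candidate);
            if (Files.isRegularFile(p) && Files.isReadable(p)) {
                return p;
            }
            // Record what was found on disk so the task log shows more than "no such file".
            report.append(candidate)
                  .append(" (exists=").append(Files.exists(p))
                  .append(", parentExists=")
                  .append(p.getParent() != null && Files.exists(p.getParent()))
                  .append(") ");
        }
        throw new FileNotFoundException(
                "No readable localized cache file among: " + report);
    }

    // Opens the first readable candidate; the caller is responsible for closing the stream.
    static InputStream open(List<String> candidates) throws IOException {
        return new FileInputStream(firstReadable(candidates).toFile());
    }
}
```

In the reducer's setup, the string forms of the localized paths would be passed to CacheFileCheck.open(...) in place of the direct FileInputStream call.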

When the issue has occurred, we've been able to work around it by either
1) restarting the TaskTrackers, or
2) deleting and recreating the file to be cached on HDFS.

Does anyone have any idea what the root cause of the problem might be?

Thank you,

Jacob Rideout
