Hello,

From the docs (for 0.20) for DistributedCache [1] I'm under the impression that .tgz files will be unzipped, untarred, and symlinked into the job's current dir.

However, when running the job, this little fragment [2] reveals (I have called DistributedCache.createSymlink(config_); just after adding the cache components):

  Arch=/data01/hadoop/mapred/mapred/taskTracker/distcache/5775566659502863353_-129792898_530471609/a.X.com/user/sguha/tmp/rhipe-hbase.jar
  Arch=/data01/hadoop/mapred/mapred/taskTracker/distcache/5324957355881422466_25039836_529778096/a.X.com/user/sguha/Rdist.tar.gz
  File=/data01/hadoop/mapred/mapred/taskTracker/distcache/1213508244132138160_-278348214_531319237/a.X.com/user/sguha/mscript.sh

But having inspected the ls -lR of the working directory, I don't see this happening (only mscript.sh was symlinked; it was added via addCacheFile):

  ls -lR
  .:
  total 12
  lrwxrwxrwx 1 mapred mapred   90 Apr 28 22:11 job.jar -> /data01/hadoop/mapred/mapred/taskTracker/sguha/jobcache/job_201102231451_6814/jars/job.jar
  lrwxrwxrwx 1 mapred mapred  141 Apr 28 22:11 mscript.sh -> /data01/hadoop/mapred/mapred/taskTracker/distcache/1213508244132138160_-278348214_531319237/a.X.com/user/sguha/mscript.sh
  drwxr-xr-x 2 mapred mapred 4096 Apr 28 22:11 tmp

  ./tmp:
  total 0

In summary:
- I added mscript.sh via addCacheFile - it was symlinked into the working directory. OK
- I added a JAR file with some classes I needed, using addArchiveToClassPath, and this worked too. OK
- I added a .tgz file hoping it would be unzipped, untarred, and symlinked into the current folder (using addCacheArchive). NOT OK

Have I missed anything?

Cheers
Joy

[1] http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/filecache/DistributedCache.html

[2]
  Path[] localArchives = DistributedCache.getLocalCacheArchives(context.getConfiguration());
  Path[] localFiles    = DistributedCache.getLocalCacheFiles(context.getConfiguration());
  for (Path p : localArchives) System.out.println("Arch=" + p);
  for (Path p : localFiles)    System.out.println("File=" + p);
