I added a DistributedCache.createSymlink(configuration) call right after the addCacheArchive() call, but I still see the same error.
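
For reference, here is a minimal sketch of how a string like the one passed to addCacheArchive() splits into a path and a fragment when it is parsed as a java.net.URI. It uses only the JDK and does not touch Hadoop. The class name FragmentDemo and its helper methods are made up for illustration, and whether the failing code path parses the value this way is an assumption, not something confirmed in this thread.

```java
import java.net.URI;

public class FragmentDemo {
    // Path component of the URI: everything before the '#'.
    static String pathOf(String spec) {
        return URI.create(spec).getPath();
    }

    // Fragment component: the part after the '#', which is the
    // intended symlink name in the working directory.
    static String linkNameOf(String spec) {
        return URI.create(spec).getFragment();
    }

    public static void main(String[] args) {
        String spec = "/tmp/gate-app.zip#gate-app";
        System.out.println(pathOf(spec));     // prints "/tmp/gate-app.zip"
        System.out.println(linkNameOf(spec)); // prints "gate-app"
    }
}
```

If the same string is instead treated as a plain file name, the "#gate-app" suffix stays part of the name, which would match the "File does not exist: /tmp/gate-app.zip#gate-app" message.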

On Mon, Jan 9, 2012 at 11:05 AM, Alejandro Abdelnur <[email protected]> wrote:

> Bill,
>
> In addition you must call DistributedCache.createSymlink(configuration),
> that should do.
>
> Thxs.
>
> Alejandro
>
> On Mon, Jan 9, 2012 at 10:30 AM, W.P. McNeill <[email protected]> wrote:
>
> > I am trying to add a zip file to the distributed cache and have it
> > unzipped on the task nodes with a softlink to the unzipped directory
> > placed in the working directory of my mapper process. I think I'm doing
> > everything the way the documentation tells me to, but it's not working.
> >
> > On the client in the run() function while I'm creating the job I first
> > call:
> >
> > fs.copyFromLocalFile("gate-app.zip", "/tmp/gate-app.zip");
> >
> > As expected, this copies the archive file gate-app.zip to the HDFS
> > directory /tmp.
> >
> > Then I call
> >
> > DistributedCache.addCacheArchive("/tmp/gate-app.zip#gate-app",
> > configuration);
> >
> > I expect this to add "/tmp/gate-app.zip" to the distributed cache and
> > put a softlink to it called gate-app in the working directory of each
> > task. However, when I call job.waitForCompletion(), I see the following
> > error:
> >
> > Exception in thread "main" java.io.FileNotFoundException: File does not
> > exist: /tmp/gate-app.zip#gate-app.
> >
> > It appears that the distributed cache mechanism is interpreting the
> > entire URI as the literal name of the file, instead of treating the
> > fragment as the name of the softlink.
> >
> > As far as I can tell, I'm doing this correctly according to the API
> > documentation:
> >
> > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
> >
> > The full project in which I'm doing this is up on github:
> > https://github.com/wpm/Hadoop-GATE.
> >
> > Can someone tell me what I'm doing wrong?
