To push the file to HDFS (put it in the 'a_hdfsDirectory' directory):

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path srcPath = new Path(a_directory + "/" + outputName);
Path dstPath = new Path(a_hdfsDirectory + "/" + outputName);
hdfs.copyFromLocalFile(srcPath, dstPath);
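Then hand that path to your tasks as a job property, along the lines Chris suggests below. A minimal sketch; "my.cache.path" is just an example name I made up, not a standard Hadoop property:

// Driver side: record where the file was written so the tasks can find it.
config.set("my.cache.path", dstPath.toString());

// Task side (0.20 mapred API): pull the property back out in configure().
private String cachePathStr;

public void configure(JobConf job) {
    cachePathStr = job.get("my.cache.path");
}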
To read it from HDFS in your mapper or reducer (note: open the path through the FileSystem object; a plain FileReader would look on the local disk, not in HDFS):

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path cachePath = new Path(a_hdfsDirectory + "/" + outputName);
BufferedReader wordReader = new BufferedReader(
    new InputStreamReader(hdfs.open(cachePath)));

On Fri, Jun 26, 2009 at 8:55 PM, akhil1988 <akhilan...@gmail.com> wrote:
>
> Thanks, Chris, for your reply!
>
> Well, I could not understand much of what was discussed on that forum;
> I am unfamiliar with Cascading.
>
> My problem is simple: I want a directory to be present in the local
> working directory of the tasks so that I can access it from my map task
> in the following manner:
>
> FileInputStream fin = new FileInputStream("Config/file1.config");
>
> where Config is a directory containing many files/directories, one of
> which is file1.config.
>
> It would be helpful if you could tell me which statements to use to
> distribute a directory to the tasktrackers.
> The API doc (http://hadoop.apache.org/core/docs/r0.20.0/api/index.html)
> says that archives are unzipped on the tasktrackers, but I want an
> example of how to use this in the case of a directory.
>
> Thanks,
> Akhil
>
>
> Chris Curtin-2 wrote:
> >
> > Hi,
> >
> > I've found it much easier to write the file to HDFS using the API,
> > then pass the 'path' to the file in HDFS as a property. You'll need
> > to remember to clean up the file after you're done with it.
> >
> > Example details are in this thread:
> > http://groups.google.com/group/cascading-user/browse_thread/thread/d5c619349562a8d6#
> >
> > Hope this helps,
> >
> > Chris
> >
> > On Thu, Jun 25, 2009 at 4:50 PM, akhil1988 <akhilan...@gmail.com> wrote:
> >
> >>
> >> Please ask any questions if I am not clear above about the problem
> >> I am facing.
> >>
> >> Thanks,
> >> Akhil
> >>
> >> akhil1988 wrote:
> >> >
> >> > Hi All!
> >> >
> >> > I want a directory to be present in the local working directory of
> >> > the task, for which I am using the following statements:
> >> >
> >> > DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"),
> >> > conf);
> >> > DistributedCache.createSymlink(conf);
> >> >
> >> > Here Config is a directory which I have zipped and put at the given
> >> > location in HDFS.
> >> >
> >> > I zipped the directory because the API doc of DistributedCache
> >> > (http://hadoop.apache.org/core/docs/r0.20.0/api/index.html) says
> >> > that archive files are unzipped in the local cache directory:
> >> >
> >> > "DistributedCache can be used to distribute simple, read-only
> >> > data/text files and/or more complex types such as archives, jars
> >> > etc. Archives (zip, tar and tgz/tar.gz files) are un-archived at
> >> > the slave nodes."
> >> >
> >> > So, from my understanding of the API docs, I expect that the
> >> > Config.zip file will be unzipped to a Config directory, and since
> >> > I have symlinked them I can access the directory in the following
> >> > manner from my map function:
> >> >
> >> > FileInputStream fin = new FileInputStream("Config/file1.config");
> >> >
> >> > But I get a FileNotFoundException on the execution of this
> >> > statement. Please let me know where I am going wrong.
> >> >
> >> > Thanks,
> >> > Akhil
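As for the addCacheArchive route in the quoted question: one likely gotcha (a guess on my part, not verified against Akhil's job) is the symlink's name. The link created in the task's working directory takes its name from the URI fragment; with no fragment it is named after the file itself (Config.zip), so "Config/file1.config" never resolves. A sketch, assuming the zip was built from inside the Config directory so that file1.config sits at the archive's root:

// The "#Config" fragment names the symlink in the task's working
// directory, pointing at the directory where the archive was unpacked.
DistributedCache.addCacheArchive(
    new URI("/home/akhil1988/Config.zip#Config"), conf);
DistributedCache.createSymlink(conf);

// In the map task the archive contents are then reachable through the
// link (the exact relative path depends on the zip's internal layout):
FileInputStream fin = new FileInputStream("Config/file1.config");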