Is the code called from a Mapper/Reducer? If so, DistributedCache is probably a better solution. --Q
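If the reads are happening inside tasks, here is a rough sketch of the DistributedCache approach (old-style mapred API; the class name, HDFS path, and key/value types are placeholders, not anything from the original post). The file is copied to each task node's local disk once per job, so tasks read a local file instead of re-opening the stream from HDFS:

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CacheExampleMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // Driver side (hypothetical path, shown here for context):
    //   JobConf conf = new JobConf(CacheExampleMapper.class);
    //   DistributedCache.addCacheFile(new URI("/data/binary.dat"), conf);

    private File binaryFile;  // local copy of the cached HDFS file

    @Override
    public void configure(JobConf job) {
        try {
            // Files registered with addCacheFile() are localized to each
            // task node before any task of the job runs.
            Path[] cached = DistributedCache.getLocalCacheFiles(job);
            binaryFile = new File(cached[0].toString());
        } catch (IOException e) {
            throw new RuntimeException("Failed to resolve cached file", e);
        }
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Read binaryFile from local disk here instead of opening an
        // FSDataInputStream against HDFS in every task.
    }
}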
On Tue, Aug 18, 2009 at 12:00 PM, Ananth T. Sarathy <[email protected]> wrote:

> I am trying to download binary files stored in Hadoop, but there is
> roughly a two-minute wait on a 20 MB file when I try to execute
> in.read(buf). Is there a better way to be doing this?
>
>     private void pipe(InputStream in, OutputStream out) throws IOException
>     {
>         System.out.println(System.currentTimeMillis() + " Starting to Pipe Data");
>         byte[] buf = new byte[1024];
>         int read = 0;
>         while ((read = in.read(buf)) >= 0)
>         {
>             out.write(buf, 0, read);
>             System.out.println(System.currentTimeMillis() + " Piping Data");
>         }
>         out.flush();
>         System.out.println(System.currentTimeMillis() + " Finished Piping Data");
>     }
>
>     public void readFile(String fileToRead, OutputStream out) throws IOException
>     {
>         System.out.println(System.currentTimeMillis() + " Start Read File");
>         Path inFile = new Path(fileToRead);
>         System.out.println(System.currentTimeMillis() + " Set Path");
>         // Validate the input path before reading.
>         if (!fs.exists(inFile))
>         {
>             throw new HadoopFileException("Specified file " + fileToRead
>                     + " not found.");
>         }
>         if (!fs.isFile(inFile))
>         {
>             throw new HadoopFileException("Specified path " + fileToRead
>                     + " is not a file.");
>         }
>         // Open inFile for reading.
>         System.out.println(System.currentTimeMillis() + " Opening Data Stream");
>         FSDataInputStream in = fs.open(inFile);
>         System.out.println(System.currentTimeMillis() + " Opened Data Stream");
>         // Read from the input stream and write to the output stream until EOF.
>         pipe(in, out);
>         // Close the streams when done.
>         out.close();
>         in.close();
>     }
>
> Ananth T Sarathy
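For what it's worth, two things in the quoted pipe() will dominate a 20 MB transfer: the 1 KB buffer and the System.out.println() on every chunk (about 20,000 console writes). A minimal sketch of an alternative, assuming the same pre-configured FileSystem as in the quoted readFile(), is to let Hadoop's own IOUtils.copyBytes() do the loop with a larger buffer:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReader {
    private final FileSystem fs;

    public HdfsReader(Configuration conf) throws IOException {
        this.fs = FileSystem.get(conf);
    }

    public void readFile(String fileToRead, OutputStream out) throws IOException {
        Path inFile = new Path(fileToRead);
        InputStream in = fs.open(inFile);
        try {
            // 64 KB buffer; copyBytes loops internally, so there is no
            // per-chunk logging to slow the transfer down.
            IOUtils.copyBytes(in, out, 64 * 1024, false);
            out.flush();
        } finally {
            IOUtils.closeStream(in);
        }
    }
}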
