I am trying to download binary files stored in Hadoop, but a 20 MB file
takes about 2 minutes, and the wait is in the in.read(buf) call.
Is there a better way to do this?
private void pipe(InputStream in, OutputStream out) throws IOException
{
    System.out.println(System.currentTimeMillis()+" Starting to Pipe Data");
    byte[] buf = new byte[1024];
    int read = 0;
    while ((read = in.read(buf)) >= 0)
    {
        out.write(buf, 0, read);
        System.out.println(System.currentTimeMillis()+" Piping Data");
    }
    out.flush();
    System.out.println(System.currentTimeMillis()+" Finished Piping Data");
}
public void readFile(String fileToRead, OutputStream out)
    throws IOException
{
    System.out.println(System.currentTimeMillis()+" Start Read File");
    Path inFile = new Path(fileToRead);
    System.out.println(System.currentTimeMillis()+" Set Path");
    // Validate the input/output paths before reading/writing.
    if (!fs.exists(inFile))
    {
        throw new HadoopFileException("Specified file " + fileToRead
            + " not found.");
    }
    if (!fs.isFile(inFile))
    {
        throw new HadoopFileException("Specified file " + fileToRead
            + " is not a regular file.");
    }
    // Open inFile for reading.
    System.out.println(System.currentTimeMillis()+" Opening Data Stream");
    FSDataInputStream in = fs.open(inFile);
    System.out.println(System.currentTimeMillis()+" Opened Data Stream");
    // Read from input stream and write to output stream until EOF.
    pipe(in, out);
    // Close the streams when done.
    out.close();
    in.close();
}
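
For comparison, here is a minimal sketch of the same copy loop with a larger buffer and without the println on every chunk. The class name PipeSketch and the 64 KB buffer size are assumptions chosen for illustration, and the in-memory streams in main only stand in for the HDFS stream. Whether this removes the delay against a real cluster is untested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class; the name and buffer size are illustrative only.
class PipeSketch {
    // Assumed buffer size: 64 KB instead of 1 KB, so fewer read() calls are made.
    static final int BUFFER_SIZE = 64 * 1024;

    // Copies everything from in to out and returns the number of bytes copied.
    static long pipe(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        // read() returns -1 at end of stream.
        while ((read = in.read(buf)) != -1) {
            out.write(buf, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HDFS stream: 20 MB of zero bytes held in memory.
        byte[] data = new byte[20 * 1024 * 1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        long start = System.currentTimeMillis();
        long copied = pipe(new ByteArrayInputStream(data), out);
        long elapsed = System.currentTimeMillis() - start;
        // Timestamps are printed once per copy rather than once per chunk.
        System.out.println("Copied " + copied + " bytes in " + elapsed + " ms");
    }
}
```

Hadoop also provides org.apache.hadoop.io.IOUtils.copyBytes(in, out, bufferSize, close), which runs a similar loop, so it may be possible to replace pipe() with that call.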
Ananth T Sarathy