Is the code called from a Mapper/Reducer? If so, DistributedCache is probably a better solution. --Q
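If the reads are happening inside tasks, here is a rough sketch of the DistributedCache approach (old-style mapred API; the class name, HDFS path, and key/value types are placeholders, not anything from the original post). The file is copied to each task node's local disk once per job, so tasks read a local file instead of re-opening the stream from HDFS:

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CacheExampleMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // Driver side (hypothetical path, shown here for context):
    //   JobConf conf = new JobConf(CacheExampleMapper.class);
    //   DistributedCache.addCacheFile(new URI("/data/binary.dat"), conf);

    private File binaryFile;  // local copy of the cached HDFS file

    @Override
    public void configure(JobConf job) {
        try {
            // Files registered with addCacheFile() are localized to each
            // task node before any task of the job runs.
            Path[] cached = DistributedCache.getLocalCacheFiles(job);
            binaryFile = new File(cached[0].toString());
        } catch (IOException e) {
            throw new RuntimeException("Failed to resolve cached file", e);
        }
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Read binaryFile from local disk here instead of opening an
        // FSDataInputStream against HDFS in every task.
    }
}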
On Tue, Aug 18, 2009 at 12:00 PM, Ananth T. Sarathy <[email protected]> wrote:

> I am trying to download binary files stored in Hadoop, but there is
> roughly a two-minute wait on a 20 MB file when I try to execute
> in.read(buf). Is there a better way to be doing this?
>
>     private void pipe(InputStream in, OutputStream out) throws IOException
>     {
>         System.out.println(System.currentTimeMillis() + " Starting to Pipe Data");
>         byte[] buf = new byte[1024];
>         int read = 0;
>         while ((read = in.read(buf)) >= 0)
>         {
>             out.write(buf, 0, read);
>             System.out.println(System.currentTimeMillis() + " Piping Data");
>         }
>         out.flush();
>         System.out.println(System.currentTimeMillis() + " Finished Piping Data");
>     }
>
>     public void readFile(String fileToRead, OutputStream out) throws IOException
>     {
>         System.out.println(System.currentTimeMillis() + " Start Read File");
>         Path inFile = new Path(fileToRead);
>         System.out.println(System.currentTimeMillis() + " Set Path");
>         // Validate the input path before reading.
>         if (!fs.exists(inFile))
>         {
>             throw new HadoopFileException("Specified file " + fileToRead
>                     + " not found.");
>         }
>         if (!fs.isFile(inFile))
>         {
>             throw new HadoopFileException("Specified path " + fileToRead
>                     + " is not a file.");
>         }
>         // Open inFile for reading.
>         System.out.println(System.currentTimeMillis() + " Opening Data Stream");
>         FSDataInputStream in = fs.open(inFile);
>         System.out.println(System.currentTimeMillis() + " Opened Data Stream");
>         // Read from the input stream and write to the output stream until EOF.
>         pipe(in, out);
>         // Close the streams when done.
>         out.close();
>         in.close();
>     }
>
> Ananth T Sarathy
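For what it's worth, two things in the quoted pipe() will dominate a 20 MB transfer: the 1 KB buffer and the System.out.println() on every chunk (about 20,000 console writes). A minimal sketch of an alternative, assuming the same pre-configured FileSystem as in the quoted readFile(), is to let Hadoop's own IOUtils.copyBytes() do the loop with a larger buffer:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReader {
    private final FileSystem fs;

    public HdfsReader(Configuration conf) throws IOException {
        this.fs = FileSystem.get(conf);
    }

    public void readFile(String fileToRead, OutputStream out) throws IOException {
        Path inFile = new Path(fileToRead);
        InputStream in = fs.open(inFile);
        try {
            // 64 KB buffer; copyBytes loops internally, so there is no
            // per-chunk logging to slow the transfer down.
            IOUtils.copyBytes(in, out, 64 * 1024, false);
            out.flush();
        } finally {
            IOUtils.closeStream(in);
        }
    }
}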
