It's not called from a mapper or reducer. I'm using S3. The write method is:
public void writeToFile(String fileToWrite, InputStream in)
    throws IOException
{
    Path outFile = new Path(fileToWrite);
    // makeQualified returns a new Path; the result has to be assigned.
    outFile = outFile.makeQualified(outFile.getFileSystem(conf));
    System.out.println(outFile);
    System.out.println(outFile.toUri());
    if (fs.exists(outFile))
    {
        throw new HadoopFileException("Specified file " + fileToWrite
            + " already exists.");
    }
    // Open outFile for writing.
    FSDataOutputStream out = fs.create(outFile);
    // Read from the input stream and write to the output stream until EOF.
    pipe(in, out);
    // Close the streams when done.
    System.out.println(System.currentTimeMillis() + " Closing file");
    out.close();
    System.out.println(System.currentTimeMillis() + " Closing input");
    in.close();
    System.out.println(System.currentTimeMillis() + " Done writing file");
}
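One thing I noticed after sending: pipe (quoted below) copies 1 KB at a time and prints a line on every read. Here is a rough, untested variant with a 64 KB buffer and the logging hoisted out of the loop; the buffer size is just a guess, not something I've benchmarked:

private void pipe(InputStream in, OutputStream out) throws IOException
{
    System.out.println(System.currentTimeMillis() + " Starting to Pipe Data");
    // Larger buffer (64 KB instead of 1 KB) to cut down on read() calls;
    // no println inside the loop, since console I/O per 1 KB chunk adds up.
    byte[] buf = new byte[64 * 1024];
    int read;
    while ((read = in.read(buf)) >= 0)
    {
        out.write(buf, 0, read);
    }
    out.flush();
    System.out.println(System.currentTimeMillis() + " Finished Piping Data");
}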
Maybe I am going at this wrong?
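If so, Hadoop's own IOUtils might be the simpler route. A minimal, untested sketch of the write using it, with the same fs field and HadoopFileException class as above; copyBytes does the buffered copy and closes both streams when the last argument is true:

import org.apache.hadoop.io.IOUtils;

public void writeToFile(String fileToWrite, InputStream in)
    throws IOException
{
    Path outFile = new Path(fileToWrite);
    if (fs.exists(outFile))
    {
        throw new HadoopFileException("Specified file " + fileToWrite
            + " already exists.");
    }
    FSDataOutputStream out = fs.create(outFile);
    // Copy with a 64 KB buffer; 'true' closes both streams on completion.
    IOUtils.copyBytes(in, out, 64 * 1024, true);
}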
Ananth T Sarathy
On Tue, Aug 18, 2009 at 1:18 PM, Qin Gao <[email protected]> wrote:
> Is the code called in a Mapper/Reducer? If so, DistributedCache is probably
> a better solution.
> --Q
>
>
> On Tue, Aug 18, 2009 at 12:00 PM, Ananth T. Sarathy <
> [email protected]> wrote:
>
> > I am trying to download binary files stored in Hadoop, but there is about
> > a two-minute wait on a 20 MB file when I try to execute in.read(buf).
> >
> > Is there a better way to be doing this?
> >
> > private void pipe(InputStream in, OutputStream out) throws IOException
> > {
> >     System.out.println(System.currentTimeMillis() + " Starting to Pipe Data");
> >     byte[] buf = new byte[1024];
> >     int read = 0;
> >     while ((read = in.read(buf)) >= 0)
> >     {
> >         out.write(buf, 0, read);
> >         System.out.println(System.currentTimeMillis() + " Piping Data");
> >     }
> >     out.flush();
> >     System.out.println(System.currentTimeMillis() + " Finished Piping Data");
> > }
> >
> > public void readFile(String fileToRead, OutputStream out)
> >     throws IOException
> > {
> >     System.out.println(System.currentTimeMillis() + " Start Read File");
> >     Path inFile = new Path(fileToRead);
> >     System.out.println(System.currentTimeMillis() + " Set Path");
> >     // Validate the input path before reading.
> >     if (!fs.exists(inFile))
> >     {
> >         throw new HadoopFileException("Specified file " + fileToRead
> >             + " not found.");
> >     }
> >     if (!fs.isFile(inFile))
> >     {
> >         throw new HadoopFileException("Specified path " + fileToRead
> >             + " is not a file.");
> >     }
> >     // Open inFile for reading.
> >     System.out.println(System.currentTimeMillis() + " Opening Data Stream");
> >     FSDataInputStream in = fs.open(inFile);
> >     System.out.println(System.currentTimeMillis() + " Opened Data Stream");
> >     // Read from the input stream and write to the output stream until EOF.
> >     pipe(in, out);
> >     // Close the streams when done.
> >     out.close();
> >     in.close();
> > }
> > Ananth T Sarathy
> >
>
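On Qin's DistributedCache suggestion above, in case someone reading this archive does call the read from inside a task: a minimal, untested sketch with the pre-0.20 mapred API. The URI path is a hypothetical example, and the point of the cache is that the framework fetches the file to each task node's local disk once instead of streaming it from HDFS/S3 on every read.

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// At job-submission time: register the file with the cache.
// (The URI below is a hypothetical example path.)
JobConf job = new JobConf();
DistributedCache.addCacheFile(new URI("/user/ananth/data/blob.bin"), job);

// Inside the mapper/reducer's configure(JobConf conf): the file has
// already been copied to the task node's local disk, so reads are local.
// (getLocalCacheFiles throws IOException.)
Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);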