Also, I just want to be clear... the delay seems to be at the initial
(read = in.read(buf)); after the first pass through the loop it flies...
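One way to pin down where that initial wait goes is to time fs.open() and
the first read() separately from the rest of the loop. A minimal sketch,
assuming a default-configured FileSystem; the FirstReadTimer class name and
the command-line path argument are just placeholders:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FirstReadTimer
    {
        public static void main(String[] args) throws IOException
        {
            // The file to read is passed on the command line.
            FileSystem fs = FileSystem.get(new Configuration());
            Path inFile = new Path(args[0]);

            long t0 = System.currentTimeMillis();
            FSDataInputStream in = fs.open(inFile);
            long t1 = System.currentTimeMillis();

            byte[] buf = new byte[1024];
            // First read; with HDFS this is typically where the
            // connection to a datanode is actually established.
            int read = in.read(buf);
            long t2 = System.currentTimeMillis();

            // Drain the rest of the stream, counting bytes.
            long total = (read < 0) ? 0 : read;
            while ((read = in.read(buf)) >= 0)
            {
                total += read;
            }
            long t3 = System.currentTimeMillis();
            in.close();

            System.out.println("open: " + (t1 - t0) + " ms, first read: "
                    + (t2 - t1) + " ms, rest (" + total + " bytes): "
                    + (t3 - t2) + " ms");
        }
    }

If the first number dominates, the cost is in opening; if the second does,
it is in setting up the actual data transfer.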
Ananth T Sarathy

On Wed, Aug 19, 2009 at 1:58 PM, Raghu Angadi <[email protected]> wrote:

> Edward Capriolo wrote:
>>
>> On Wed, Aug 19, 2009 at 11:11 AM, Edward Capriolo <[email protected]> wrote:
>>>
>>>> It would be as fast as the underlying filesystem goes.
>>>
>>> I would not agree with that statement. There is overhead.
>
> You might be misinterpreting my comment. There is of course some overhead
> (at the least, the procedure calls)... depending on your underlying
> filesystem, there could be extra buffer copies and CRC overhead. But none
> of that explains a transfer as slow as 1 MBps (if my interpretation of the
> results is correct).
>
> Raghu.
>
>>> In some testing I did, writing a small file can take 30-300 ms. So if
>>> you have 9000 small files (like I did) and you are single-threaded,
>>> this takes a long time.
>>>
>>> If you orchestrate your task to use FSDataInput and FSDataOutput in the
>>> map or reduce phase, then each mapper or reducer is writing a file at a
>>> time. Now that is fast.
>>>
>>> Ananth, are you doing your r/w inside a map/reduce job, or are you just
>>> using FS* in a top-down program?
>>>
>>> On Wed, Aug 19, 2009 at 1:26 AM, Raghu Angadi <[email protected]> wrote:
>>>
>>>> Ananth T. Sarathy wrote:
>>>>>
>>>>> I am trying to download binary files stored in Hadoop, but there is
>>>>> like a 2 minute wait on a 20mb file when I try to execute the
>>>>> in.read(buf).
>>>>
>>>> What does this mean: 2 minutes to pipe 20mb, or one of your in.read()
>>>> calls took 2 minutes? Your code actually measures time for read and
>>>> write. There is nothing in FSInputStream to cause this slowdown. Do
>>>> you think anyone would use Hadoop otherwise? It would be as fast as
>>>> the underlying filesystem goes.
>>>>
>>>> Raghu.
>>>>
>>>>> Is there a better way to be doing this?
>>>>>
>>>>>     private void pipe(InputStream in, OutputStream out) throws IOException
>>>>>     {
>>>>>         System.out.println(System.currentTimeMillis() + " Starting to Pipe Data");
>>>>>         byte[] buf = new byte[1024];
>>>>>         int read = 0;
>>>>>         while ((read = in.read(buf)) >= 0)
>>>>>         {
>>>>>             out.write(buf, 0, read);
>>>>>             System.out.println(System.currentTimeMillis() + " Piping Data");
>>>>>         }
>>>>>         out.flush();
>>>>>         System.out.println(System.currentTimeMillis() + " Finished Piping Data");
>>>>>     }
>>>>>
>>>>>     public void readFile(String fileToRead, OutputStream out) throws IOException
>>>>>     {
>>>>>         System.out.println(System.currentTimeMillis() + " Start Read File");
>>>>>         Path inFile = new Path(fileToRead);
>>>>>         System.out.println(System.currentTimeMillis() + " Set Path");
>>>>>
>>>>>         // Validate the input path before reading.
>>>>>         if (!fs.exists(inFile))
>>>>>         {
>>>>>             throw new HadoopFileException("Specified file " + fileToRead
>>>>>                     + " not found.");
>>>>>         }
>>>>>         if (!fs.isFile(inFile))
>>>>>         {
>>>>>             throw new HadoopFileException("Specified path " + fileToRead
>>>>>                     + " is not a file.");
>>>>>         }
>>>>>
>>>>>         // Open inFile for reading.
>>>>>         System.out.println(System.currentTimeMillis() + " Opening Data Stream");
>>>>>         FSDataInputStream in = fs.open(inFile);
>>>>>         System.out.println(System.currentTimeMillis() + " Opened Data Stream");
>>>>>
>>>>>         // Read from input stream and write to output stream until EOF.
>>>>>         pipe(in, out);
>>>>>
>>>>>         // Close the streams when done.
>>>>>         out.close();
>>>>>         in.close();
>>>>>     }
>>>>>
>>>>> Ananth T Sarathy
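One aside on the pipe() above: a 1 KB buffer plus a println on every
iteration means roughly 20,000 round trips for a 20 MB file, so the logging
itself can swamp the copy. A sketch of the same method using Hadoop's own
copy helper, org.apache.hadoop.io.IOUtils.copyBytes; the 64 KB buffer size
is just an example, and this assumes it replaces pipe() inside the same
class as readFile():

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.io.IOUtils;

    // Same behavior as pipe() above, minus the per-iteration println:
    // copy with a 64 KB buffer and flush once at the end.
    private void pipe(InputStream in, OutputStream out) throws IOException
    {
        System.out.println(System.currentTimeMillis() + " Starting to Pipe Data");
        // 'false' leaves both streams open so readFile() can close them.
        IOUtils.copyBytes(in, out, 64 * 1024, false);
        out.flush();
        System.out.println(System.currentTimeMillis() + " Finished Piping Data");
    }

This changes throughput, not the initial stall; the first-read timing
sketch at the top of the thread is the way to chase that.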
