Still waiting for a response... Thanks in advance.
Martin Mituzas wrote:
>
> hi, all
>
> I see there are two read methods in DFSInputStream:
>
>     int read(byte buf[], int off, int len)
>     int read(long position, byte[] buffer, int offset, int length)
>
> I used the following code to test the read performance. Before the test
> I generate some files in the directory DATA_DIR, then I run this function
> for some time and calculate the read throughput. The initFiles() function
> is borrowed from the patch https://issues.apache.org/jira/browse/HDFS-236.
>
> My question: I tried the two read methods above (see the commented lines)
> and found that the throughputs differ hugely. The results are attached
> below. Is there something wrong with my code? I can't believe there can
> be such a big difference...
>
> In https://issues.apache.org/jira/browse/HDFS-236 I also saw the
> following performance data posted by Raghu Angadi:
>
>     Description of read                        Time for each read in ms
>     1000 native reads over block files         09.5
>     Random Read 10x500                         10.8
>     Random Read without CRC                    10.5
>     Random Read with 'seek() and read()'       12.5
>     Read with sequential offsets               01.7
>     1000 native reads without closing files    07.5
>
> Based on this data, sequential read is about 6x faster than random read,
> which is reasonable, while my numbers seem unreasonable. Can anybody
> provide some comments?
>
> Here is my test result.
>
> With the first read method:
>
>     test type,read size,read ops,start time,end time,test time,real read time,throughput
>     sequence read,64628740096,15778506,[2009-07-20 14:47:01 704],[2009-07-20 14:53:41 704],400,400,154.09
>
> With the second read method:
>
>     test type,read size,read ops,start time,end time,test time,real read time,throughput
>     sequence read,2400047104,585949,[2009-07-20 14:59:50 328],[2009-07-20 15:06:30 328],400,400,5.72
>
> My cluster: 1 name node + 3 data nodes, replication = 3.
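For context, the two signatures behave differently: read(buf, off, len) is
the ordinary stateful InputStream read that consumes from and advances the
stream's current position, while read(position, buffer, offset, length) is
the positional "pread" from PositionedReadable, which reads at an explicit
offset and does not move the stream's position. A minimal standalone sketch
of the two calls (my own illustration, not Martin's benchmark; the path and
buffer size are made-up placeholders):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadPathDemo {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/tmp/testfile");   // hypothetical test file
            byte[] buf = new byte[64 * 1024];

            FSDataInputStream in = fs.open(file);

            // Stateful read: consumes from the stream's current position
            // and advances it.
            int n = in.read(buf, 0, buf.length);

            // Positional read (pread): reads at an explicit offset and
            // leaves the stream's own position untouched.
            int m = in.read(0L, buf, 0, buf.length);

            in.close();
            System.out.println("stateful: " + n + " bytes, pread: " + m + " bytes");
        }
    }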
> And my code:
>
>     private void sequenceRead(long time) throws IOException {
>         byte[] data = new byte[bufferSize];
>         Random rand = new Random();
>         initFiles(DATA_DIR);
>         long period = time * 1000;
>         FSDataInputStream in = null;
>         long totalSize = 0;
>         long readCount = 0;
>         long offset = 0;
>         int index = (rand.nextInt() & Integer.MAX_VALUE) % fileList.size();
>         if (barrier()) {
>             start = System.currentTimeMillis();
>             while (System.currentTimeMillis() - start < period) {
>                 if (in == null) {
>                     FileInfo file = (FileInfo) fileList.get(index);
>                     in = file.fileStream;
>                     if (in == null) {
>                         in = fs.open(file.filePath);
>                         file.fileStream = in;
>                     }
>                     // was "index = (index ++) % fileList.size();", which
>                     // never advances index; fixed to move to the next file
>                     index = (index + 1) % fileList.size();
>                 }
>                 long actualSize = in.read(offset, data, 0, bufferSize);
>                 //long actualSize = in.read(data, 0, bufferSize);
>                 readCount++;
>
>                 if (actualSize > 0) {
>                     totalSize += actualSize;
>                     offset += actualSize;
>                 }
>                 if (actualSize < bufferSize) {
>                     //in.seek(0);
>                     in = null;
>                     offset = 0;
>                 }
>             }
>             end = System.currentTimeMillis();
>
>             for (FileInfo finfo : fileList) {
>                 if (finfo.fileStream != null)
>                     IOUtils.closeStream(finfo.fileStream);
>             }
>             System.out.println("test type,read size,read ops,start time,end time,test time,real read time,throughput");
>             String s = String.format("sequence read,%d,%d,[%s],[%s],%d,%d,%.2f",
>                     totalSize,
>                     readCount,
>                     sdf.format(new Date(start)),
>                     sdf.format(new Date(end)),
>                     time,
>                     (end - start) / 1000,
>                     (double) (totalSize * 1000) / (double) ((end - start) * 1024 * 1024));
>             System.out.println(s);
>         }
>     }
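To make the comparison reproducible without the FileInfo/fileList
scaffolding, here is a self-contained sketch of the same experiment over a
single file. This is my own sketch under stated assumptions: the path is a
placeholder, there is no warm-up pass, and both passes hit the same file, so
OS-level caching will favor whichever pass runs second.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PreadVsRead {
        private static final int BUF_SIZE = 64 * 1024;

        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/tmp/testfile");   // hypothetical test file
            byte[] buf = new byte[BUF_SIZE];

            // Pass 1: stateful reads until EOF.
            long t0 = System.currentTimeMillis();
            FSDataInputStream in = fs.open(file);
            long statefulBytes = 0;
            int n;
            while ((n = in.read(buf, 0, BUF_SIZE)) > 0) {
                statefulBytes += n;
            }
            in.close();
            long statefulMs = System.currentTimeMillis() - t0;

            // Pass 2: positional reads (pread) at sequential offsets until EOF.
            t0 = System.currentTimeMillis();
            in = fs.open(file);
            long offset = 0;
            while ((n = in.read(offset, buf, 0, BUF_SIZE)) > 0) {
                offset += n;
            }
            in.close();
            long preadMs = System.currentTimeMillis() - t0;

            System.out.printf("stateful: %d bytes in %d ms; pread: %d bytes in %d ms%n",
                    statefulBytes, statefulMs, offset, preadMs);
        }
    }

Running each pass over a fresh set of files (as Martin's initFiles() setup
does) would avoid the caching bias between the two passes.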
