Hi Raghu,
Good catch, thanx. totalBytesRead is not used for any decision logic, so the
off-by-one should be harmless.
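For what it's worth, if totalBytesRead ever does get used for anything, I would
guard the accumulation so the -1 returned at EOF is never added. A minimal
sketch, reusing the same variable names from my read loop below:

    int bytesReadThisRead;
    while ((bytesReadThisRead = is.read(data)) != -1) {
        // only accumulate bytes from successful reads; EOF (-1) exits the loop
        totalBytesRead += bytesReadThisRead;
    }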
I ran the client from another m/c and the read was about 4 times faster.
I have the tcpdump from the original client m/c.
This is probably asking too much, but is there anything in particular I should
be looking for in the tcpdump? It is about 16 megs in size.
Thanx,
Taj
Raghu Angadi wrote:
>
>
> That's too long; buffer size does not explain it. The only small problem I
> see in your code:
>
> > totalBytesRead += bytesReadThisRead;
> > fileNotReadFully = (bytesReadThisRead != -1);
>
> totalBytesRead ends up off by 1, since the final read() returns -1 and
> that gets added in. Not sure where totalBytesRead is used.
>
> If you can, try to check a tcpdump on your client machine (for datanode
> port 50010).
>
> Raghu.
>
> j2eeiscool wrote:
>> Hi Raghu,
>>
>> Many thanx for your reply:
>>
>> The write takes approximately 11367 millisecs.
>>
>> The read takes approximately 1610565 millisecs.
>>
>> The file size is 68573254 bytes and the HDFS block size is 64 megs.
>>
>>
>> Here is the Writer code:
>>
>> FileInputStream fis = null;
>> OutputStream os = null;
>> try {
>>     fis = new FileInputStream(new File(inputFile));
>>     os = dsmStore.insert(outputFile);
>>
>>
>>
>> dsmStore.insert does the following:
>> {
>>     DistributedFileSystem fileSystem = new DistributedFileSystem();
>>     fileSystem.initialize(uri, conf);
>>     Path path = new Path(sKey);
>>     // create the HDFS output stream for writing
>>     FSDataOutputStream dataOutputStream = fileSystem.create(path);
>>
>>     return dataOutputStream;
>> }
>>
>>
>> byte[] data = new byte[4096];
>> int bytesRead;
>> while ((bytesRead = fis.read(data)) != -1) {
>>     // write only the bytes actually read; a short read would otherwise
>>     // write stale bytes from the end of the buffer
>>     os.write(data, 0, bytesRead);
>> }
>> os.flush();
>> } catch (Exception e) {
>>     e.printStackTrace();
>> } finally {
>>     if (os != null) {
>>         try {
>>             os.close();
>>         } catch (IOException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         }
>>     }
>>
>>     if (fis != null) {
>>         try {
>>             fis.close();
>>         } catch (IOException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         }
>>     }
>> }
>> }
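>>
>> (I could probably also replace the whole copy loop with Hadoop's helper,
>> assuming the version we are on has org.apache.hadoop.io.IOUtils; just a
>> sketch, not something I have run:
>>
>> import org.apache.hadoop.io.IOUtils;
>> ...
>> // copy fis to os with a 4096-byte buffer and close both streams when done
>> IOUtils.copyBytes(fis, os, 4096, true);
>>
>> which would make the close() calls in the finally block unnecessary.)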
>>
>>
>> Here is the Reader code:
>>
>>
>> byte[] data = new byte[4096];
>> int totalBytesRead = 0;
>> boolean fileNotReadFully = true;
>> InputStream is = dsmStore.select(fileName);
>>
>>
>> dsmStore.select does the following:
>> {
>>     DistributedFileSystem fileSystem = new DistributedFileSystem();
>>     fileSystem.initialize(uri, conf);
>>     Path path = new Path(sKey);
>>     // open the HDFS input stream for reading
>>     FSDataInputStream dataInputStream = fileSystem.open(path);
>>
>>     return dataInputStream;
>> }
>>
>>
>>
>> while (fileNotReadFully) {
>>     int bytesReadThisRead = 0;
>>     try {
>>         bytesReadThisRead = is.read(data);
>>         totalBytesRead += bytesReadThisRead;
>>         fileNotReadFully = (bytesReadThisRead != -1);
>>     } catch (Exception e) {
>>         e.printStackTrace();
>>     }
>> }
>> if (is != null) {
>>     try {
>>         is.close();
>>     } catch (IOException e) {
>>         // TODO Auto-generated catch block
>>         e.printStackTrace();
>>     }
>> }
>>
>>
>> I could probably try different buffer sizes next.
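>>
>> For example, something like this in dsmStore.select (just a sketch; the
>> 64 KB numbers are a guess on my part, not something I have measured):
>>
>> // ask for a larger client-side read buffer instead of the default
>> conf.setInt("io.file.buffer.size", 65536);
>> FSDataInputStream dataInputStream = fileSystem.open(path, 65536);
>>
>> plus a correspondingly larger byte[] buffer in the read loop.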
>>
>> Thanx,
>> Taj
>>
>>
>> Raghu Angadi wrote:
>>>
>>> How slow is it? Maybe the code that does the read is relevant too.
>>>
>>> Raghu.
>>>
>>> j2eeiscool wrote:
>>>> Hi,
>>>>
>>>> I am new to Hadoop. We are evaluating HDFS for use as a reliable,
>>>> distributed file system.
>>>>
>>>> From the tests I have run so far (1 namenode + 1 datanode on different
>>>> RHEL 4 m/cs, with the client running on the namenode m/c):
>>>>
>>>> 1. The writes are very fast.
>>>>
>>>> 2. The reads are very slow (reading a 68 meg file). Here is the sample
>>>> code; any ideas what could be going wrong?
>>>>
>>>>
>>>> public InputStream select(String sKey) throws RecordNotFoundException,
>>>>         IOException {
>>>>     DistributedFileSystem fileSystem = new DistributedFileSystem();
>>>>     fileSystem.initialize(uri, conf);
>>>>     Path path = new Path(sKey);
>>>>     FSDataInputStream dataInputStream = fileSystem.open(path);
>>>>     return dataInputStream;
>>>> }
>>>>
>>>> Thanx,
>>>> Taj
>>>>
>>>>
>>>
>>>
>>
>
>
>