Hi Raghu,

Many thanks for your reply:

The write takes approximately 11,367 ms, i.e. roughly 6 MB/s.

The read takes approximately 1,610,565 ms (about 27 minutes), i.e. roughly 42 KB/s.

The file size is 68,573,254 bytes (about 65 MB) and the HDFS block size is 64 MB.


Here is the writer code:

        FileInputStream fis = null;
        OutputStream os = null;
        try {
            fis = new FileInputStream(new File(inputFile));
            os = dsmStore.insert(outputFile);


dsmStore.insert does the following:

        DistributedFileSystem fileSystem = new DistributedFileSystem();
        fileSystem.initialize(uri, conf);
        Path path = new Path(sKey);
        // writing:
        FSDataOutputStream dataOutputStream = fileSystem.create(path);
        return dataOutputStream;
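
As an aside, here is a minimal sketch of the same method going through the
FileSystem factory instead, assuming FileSystem.get(URI, Configuration) is
available in the release here; as far as I know it hands back a cached
instance per URI, so repeated insert/select calls avoid constructing and
initializing a fresh DistributedFileSystem every time:

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        // Sketch: obtain the FileSystem via the factory rather than
        // constructing DistributedFileSystem directly.
        FileSystem fileSystem = FileSystem.get(uri, conf);
        Path path = new Path(sKey);
        FSDataOutputStream dataOutputStream = fileSystem.create(path);
        return dataOutputStream;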


The copy loop in the writer then continues:

            byte[] data = new byte[4096];
            int bytesRead;
            while ((bytesRead = fis.read(data)) != -1) {
                os.write(data, 0, bytesRead); // write only the bytes actually read
                os.flush();                   // flushes after every 4 KB chunk
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        finally {
            if (os != null) {
                try {
                    os.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (fis != null) {
                try {
                    fis.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
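
For what it's worth, the whole copy could be collapsed into one call,
assuming the Hadoop release here ships org.apache.hadoop.io.IOUtils (a
sketch, not what I actually ran); it also avoids flushing after every
4 KB chunk:

        import java.io.File;
        import java.io.FileInputStream;
        import java.io.OutputStream;
        import org.apache.hadoop.io.IOUtils;

        // Sketch: copy the local file to HDFS with a 64 KB buffer;
        // the final 'true' asks copyBytes to close both streams.
        FileInputStream fis = new FileInputStream(new File(inputFile));
        OutputStream os = dsmStore.insert(outputFile);
        IOUtils.copyBytes(fis, os, 64 * 1024, true);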


Here is the reader code:

        byte[] data = new byte[4096];
        int totalBytesRead = 0;
        boolean fileNotReadFully = true;
        InputStream is = dsmStore.select(fileName);


dsmStore.select does the following:

        DistributedFileSystem fileSystem = new DistributedFileSystem();
        fileSystem.initialize(uri, conf);
        Path path = new Path(sKey);
        FSDataInputStream dataInputStream = fileSystem.open(path);
        return dataInputStream;


                
        while (fileNotReadFully) {
            try {
                int bytesReadThisRead = is.read(data);
                fileNotReadFully = (bytesReadThisRead != -1);
                if (bytesReadThisRead > 0) {
                    totalBytesRead += bytesReadThisRead; // don't add the -1 EOF marker
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }


I could probably try different buffer sizes, etc.; a quick sketch of that experiment is below.
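
Something along these lines (a sketch; the 64 KB figures and the
BufferedInputStream wrapper are guesses to try, not anything I have
measured yet):

        import java.io.BufferedInputStream;
        import java.io.InputStream;

        // Sketch: wrap the HDFS stream in a BufferedInputStream and scan the
        // file with a larger application buffer, timing the whole read.
        InputStream is = new BufferedInputStream(dsmStore.select(fileName), 64 * 1024);
        byte[] data = new byte[64 * 1024];
        long start = System.currentTimeMillis();
        long totalBytesRead = 0;
        int bytesRead;
        while ((bytesRead = is.read(data)) != -1) {
            totalBytesRead += bytesRead;
        }
        is.close();
        System.out.println(totalBytesRead + " bytes in "
                + (System.currentTimeMillis() - start) + " ms");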

Thanx,
Taj


Raghu Angadi wrote:
> 
> 
> How slow is it? Maybe the code that reads is relevant too.
> 
> Raghu.
> 
> j2eeiscool wrote:
>> Hi,
>> 
>> I am new to Hadoop. We are evaluating HDFS for use as a reliable,
>> distributed file system.
>> 
>> From the tests (1 name + 1 data, both on different RHEL 4 m/cs, client
>> running on the name node m/c) I have run so far:
>> 
>> 1. The writes are very fast.
>> 
>> 2. The read is very slow (reading a 68 megs file). Here is the sample
>> code. Any ideas what could be going wrong:
>> 
>> 
>>      public InputStream select(String sKey) throws RecordNotFoundException,
>> IOException {
>>              DistributedFileSystem fileSystem = new DistributedFileSystem();
>>              fileSystem.initialize(uri, conf);
>>              Path path = new Path(sKey);
>>         FSDataInputStream dataInputStream = fileSystem.open(path);
>>         return dataInputStream;
>> 
>>      }
>> 
>> Thanx,
>> Taj
>> 
>> 
> 
> 
> 

