That's too long; buffer size alone does not explain it. The only small
problem I see in your code:
> totalBytesRead += bytesReadThisRead;
> fileNotReadFully = (bytesReadThisRead != -1);
totalBytesRead ends up off by one, since the final read() returns -1 and
that gets added before the loop exits. Not sure where totalBytesRead is used.
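For example, a minimal sketch of the same loop that only counts real reads:

byte[] data = new byte[4096];
long totalBytesRead = 0;
int bytesReadThisRead;
while ((bytesReadThisRead = is.read(data)) != -1) {
    totalBytesRead += bytesReadThisRead; // the final -1 never reaches the sum
}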
If you can, capture the traffic with tcpdump on your client machine (for
the datanode port, 50010).
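Something like this (assuming eth0 is your client's interface; adjust to
yours) should show the transfer:

tcpdump -i eth0 port 50010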
Raghu.
j2eeiscool wrote:
Hi Raghu,
Many thanx for your reply:
The write takes approximately 11367 ms.
The read takes approximately 1610565 ms.
The file size is 68573254 bytes and the HDFS block size is 64 MB.
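That works out to roughly 68573254 / 11367 ≈ 6 MB/s for the write, but only
68573254 / 1610565 ≈ 42 KB/s for the read, so the read is about 140 times
slower.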
Here is the Writer code:
FileInputStream fis = null;
OutputStream os = null;
try {
    fis = new FileInputStream(new File(inputFile));
    os = dsmStore.insert(outputFile);
dsmStore.insert does the following:
{
    DistributedFileSystem fileSystem = new DistributedFileSystem();
    fileSystem.initialize(uri, conf);
    Path path = new Path(sKey);
    // writing:
    FSDataOutputStream dataOutputStream = fileSystem.create(path);
    return dataOutputStream;
}
    byte[] data = new byte[4096];
    int bytesRead;
    while ((bytesRead = fis.read(data)) != -1) {
        // write only the bytes actually read, not the whole buffer
        os.write(data, 0, bytesRead);
    }
    os.flush();
} catch (Exception e) {
    e.printStackTrace();
}
finally {
    if (os != null) {
        try {
            os.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    if (fis != null) {
        try {
            fis.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
}
Here is the Reader code:
byte[] data = new byte[4096];
int totalBytesRead = 0;
boolean fileNotReadFully = true;
InputStream is = dsmStore.select(fileName);
dsmStore.select does the following:
{
    DistributedFileSystem fileSystem = new DistributedFileSystem();
    fileSystem.initialize(uri, conf);
    Path path = new Path(sKey);
    FSDataInputStream dataInputStream = fileSystem.open(path);
    return dataInputStream;
}
while (fileNotReadFully) {
    int bytesReadThisRead = 0;
    try {
        bytesReadThisRead = is.read(data);
        totalBytesRead += bytesReadThisRead;
        fileNotReadFully = (bytesReadThisRead != -1);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
if (is != null) {
    try {
        is.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Could probably try different buffer sizes etc.
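For example (just a sketch; 64 KB is an arbitrary choice), both the copy
buffer and the stream's io buffer could be made bigger:

byte[] data = new byte[65536]; // larger copy buffer
// FileSystem.open() also accepts an explicit buffer size:
FSDataInputStream dataInputStream = fileSystem.open(path, 65536);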
Thanx,
Taj
Raghu Angadi wrote:
How slow is it? The code that does the reads may be relevant too.
Raghu.
j2eeiscool wrote:
Hi,
I am new to Hadoop. We are evaluating HDFS for use as a reliable,
distributed file system.
From the tests I have run so far (1 namenode + 1 datanode, on different
RHEL 4 machines, with the client running on the namenode machine):
1. The writes are very fast.
2. The read is very slow (reading a 68 MB file). Here is the sample code;
any ideas what could be going wrong?
public InputStream select(String sKey) throws RecordNotFoundException,
        IOException {
    DistributedFileSystem fileSystem = new DistributedFileSystem();
    fileSystem.initialize(uri, conf);
    Path path = new Path(sKey);
    FSDataInputStream dataInputStream = fileSystem.open(path);
    return dataInputStream;
}
Thanx,
Taj