Normally, a Hadoop read saturates either disk bandwidth or network bandwidth on moderate hardware. So if you have one modern IDE disk and 100 Mbps Ethernet, you should expect a read rate of around 10 MB/s for a simple read from a client on a different machine.
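
As a rough worked example, 100 Mbps is 12.5 MB/s raw, so about 10 MB/s after overhead is plausible, and at that rate the 68573254-byte file from this thread should read in roughly 7 seconds. A minimal sketch of the arithmetic follows; the class and method names are hypothetical, and the 10 MB/s figure is only the estimate given above, not a measurement.

```java
// Hypothetical helper for estimating read time from link speed.
class ExpectedReadTimeSketch {
    // Converts a link speed in megabits per second to bytes per second.
    static double linkBytesPerSecond(double megabitsPerSecond) {
        return megabitsPerSecond * 1e6 / 8.0;
    }

    // Time in seconds to move fileBytes at the given sustained rate.
    static double secondsToRead(long fileBytes, double bytesPerSecond) {
        return fileBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long fileBytes = 68573254L;               // file size reported in the thread
        double raw = linkBytesPerSecond(100.0);   // raw rate of 100 Mbps Ethernet
        double practical = 10e6;                  // assumed ~10 MB/s effective rate
        System.out.println("raw link rate (bytes/s): " + raw);
        System.out.println("expected read time (s):  " + secondsToRead(fileBytes, practical));
    }
}
```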

Raghu.

j2eeiscool wrote:
Hi Raghu,

Just to give me something to compare with: how long should this file read
(68 megs) take on a good set-up (client and data node on the same network,
one hop)?

Thanx for your help,
Taj



Raghu Angadi wrote:
Taj,

Even 4 times faster (about 400 sec for 68 MB) is not very fast. First try to scp a similarly sized file between the hosts involved. If that transfer is slow, fix that issue first. Try to place the test file on the same partition where the HDFS data is stored.

With tcpdump, first make sure the amount of data transferred matches the roughly 68 MB that you expect, and check for any large gaps between data packets coming to the client. Also, while the client is reading, check netstat on both the client and the datanode, and note the send buffer on the datanode and the recv buffer on the client. If the datanode's send buffer is non-zero most of the time, then you have some network issue; if the recv buffer on the client is full, then the client is reading slowly for some reason, and so on.

Hope this helps.

Raghu.

j2eeiscool wrote:
Hi Raghu,

Good catch, thanx. totalBytesRead is not used for any decision.

I ran the client from another m/c and the read was about 4 times faster.
I have the tcpdump from the original client m/c.
This is probably asking too much, but is there anything in particular I should be
looking for in the tcpdump?

It (the tcpdump) is about 16 megs in size.

Thanx,
Taj






Raghu Angadi wrote:
That's too long; buffer size does not explain it. The only small problem I see in your code:

 > totalBytesRead += bytesReadThisRead;
 > fileNotReadFully = (bytesReadThisRead != -1);

totalBytesRead is off by 1, because the -1 returned by the final read is still added to it. Not sure where totalBytesRead is used.
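
One way to avoid the off-by-one is to test the return value for -1 before adding it to the total. The following is a minimal sketch, not the original client code: it works on any java.io.InputStream (the stream returned when opening an HDFS file is one), and the class name, method name, and the in-memory stand-in stream are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical read loop that counts bytes without including the -1 EOF marker.
class ReadLoopSketch {
    // Reads the stream to end-of-file and returns the number of bytes read.
    static long countBytes(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long totalBytesRead = 0;
        int bytesReadThisRead;
        // read() returns -1 at end of stream, so check before accumulating.
        while ((bytesReadThisRead = in.read(buffer)) != -1) {
            totalBytesRead += bytesReadThisRead;
        }
        return totalBytesRead;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the HDFS input stream (assumption for illustration).
        InputStream in = new ByteArrayInputStream(new byte[100000]);
        System.out.println("bytes read: " + countBytes(in, 4096));
    }
}
```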

If you can, check tcpdump on your client machine (for datanode port 50010).

Raghu.

j2eeiscool wrote:
Hi Raghu,

Many thanx for your reply:

The write takes approximately 11367 millisecs.

The read takes approximately 1610565 millisecs.

File size is 68573254 bytes and the HDFS block size is 64 megs.
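
For reference, these timings work out to roughly 6 MB/s for the write and a little over 40 KB/s for the read. A minimal sketch of that arithmetic is below; the class and method names are made up for illustration, and the inputs are the figures quoted above.

```java
// Hypothetical helper that turns a byte count and elapsed time into a rate.
class ThroughputSketch {
    // Returns the average transfer rate in bytes per second.
    static double bytesPerSecond(long bytes, long elapsedMillis) {
        return bytes / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        long fileBytes = 68573254L;   // file size from the message above
        long writeMillis = 11367L;    // reported write time
        long readMillis = 1610565L;   // reported read time
        System.out.println("write rate (bytes/s): " + bytesPerSecond(fileBytes, writeMillis));
        System.out.println("read rate (bytes/s):  " + bytesPerSecond(fileBytes, readMillis));
    }
}
```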


