To simplify: the read rate should be faster than the write rate.
Raghu.
Raghu Angadi wrote:
Normally, a Hadoop read saturates either disk bandwidth or network bandwidth on
moderate hardware. So if you have one modern IDE disk and 100 Mbps
Ethernet, you should expect around a 10 MB/s read rate for a simple read
from a client on a different machine.
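As a rough sanity check: 100 Mbps Ethernet tops out near 12.5 MB/s of raw wire
speed, so at about 10 MB/s a 68 MB file should take on the order of 7 seconds to read.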
Raghu.
j2eeiscool wrote:
Hi Raghu,
Just to give me something to compare with: how long should this file read
(68 MB) take on a good set-up
(client and datanode on the same network, one hop)?
Thanx for your help,
Taj
Raghu Angadi wrote:
Taj,
Even 4 times faster (about 400 seconds for 68 MB) is not very fast. First try to
scp a similar-sized file between the hosts involved. If this transfer
is slow, fix that issue first. Try to place the test file on the same
partition where the HDFS data is stored.
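If it helps, here is a minimal sketch of timing a plain local read of that test
file to get a raw disk-throughput baseline; the path and buffer size are just
placeholders, not taken from your setup:

  import java.io.FileInputStream;
  import java.io.IOException;

  // Reads the whole test file from local disk and reports how long it took.
  // Compare this raw number against the HDFS read of the same data.
  public class LocalReadTimer {
      public static void main(String[] args) throws IOException {
          byte[] buf = new byte[64 * 1024];   // 64 KB read buffer (illustrative)
          long total = 0;
          long start = System.currentTimeMillis();
          FileInputStream in = new FileInputStream("/path/on/hdfs/partition/testfile");
          try {
              int n;
              while ((n = in.read(buf)) != -1) {
                  total += n;
              }
          } finally {
              in.close();
          }
          long millis = System.currentTimeMillis() - start;
          System.out.println(total + " bytes in " + millis + " ms");
      }
  }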
With tcpdump, first make sure the amount of data transferred matches the
roughly 68 MB that you expect, and check for any large gaps between data
packets coming to the client. Also, while the client is reading, check
netstat on both the client and the datanode: note the send buffer on the
datanode and the receive buffer on the client. If the datanode's send buffer is
non-zero most of the time, then you have some network issue; if the receive
buffer on the client is full, then the client is reading slowly for some
reason... etc.
Hope this helps.
Raghu.
j2eeiscool wrote:
Hi Raghu,
Good catch, thanx. totalBytesRead is not used for any decision etc.
I ran the client from another machine and the read was about 4 times faster.
I have the tcpdump from the original client machine.
This is probably asking too much, but is there anything in particular I should be
looking for in the tcpdump?
It (the tcpdump) is about 16 MB in size.
Thanx,
Taj
Raghu Angadi wrote:
That's too long; buffer size does not explain it. The only small
problem I see in your code:
> totalBytesRead += bytesReadThisRead;
> fileNotReadFully = (bytesReadThisRead != -1);
totalBytesRead is off by 1, since the final read returns -1 and that gets added
to the total. Not sure where totalBytesRead is used.
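For illustration, a minimal sketch of the loop with that fixed; the stream,
buffer, and method names are assumptions, not your original code:

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataInputStream;

  // The -1 returned at end-of-stream ends the loop and is never added
  // to totalBytesRead, so the running total matches the bytes actually read.
  static long readAndCount(FSDataInputStream in, byte[] buf) throws IOException {
      long totalBytesRead = 0;
      int bytesReadThisRead;
      while ((bytesReadThisRead = in.read(buf)) != -1) {
          totalBytesRead += bytesReadThisRead;
      }
      return totalBytesRead;
  }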
If you can, try to check tcpdump on your client machine (for
datanode port 50010).
Raghu.
j2eeiscool wrote:
Hi Raghu,
Many thanx for your reply:
The write takes approximately 11367 ms.
The read takes approximately 1610565 ms.
The file size is 68573254 bytes and the HDFS block size is 64 MB.
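(That works out to roughly 6 MB/s for the write but only about 40 KB/s for the
read, so the read is running around 140 times slower than the write.)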