Raghu/Ted,

This turned out to be a sub-optimal network pipe between the client and the datanode.
Now the average read time is around 35 secs (for 68 megs). On to the next issue:

07/11/16 20:05:37 WARN fs.DFSClient: DFS Read: java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
07/11/16 20:05:37 INFO fs.DFSClient: Could not obtain block blk_1990972671947672118 from any node: java.io.IOException: No live nodes contain current block
07/11/16 20:05:40 INFO fs.DFSClient: Could not obtain block blk_1990972671947672118 from any node: java.io.IOException: No live nodes contain current block

This happens during the read. I get this error from time to time, especially when I run the client in multithreaded mode. Could this be an instability on the datanode side?
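In case it helps, each reader thread does roughly the following (a simplified, standalone sketch of the ReaderThread inner class from HadoopDSMStore; the buffer size, error handling, and class layout here are placeholders, and each thread opens its own input stream):

import java.io.DataInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReaderThread extends Thread {
    public void run() {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            // FSDataInputStream extends DataInputStream, which matches
            // the java.io.DataInputStream.read frame in the trace above.
            DataInputStream in = fs.open(new Path("/hadoopdata0.txt"));
            byte[] buffer = new byte[64 * 1024]; // buffer size is arbitrary
            long totalBytesRead = 0;
            int bytesReadThisRead;
            // read() returns -1 at end of file.
            while ((bytesReadThisRead = in.read(buffer)) != -1) {
                totalBytesRead += bytesReadThisRead;
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

No stream or FileSystem handle is shared across threads.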
Thanx much,
Taj

Raghu Angadi wrote:
> To simplify, read rate should be faster than write speed.
>
> Raghu.
>
> Raghu Angadi wrote:
>> Normally, a Hadoop read saturates either disk b/w or network b/w on
>> moderate hardware. So if you have one modern IDE disk and 100mbps
>> ethernet, you should expect around a 10MBps read rate for a simple
>> read from a client on a different machine.
>>
>> Raghu.
>>
>> j2eeiscool wrote:
>>> Hi Raghu,
>>>
>>> Just to give me something to compare with: how long should this file
>>> read (68 megs) take on a good set-up (client and data node on the
>>> same network, one hop)?
>>>
>>> Thanx for your help,
>>> Taj
>>>
>>> Raghu Angadi wrote:
>>>> Taj,
>>>>
>>>> Even 4 times faster (400 sec for 68MB) is not very fast. First try
>>>> to scp a similar sized file between the hosts involved. If this
>>>> transfer is slow, fix that issue first. Try to place the test file
>>>> on the same partition where the HDFS data is stored.
>>>>
>>>> With tcpdump, first make sure the amount of data transferred matches
>>>> the roughly 68MB you expect, and check for any large gaps in the
>>>> data packets coming to the client. Also, while the client is
>>>> reading, check netstat on both the client and the datanode: note the
>>>> send buffer on the datanode and the recv buffer on the client. If
>>>> the datanode's send buffer is non-zero most of the time, you have a
>>>> network issue; if the recv buffer on the client is full, the client
>>>> is reading slowly for some reason... etc.
>>>>
>>>> Hope this helps.
>>>>
>>>> Raghu.
>>>>
>>>> j2eeiscool wrote:
>>>>> Hi Raghu,
>>>>>
>>>>> Good catch, thanx. totalBytesRead is not used for any decision etc.
>>>>>
>>>>> I ran the client from another m/c and the read was about 4 times
>>>>> faster. I have the tcpdump from the original client m/c.
>>>>> This is probably asking too much, but is there anything in
>>>>> particular I should be looking for in the tcpdump?
>>>>>
>>>>> It (the tcpdump) is about 16 megs in size.
>>>>>
>>>>> Thanx,
>>>>> Taj
>>>>>
>>>>> Raghu Angadi wrote:
>>>>>> That's too long... buffer size does not explain it. The only small
>>>>>> problem I see in your code:
>>>>>>
>>>>>> > totalBytesRead += bytesReadThisRead;
>>>>>> > fileNotReadFully = (bytesReadThisRead != -1);
>>>>>>
>>>>>> totalBytesRead is off by 1 [see the corrected sketch at the end of
>>>>>> this post]. Not sure where totalBytesRead is used.
>>>>>>
>>>>>> If you can, try to check tcpdump on your client machine (for
>>>>>> datanode port 50010).
>>>>>>
>>>>>> Raghu.
>>>>>>
>>>>>> j2eeiscool wrote:
>>>>>>> Hi Raghu,
>>>>>>>
>>>>>>> Many thanx for your reply:
>>>>>>>
>>>>>>> The write takes approximately 11367 millisecs.
>>>>>>> The read takes approximately 1610565 millisecs.
>>>>>>>
>>>>>>> File size is 68573254 bytes and the hdfs block size is 64 megs.

--
View this message in context: http://www.nabble.com/HDFS-File-Read-tf4773580.html#a13800842
Sent from the Hadoop Users mailing list archive at Nabble.com.
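For reference, a corrected version of the loop Raghu quotes above could look like this (a sketch reusing the variable names from the original snippet; the enclosing method and what fileNotReadFully is later used for are assumptions):

// Check for EOF (-1) before accumulating, so the -1 sentinel is never
// added into the running total.
static long countBytes(java.io.DataInputStream in, byte[] buffer)
        throws java.io.IOException {
    long totalBytesRead = 0;
    boolean fileNotReadFully = true;
    while (fileNotReadFully) {
        int bytesReadThisRead = in.read(buffer);
        fileNotReadFully = (bytesReadThisRead != -1);
        if (fileNotReadFully) {
            totalBytesRead += bytesReadThisRead;
        }
    }
    return totalBytesRead;
}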