Raghu/Ted,
This turned out to be a sub-optimal network pipe between the client and
the datanode.
Now the average read time is around 35 secs (for 68 MB).
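That works out to roughly 1.9 MB/s (68573254 bytes / 35 secs), still
short of the ~10 MB/s you mentioned below, but a huge improvement.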
On to the next issue:
07/11/16 20:05:37 WARN fs.DFSClient: DFS Read: java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
07/11/16 20:05:37 INFO fs.DFSClient: Could not obtain block blk_1990972671947672118 from any node: java.io.IOException: No live nodes contain current block
07/11/16 20:05:40 INFO fs.DFSClient: Could not obtain block blk_1990972671947672118 from any node: java.io.IOException: No live nodes contain current block
This happens during the read.
I get this error from time to time, especially when I run the client in
multithreaded mode.
Could this be an instability on the datanode side?
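In the meantime I am experimenting with giving each reader thread its own
input stream and simply reopening/retrying when this exception shows up.
A rough sketch of what I mean (the class name, retry count, and buffer
size are just placeholders, and I am assuming a reopen refreshes the
block list):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // One stream per reader thread; reopen and retry when the
    // "Blocklist ... has changed" IOException shows up.
    class RetryingReader implements Runnable {
        private static final int MAX_RETRIES = 3;
        private final FileSystem fs;
        private final Path path;

        RetryingReader(FileSystem fs, Path path) {
            this.fs = fs;
            this.path = path;
        }

        public void run() {
            byte[] buf = new byte[64 * 1024];
            for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
                FSDataInputStream in = null;
                try {
                    in = fs.open(path);  // fresh stream, fresh block list
                    long totalBytesRead = 0;
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        totalBytesRead += n;  // count only real bytes, not EOF
                    }
                    System.out.println(path + ": " + totalBytesRead + " bytes");
                    return;  // success
                } catch (IOException e) {
                    System.err.println("read attempt " + attempt + " failed: " + e);
                } finally {
                    if (in != null) {
                        try { in.close(); } catch (IOException ignored) { }
                    }
                }
            }
        }
    }

Each thread gets its own instance (built from a shared
FileSystem.get(new Configuration())), so no two threads ever share a
DFSInputStream.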
Thanx much,
Taj
Raghu Angadi wrote:
>
> To simplify: the read rate should be faster than the write rate.
>
> Raghu.
>
> Raghu Angadi wrote:
>>
>> Normally, a Hadoop read saturates either disk bandwidth or network
>> bandwidth on moderate hardware. So if you have one modern IDE disk and
>> 100 Mbps Ethernet, you should expect around 10 MB/s for a simple read
>> from a client on a different machine.
>>
>> Raghu.
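(That squares with the arithmetic: 100 Mbps is about 12.5 MB/s raw, so
~10 MB/s after protocol overhead, which would put a 68 MB read at
roughly 7 seconds.)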
>>
>> j2eeiscool wrote:
>>> Hi Raghu,
>>>
>>> Just to give me something to compare with: how long should a read of
>>> this file (68 MB) take on a good set-up
>>>
>>> (client and data node on the same network, one hop)?
>>>
>>> Thanx for your help,
>>> Taj
>>>
>>>
>>>
>>> Raghu Angadi wrote:
>>>> Taj,
>>>>
>>>> Even 4 times faster (400 sec for 68 MB) is not very fast. First try to
>>>> scp a similar-sized file between the hosts involved. If this transfer
>>>> is slow, fix that issue first. Try to place the test file on the same
>>>> partition where the HDFS data is stored.
>>>>
>>>> With tcpdump, first make sure the amount of data transferred matches
>>>> the ~68 MB you expect, and check for any large gaps in the data
>>>> packets coming to the client. Also, while the client is reading, check
>>>> netstat on both the client and the datanode: note the send buffer on
>>>> the datanode and the recv buffer on the client. If the datanode's send
>>>> buffer is non-zero most of the time, you have some network issue; if
>>>> the recv buffer on the client is full, the client is reading slowly
>>>> for some reason... etc.
>>>>
>>>> hope this helps.
>>>>
>>>> Raghu.
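(For the record, I am watching the buffers with plain netstat, e.g.
"netstat -tn | grep 50010" on both ends: the Send-Q column on the
datanode and the Recv-Q column on the client.)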
>>>>
>>>> j2eeiscool wrote:
>>>>> Hi Raghu,
>>>>>
>>>>> Good catch, thanx. totalBytesRead is not used for any decision etc.
>>>>>
>>>>> I ran the client from another machine and the read was about 4 times
>>>>> faster. I have the tcpdump from the original client machine.
>>>>> This is probably asking too much, but is there anything in particular
>>>>> I should be looking for in the tcpdump?
>>>>>
>>>>> It (the tcpdump) is about 16 MB in size.
>>>>>
>>>>> Thanx,
>>>>> Taj
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Raghu Angadi wrote:
>>>>>> That's too long; buffer size does not explain it. The only small
>>>>>> problem I see in your code:
>>>>>>
>>>>>> > totalBytesRead += bytesReadThisRead;
>>>>>> > fileNotReadFully = (bytesReadThisRead != -1);
>>>>>>
>>>>>> totalBytesRead ends up off by 1: it also adds the -1 that read()
>>>>>> returns at end-of-file. Not sure where totalBytesRead is used.
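(The fix on my side is along these lines, checking for end-of-file
before accumulating:

    int bytesReadThisRead = in.read(buf);
    fileNotReadFully = (bytesReadThisRead != -1);
    if (bytesReadThisRead > 0) {
        totalBytesRead += bytesReadThisRead;  // never add the -1 EOF marker
    }
)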
>>>>>>
>>>>>> If you can, try to check tcpdump on your client machine (for
>>>>>> datanode port 50010)
>>>>>>
>>>>>> Raghu.
>>>>>>
>>>>>> j2eeiscool wrote:
>>>>>>> Hi Raghu,
>>>>>>>
>>>>>>> Many thanx for your reply:
>>>>>>>
>>>>>>> The write takes approximately 11367 ms.
>>>>>>>
>>>>>>> The read takes approximately 1610565 ms.
>>>>>>>
>>>>>>> The file size is 68573254 bytes and the HDFS block size is 64 MB.
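(That works out to roughly 6 MB/s for the write versus ~42 KB/s for the
read, i.e. the read was about 140x slower.)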
>>>>
>>>
>>
>
>
>