Raghu/Ted,

This turned out to be a sub-optimal network pipe between the client and the datanode.
Now the average read time is around 35 secs (for 68 megs). On to the next issue:

07/11/16 20:05:37 WARN fs.DFSClient: DFS Read: java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
07/11/16 20:05:37 INFO fs.DFSClient: Could not obtain block blk_1990972671947672118 from any node: java.io.IOException: No live nodes contain current block
07/11/16 20:05:40 INFO fs.DFSClient: Could not obtain block blk_1990972671947672118 from any node: java.io.IOException: No live nodes contain current block

This happens during the read. I get this error from time to time, especially when I run the client in multithreaded mode. Could this be an instability on the datanode side?
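In case it helps, each reader thread does roughly the following (a simplified, standalone sketch of the ReaderThread inner class from HadoopDSMStore; the buffer size, error handling, and class layout here are placeholders, and each thread opens its own input stream):

import java.io.DataInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReaderThread extends Thread {
    public void run() {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            // FSDataInputStream extends DataInputStream, which matches
            // the java.io.DataInputStream.read frame in the trace above.
            DataInputStream in = fs.open(new Path("/hadoopdata0.txt"));
            byte[] buffer = new byte[64 * 1024]; // buffer size is arbitrary
            long totalBytesRead = 0;
            int bytesReadThisRead;
            // read() returns -1 at end of file.
            while ((bytesReadThisRead = in.read(buffer)) != -1) {
                totalBytesRead += bytesReadThisRead;
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

No stream or FileSystem handle is shared across threads.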
Thanx much,
Taj

Raghu Angadi wrote:
> To simplify, read rate should be faster than write speed.
>
> Raghu.
>
> Raghu Angadi wrote:
>> Normally, a Hadoop read saturates either disk b/w or network b/w on
>> moderate hardware. So if you have one modern IDE disk and 100mbps
>> ethernet, you should expect around a 10MBps read rate for a simple
>> read from a client on a different machine.
>>
>> Raghu.
>>
>> j2eeiscool wrote:
>>> Hi Raghu,
>>>
>>> Just to give me something to compare with: how long should this file
>>> read (68 megs) take on a good set-up (client and data node on the
>>> same network, one hop)?
>>>
>>> Thanx for your help,
>>> Taj
>>>
>>> Raghu Angadi wrote:
>>>> Taj,
>>>>
>>>> Even 4 times faster (400 sec for 68MB) is not very fast. First try
>>>> to scp a similar sized file between the hosts involved. If this
>>>> transfer is slow, fix that issue first. Try to place the test file
>>>> on the same partition where the HDFS data is stored.
>>>>
>>>> With tcpdump, first make sure the amount of data transferred matches
>>>> the roughly 68MB you expect, and check for any large gaps in the
>>>> data packets coming to the client. Also, while the client is
>>>> reading, check netstat on both the client and the datanode: note the
>>>> send buffer on the datanode and the recv buffer on the client. If
>>>> the datanode's send buffer is non-zero most of the time, you have a
>>>> network issue; if the recv buffer on the client is full, the client
>>>> is reading slowly for some reason... etc.
>>>>
>>>> Hope this helps.
>>>>
>>>> Raghu.
>>>>
>>>> j2eeiscool wrote:
>>>>> Hi Raghu,
>>>>>
>>>>> Good catch, thanx. totalBytesRead is not used for any decision etc.
>>>>>
>>>>> I ran the client from another m/c and the read was about 4 times
>>>>> faster. I have the tcpdump from the original client m/c.
>>>>> This is probably asking too much, but is there anything in
>>>>> particular I should be looking for in the tcpdump?
>>>>>
>>>>> It (the tcpdump) is about 16 megs in size.
>>>>>
>>>>> Thanx,
>>>>> Taj
>>>>>
>>>>> Raghu Angadi wrote:
>>>>>> That's too long... buffer size does not explain it. The only small
>>>>>> problem I see in your code:
>>>>>>
>>>>>> > totalBytesRead += bytesReadThisRead;
>>>>>> > fileNotReadFully = (bytesReadThisRead != -1);
>>>>>>
>>>>>> totalBytesRead is off by 1 [see the corrected sketch at the end of
>>>>>> this post]. Not sure where totalBytesRead is used.
>>>>>>
>>>>>> If you can, try to check tcpdump on your client machine (for
>>>>>> datanode port 50010).
>>>>>>
>>>>>> Raghu.
>>>>>>
>>>>>> j2eeiscool wrote:
>>>>>>> Hi Raghu,
>>>>>>>
>>>>>>> Many thanx for your reply:
>>>>>>>
>>>>>>> The write takes approximately 11367 millisecs.
>>>>>>> The read takes approximately 1610565 millisecs.
>>>>>>>
>>>>>>> File size is 68573254 bytes and the hdfs block size is 64 megs.

--
View this message in context: http://www.nabble.com/HDFS-File-Read-tf4773580.html#a13800842
Sent from the Hadoop Users mailing list archive at Nabble.com.
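For reference, a corrected version of the loop Raghu quotes above could look like this (a sketch reusing the variable names from the original snippet; the enclosing method and what fileNotReadFully is later used for are assumptions):

// Check for EOF (-1) before accumulating, so the -1 sentinel is never
// added into the running total.
static long countBytes(java.io.DataInputStream in, byte[] buffer)
        throws java.io.IOException {
    long totalBytesRead = 0;
    boolean fileNotReadFully = true;
    while (fileNotReadFully) {
        int bytesReadThisRead = in.read(buffer);
        fileNotReadFully = (bytesReadThisRead != -1);
        if (fileNotReadFully) {
            totalBytesRead += bytesReadThisRead;
        }
    }
    return totalBytesRead;
}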