Run hadoop fsck /. It sounds like you have some blocks that have been lost somehow; that is pretty easy to do as you reconfigure a new cluster.
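If it helps, the fsck report will list which files have missing or corrupt blocks; something like the following is the usual invocation (the extra flags have been around for a while, but check your release's fsck usage to be sure of the exact options):

  bin/hadoop fsck / -files -blocks -locations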
On 11/16/07 12:21 PM, "j2eeiscool" <[EMAIL PROTECTED]> wrote:

> Raghu/Ted,
>
> This turned out to be a sub-optimal network pipe between the client and
> the data node.
>
> Now the average read time is around 35 secs (for 68 megs).
>
> On to the next issue:
>
> 07/11/16 20:05:37 WARN fs.DFSClient: DFS Read: java.io.IOException:
> Blocklist for /hadoopdata0.txt has changed!
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>   at java.io.DataInputStream.read(DataInputStream.java:80)
>   at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
>
> java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>   at java.io.DataInputStream.read(DataInputStream.java:80)
>   at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
> 07/11/16 20:05:37 INFO fs.DFSClient: Could not obtain block
> blk_1990972671947672118 from any node: java.io.IOException: No live nodes
> contain current block
> 07/11/16 20:05:40 INFO fs.DFSClient: Could not obtain block
> blk_1990972671947672118 from any node: java.io.IOException: No live nodes
> contain current block
>
> This happens during the read.
>
> I get this error from time to time, especially when I run the client in
> multithreaded mode.
>
> Could this be an instability on the dataNode side?
>
> Thanx much,
> Taj
>
>
> Raghu Angadi wrote:
>>
>> To simplify, the read rate should be faster than the write rate.
>>
>> Raghu.
>>
>> Raghu Angadi wrote:
>>>
>>> Normally, a Hadoop read saturates either disk b/w or network b/w on
>>> moderate hardware. So if you have one modern IDE disk and 100mbps
>>> ethernet, you should expect around a 10MBps read rate for a simple read
>>> from a client on a different machine.
>>>
>>> Raghu.
>>>
>>> j2eeiscool wrote:
>>>> Hi Raghu,
>>>>
>>>> Just to give me something to compare with: how long should this file
>>>> read (68 megs) take on a good set-up
>>>> (client and data node on the same network, one hop)?
>>>>
>>>> Thanx for your help,
>>>> Taj
>>>>
>>>> Raghu Angadi wrote:
>>>>> Taj,
>>>>>
>>>>> Even 4 times faster (400 sec for 68MB) is not very fast. First try to
>>>>> scp a similar sized file between the hosts involved. If this transfer
>>>>> is slow, fix that issue first. Try to place the test file on the same
>>>>> partition where the HDFS data is stored.
>>>>>
>>>>> With tcpdump, first make sure the amount of data transferred matches
>>>>> the roughly 68MB that you expect, and check for any large gaps in the
>>>>> data packets coming to the client. Also, while the client is reading,
>>>>> check netstat on both the client and the datanode: note the send buffer
>>>>> on the datanode and the recv buffer on the client. If the datanode's
>>>>> send buffer is non-zero most of the time, then you have some network
>>>>> issue; if the recv buffer on the client is full, then the client is
>>>>> reading slowly for some reason... etc.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Raghu.
>>>>>
>>>>> j2eeiscool wrote:
>>>>>> Hi Raghu,
>>>>>>
>>>>>> Good catch, thanx. totalBytesRead is not used for any decision etc.
>>>>>>
>>>>>> I ran the client from another machine and the read was about 4 times
>>>>>> faster. I have the tcpdump from the original client machine.
>>>>>> This is probably asking too much, but is there anything in particular
>>>>>> I should be looking for in the tcpdump?
>>>>>>
>>>>>> It (the tcpdump) is about 16 megs in size.
>>>>>>
>>>>>> Thanx,
>>>>>> Taj
>>>>>>
>>>>>> Raghu Angadi wrote:
>>>>>>> That's too long.. buffer size does not explain it. The only small
>>>>>>> problem I see in your code:
>>>>>>>
>>>>>>>> totalBytesRead += bytesReadThisRead;
>>>>>>>> fileNotReadFully = (bytesReadThisRead != -1);
>>>>>>>
>>>>>>> totalBytesRead is off by 1. Not sure where totalBytesRead is used.
>>>>>>>
>>>>>>> If you can, try to check a tcpdump on your client machine (for
>>>>>>> datanode port 50010).
>>>>>>>
>>>>>>> Raghu.
>>>>>>>
>>>>>>> j2eeiscool wrote:
>>>>>>>> Hi Raghu,
>>>>>>>>
>>>>>>>> Many thanx for your reply:
>>>>>>>>
>>>>>>>> The write takes approximately 11367 millisecs.
>>>>>>>>
>>>>>>>> The read takes approximately 1610565 millisecs.
>>>>>>>>
>>>>>>>> The file size is 68573254 bytes and the hdfs block size is 64 megs.
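On the off-by-one Raghu points out above: read() returns -1 at end of stream, so adding the return value unconditionally shrinks the total by one on the final iteration. A minimal sketch of a corrected loop, reusing the variable names from the quoted snippet (the stream setup, buffer size, and method shape are assumptions, not the actual HadoopDSMStore code):

  import java.io.DataInputStream;
  import java.io.IOException;

  // Sketch only: drains an already-opened stream and counts the bytes.
  static long readAll(DataInputStream in) throws IOException {
      byte[] buffer = new byte[64 * 1024];
      long totalBytesRead = 0;
      boolean fileNotReadFully = true;
      while (fileNotReadFully) {
          int bytesReadThisRead = in.read(buffer);
          if (bytesReadThisRead > 0) {
              totalBytesRead += bytesReadThisRead;       // count only bytes actually read
          }
          fileNotReadFully = (bytesReadThisRead != -1);  // -1 signals EOF, not a byte count
      }
      return totalBytesRead;
  }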
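And on the netstat/tcpdump suggestion in the quoted thread, something along these lines is enough to watch the socket queues while a read is in flight; 50010 is the datanode port mentioned above, while the interface name and capture file are just placeholders:

  # on the datanode: a Send-Q that stays non-zero points at the network
  netstat -tn | grep 50010

  # on the client: a full Recv-Q means the client itself is reading slowly
  netstat -tn | grep 50010

  # capture the transfer on the client for offline inspection
  tcpdump -i eth0 -s 0 -w dfs-read.pcap port 50010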