Run hadoop fsck /. It sounds like you have some blocks that have been lost somehow; that is pretty easy to do as you reconfigure a new cluster.
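If it helps, the fsck report will list which files have missing or corrupt blocks; something like the following is the usual invocation (the extra flags have been around for a while, but check your release's fsck usage to be sure of the exact options):

  bin/hadoop fsck / -files -blocks -locations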
On 11/16/07 12:21 PM, "j2eeiscool" <[EMAIL PROTECTED]> wrote:

> Raghu/Ted,
>
> This turned out to be a sub-optimal network pipe between the client and
> the data node.
>
> Now the average read time is around 35 secs (for 68 megs).
>
> On to the next issue:
>
> 07/11/16 20:05:37 WARN fs.DFSClient: DFS Read: java.io.IOException:
> Blocklist for /hadoopdata0.txt has changed!
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>   at java.io.DataInputStream.read(DataInputStream.java:80)
>   at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
>
> java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>   at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>   at java.io.DataInputStream.read(DataInputStream.java:80)
>   at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:187)
> 07/11/16 20:05:37 INFO fs.DFSClient: Could not obtain block
> blk_1990972671947672118 from any node: java.io.IOException: No live nodes
> contain current block
> 07/11/16 20:05:40 INFO fs.DFSClient: Could not obtain block
> blk_1990972671947672118 from any node: java.io.IOException: No live nodes
> contain current block
>
> This happens during the read.
>
> I get this error from time to time, especially when I run the client in
> multithreaded mode.
>
> Could this be an instability on the dataNode side?
>
> Thanx much,
> Taj
>
>
> Raghu Angadi wrote:
>>
>> To simplify, the read rate should be faster than the write rate.
>>
>> Raghu.
>>
>> Raghu Angadi wrote:
>>>
>>> Normally, a Hadoop read saturates either disk b/w or network b/w on
>>> moderate hardware. So if you have one modern IDE disk and 100mbps
>>> ethernet, you should expect around a 10MBps read rate for a simple read
>>> from a client on a different machine.
>>>
>>> Raghu.
>>>
>>> j2eeiscool wrote:
>>>> Hi Raghu,
>>>>
>>>> Just to give me something to compare with: how long should this file
>>>> read (68 megs) take on a good set-up
>>>> (client and data node on the same network, one hop)?
>>>>
>>>> Thanx for your help,
>>>> Taj
>>>>
>>>> Raghu Angadi wrote:
>>>>> Taj,
>>>>>
>>>>> Even 4 times faster (400 sec for 68MB) is not very fast. First try to
>>>>> scp a similar sized file between the hosts involved. If this transfer
>>>>> is slow, fix that issue first. Try to place the test file on the same
>>>>> partition where the HDFS data is stored.
>>>>>
>>>>> With tcpdump, first make sure the amount of data transferred matches
>>>>> the roughly 68MB that you expect, and check for any large gaps in the
>>>>> data packets coming to the client. Also, while the client is reading,
>>>>> check netstat on both the client and the datanode: note the send buffer
>>>>> on the datanode and the recv buffer on the client. If the datanode's
>>>>> send buffer is non-zero most of the time, then you have some network
>>>>> issue; if the recv buffer on the client is full, then the client is
>>>>> reading slowly for some reason... etc.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Raghu.
>>>>>
>>>>> j2eeiscool wrote:
>>>>>> Hi Raghu,
>>>>>>
>>>>>> Good catch, thanx. totalBytesRead is not used for any decision etc.
>>>>>>
>>>>>> I ran the client from another machine and the read was about 4 times
>>>>>> faster. I have the tcpdump from the original client machine.
>>>>>> This is probably asking too much, but is there anything in particular
>>>>>> I should be looking for in the tcpdump?
>>>>>>
>>>>>> It (the tcpdump) is about 16 megs in size.
>>>>>>
>>>>>> Thanx,
>>>>>> Taj
>>>>>>
>>>>>> Raghu Angadi wrote:
>>>>>>> That's too long.. buffer size does not explain it. The only small
>>>>>>> problem I see in your code:
>>>>>>>
>>>>>>>> totalBytesRead += bytesReadThisRead;
>>>>>>>> fileNotReadFully = (bytesReadThisRead != -1);
>>>>>>>
>>>>>>> totalBytesRead is off by 1. Not sure where totalBytesRead is used.
>>>>>>>
>>>>>>> If you can, try to check a tcpdump on your client machine (for
>>>>>>> datanode port 50010).
>>>>>>>
>>>>>>> Raghu.
>>>>>>>
>>>>>>> j2eeiscool wrote:
>>>>>>>> Hi Raghu,
>>>>>>>>
>>>>>>>> Many thanx for your reply:
>>>>>>>>
>>>>>>>> The write takes approximately 11367 millisecs.
>>>>>>>>
>>>>>>>> The read takes approximately 1610565 millisecs.
>>>>>>>>
>>>>>>>> The file size is 68573254 bytes and the hdfs block size is 64 megs.
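On the off-by-one Raghu points out above: read() returns -1 at end of stream, so adding the return value unconditionally shrinks the total by one on the final iteration. A minimal sketch of a corrected loop, reusing the variable names from the quoted snippet (the stream setup, buffer size, and method shape are assumptions, not the actual HadoopDSMStore code):

  import java.io.DataInputStream;
  import java.io.IOException;

  // Sketch only: drains an already-opened stream and counts the bytes.
  static long readAll(DataInputStream in) throws IOException {
      byte[] buffer = new byte[64 * 1024];
      long totalBytesRead = 0;
      boolean fileNotReadFully = true;
      while (fileNotReadFully) {
          int bytesReadThisRead = in.read(buffer);
          if (bytesReadThisRead > 0) {
              totalBytesRead += bytesReadThisRead;       // count only bytes actually read
          }
          fileNotReadFully = (bytesReadThisRead != -1);  // -1 signals EOF, not a byte count
      }
      return totalBytesRead;
  }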
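And on the netstat/tcpdump suggestion in the quoted thread, something along these lines is enough to watch the socket queues while a read is in flight; 50010 is the datanode port mentioned above, while the interface name and capture file are just placeholders:

  # on the datanode: a Send-Q that stays non-zero points at the network
  netstat -tn | grep 50010

  # on the client: a full Recv-Q means the client itself is reading slowly
  netstat -tn | grep 50010

  # capture the transfer on the client for offline inspection
  tcpdump -i eth0 -s 0 -w dfs-read.pcap port 50010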