http://pastebin.com/bD3JJ0sD
The logs were 17MB at most, and their sizes varied below that.

-Jack

On Fri, Sep 24, 2010 at 4:56 PM, Stack wrote:
> Please paste to pastebin the section of the regionserver log where you
> were getting the EOF. I'd like to see exactly where (but yeah, you get
> the idea moving the files aside). Check the files too. Are they
> zero-length? If so, please look for them in the master log and paste
> me the section where we are splitting.
>
> Thanks Jack,
> St.Ack
>
>
> On Fri, Sep 24, 2010 at 4:52 PM, Jack Levin wrote:
>> It was an EOFException. I have now deleted the edits files:
>>
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1062260343/recovered.edits/0000000000617305532
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1321772129/recovered.edits/0000000000617328530
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img96/257974055/recovered.edits/0000000000617238642
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img97/117679080/recovered.edits/0000000000617306059
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img97/221569766/recovered.edits/0000000000617242019
>>
>> There were more like these, and now all of the regions have loaded...
>> What could that have been? I assume I lost some writes, but that is
>> not a big deal to me. The question is how to avoid something like
>> that. Is it a bug?
>>
>> -Jack
>>
>>
>> On Fri, Sep 24, 2010 at 4:44 PM, Stack wrote:
>>> What is the complaint in the regionserver log when a region load fails?
>>> St.Ack
>>>
>>> On Fri, Sep 24, 2010 at 4:40 PM, Jack Levin wrote:
>>>> So, the datanode log shows no errors whatsoever. However, I do see the
>>>> same blocks fetched repeatedly, and network throughput is quite high,
>>>> but I am unable to load _some_ regions. What could it be?
>>>>
>>>> 2010-09-24 16:38:42,729 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53038, bytes: 914, op: HDFS_READ,
>>>> cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 13803520, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_5556468858269577961_1550101, duration: 127413
>>>> 2010-09-24 16:38:44,317 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53048, bytes: 110, op: HDFS_READ,
>>>> cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 32723968, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_364673737339632029_1347910, duration: 1140653
>>>> 2010-09-24 16:38:44,318 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53049, bytes: 38294, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 32686080, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_364673737339632029_1347910, duration: 691929
>>>> 2010-09-24 16:38:44,510 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53054, bytes: 18021300, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 0, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_-3781179144642915580_1571141, duration: 173548261
>>>> 2010-09-24 16:38:44,525 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53055, bytes: 506, op: HDFS_READ,
>>>> cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 48700928, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_-176750251227749356_1535293, duration: 77045
>>>> 2010-09-24 16:38:44,526 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53056, bytes: 6182, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 48695296, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_-176750251227749356_1535293, duration: 128270
>>>>
>>>>
>>>> On Fri, Sep 24, 2010 at 4:32 PM, Stack wrote:
>>>>> (Good one, Ryan)
>>>>>
>>>>> Master is doing the assigning. It needs to be restarted to see the
>>>>> config change.
>>>>>
>>>>> St.Ack
>>>>>
>>>>> On Fri, Sep 24, 2010 at 4:28 PM, Jack Levin wrote:
>>>>>> Only the regionserver. Do I need to restart both?
>>>>>>
>>>>>> -jack
>>>>>>
>>>>>> On Fri, Sep 24, 2010 at 4:22 PM, Ryan Rawson wrote:
>>>>>>> Did you restart the master and the regionserver? Or just one or the
>>>>>>> other?
>>>>>>>
>>>>>>> -ryan
>>>>>>>
>>>>>>> On Fri, Sep 24, 2010 at 4:21 PM, Jack Levin wrote:
>>>>>>>> Also, even with a value of '1', I see:
>>>>>>>>
>>>>>>>> 2010-09-24 16:20:29,983 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img834,1000351n.jpg,1285251664421.d09510a16c0cfd0d8a251a36229125e0.
>>>>>>>> 2010-09-24 16:20:29,984 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img651,pict1408.jpg,1285018965749.110871465
>>>>>>>> 2010-09-24 16:20:29,984 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img806,sam0084a.jpg,1285324613056.82a1e8ba8d2a37a591a847fb36803c45.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img535,screenshot1bt.png,1285323376435.fae5f3ab474196c99f10b8a936fb9ead.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img838,123468.jpg,1285223690165.a2903008621d1a6b6ca02441bf3b68ea.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img839,yug.jpg,1285230318537.c09323dbaf54130671df2a14d671fe25.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img821,vlcsnap78737.png,1285283076812.ea4973ce6e43d7f974613c5989647278.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img805,njt30scbkdmb.gif,1285322429401.f9aacdafd8064bfbcc8cd4f6930febbe.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img94,img1711m.jpg,1285016850260.1424182007
>>>>>>>> 2010-09-24 16:20:29,986 DEBUG
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegion: Creating region
>>>>>>>> img840,kitbarca2.png,1285189312696.1ce170ec09384fca51297a5fe7aeb4af.
>>>>>>>>
>>>>>>>> That is pretty close to concurrent.
>>>>>>>>
>>>>>>>> -Jack
>>>>>>>>
>>>>>>>> On Fri, Sep 24, 2010 at 4:16 PM, Jack Levin wrote:
>>>>>>>>> Still having a problem:
>>>>>>>>>
>>>>>>>>> 2010-09-24 16:15:02,572 ERROR
>>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening
>>>>>>>>> img695,p1908101232.jpg,1285288492084.d451f05024b42f71a115951c62cdcccf.
>>>>>>>>> java.io.EOFException
>>>>>>>>>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>>>>>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>>>>>>>>>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>>>>>>>>>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1937)
>>>>>>>>>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1837)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I changed the value to '1' and restarted the regionserver... Note
>>>>>>>>> that my HDFS is not having any problems.
>>>>>>>>>
>>>>>>>>> -Jack
>>>>>>>>>
>>>>>>>>> On Fri, Sep 24, 2010 at 4:01 PM, Stack wrote:
>>>>>>>>>> Try
>>>>>>>>>>
>>>>>>>>>> <property>
>>>>>>>>>>   <name>hbase.regions.percheckin</name>
>>>>>>>>>>   <value>10</value>
>>>>>>>>>>   <description>Maximum number of regions that can be assigned in a
>>>>>>>>>>   single go to a region server.
>>>>>>>>>>   </description>
>>>>>>>>>> </property>
>>>>>>>>>>
>>>>>>>>>> What do you have now? Whatever it is, go down from there.
>>>>>>>>>>
>>>>>>>>>> St.Ack
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 24, 2010 at 3:07 PM, Jack Levin wrote:
>>>>>>>>>>> My regions are 1GB in size, and when I cold-start the cluster I
>>>>>>>>>>> oversaturate my network links (1000 Mbps) and get client DFS
>>>>>>>>>>> timeouts. Any way to slow them down?
>>>>>>>>>>>
>>>>>>>>>>> -Jack
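Stack's suggestion caps how many regions get assigned to a regionserver "in a single go". As a rough illustration of what such a cap means, here is a minimal Java sketch that splits a list of pending regions into successive check-in batches of at most N each. The class and method names are hypothetical, this is not HBase's assignment code, and the region labels are simply the table names from the 16:20 MSG_REGION_OPEN lines.

```java
import java.util.ArrayList;
import java.util.List;

public class PerCheckinBatches {
    /** Hypothetical helper: splits pending regions into batches of at most perCheckin each. */
    public static List<List<String>> batches(List<String> pending, int perCheckin) {
        if (perCheckin < 1) {
            throw new IllegalArgumentException("perCheckin must be >= 1");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < pending.size(); i += perCheckin) {
            int end = Math.min(i + perCheckin, pending.size());
            // Copy the slice so each batch is independent of the input list.
            result.add(new ArrayList<>(pending.subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Table names of the nine MSG_REGION_OPEN lines quoted in the 16:21 message.
        List<String> pending = List.of("img834", "img651", "img806", "img535", "img838",
                                       "img839", "img821", "img805", "img94");
        for (int n : new int[] {10, 1}) {
            System.out.println("percheckin=" + n + " -> "
                + batches(pending, n).size() + " check-in(s)");
        }
    }
}
```

With those nine regions, a cap of 10 hands them all out in one check-in, while a cap of 1 spreads them over nine check-ins.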
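To put a number on "pretty close to concurrent", one can take the spread between the earliest and latest timestamps of the MSG_REGION_OPEN lines. A minimal Java sketch, assuming every line starts with the `yyyy-MM-dd HH:mm:ss,SSS` timestamp seen in these logs; `OpenSpread` is a hypothetical helper.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class OpenSpread {
    // Timestamp prefix of the quoted log lines, e.g. "2010-09-24 16:20:29,983".
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss,SSS";
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern(PATTERN);
    // Each pattern letter formats to one character here, so this is 23.
    private static final int TS_LENGTH = PATTERN.length();

    /** Time between the earliest and latest timestamp in the given log lines. */
    public static Duration spread(List<String> lines) {
        LocalDateTime earliest = null;
        LocalDateTime latest = null;
        for (String line : lines) {
            LocalDateTime t = LocalDateTime.parse(line.substring(0, TS_LENGTH), TS);
            if (earliest == null || t.isBefore(earliest)) earliest = t;
            if (latest == null || t.isAfter(latest)) latest = t;
        }
        return earliest == null ? Duration.ZERO : Duration.between(earliest, latest);
    }

    public static void main(String[] args) {
        // Timestamps of the nine MSG_REGION_OPEN lines from the 16:21 message.
        List<String> opens = List.of(
            "2010-09-24 16:20:29,983 INFO", "2010-09-24 16:20:29,984 INFO",
            "2010-09-24 16:20:29,984 INFO", "2010-09-24 16:20:29,985 INFO",
            "2010-09-24 16:20:29,985 INFO", "2010-09-24 16:20:29,985 INFO",
            "2010-09-24 16:20:29,985 INFO", "2010-09-24 16:20:29,985 INFO",
            "2010-09-24 16:20:29,985 INFO");
        System.out.println(opens.size() + " opens within " + spread(opens).toMillis() + " ms");
    }
}
```

For the quoted lines this prints `9 opens within 2 ms`.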
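The top frames of the 16:15 stack trace show `SequenceFile$Reader.next` calling `DataInputStream.readFully`, which throws `EOFException` when the stream ends before the requested bytes arrive. The same failure can be reproduced without Hadoop. The Java sketch below uses a simplified, hypothetical record layout (a 4-byte length followed by the payload), which is not the SequenceFile on-disk format. It writes a few records, cuts the last one short, and shows how a reader can tell a clean end of file (including a zero-length file, which Stack asks about) from a record truncated part-way. `TruncatedRecords` and its layout are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TruncatedRecords {
    /** Result of scanning a file: complete records seen, and whether it ended mid-record. */
    public static final class ReadResult {
        public final int completeRecords;
        public final boolean truncated;

        ReadResult(int completeRecords, boolean truncated) {
            this.completeRecords = completeRecords;
            this.truncated = truncated;
        }

        @Override
        public String toString() {
            return completeRecords + " complete record(s), truncated=" + truncated;
        }
    }

    /** Hypothetical layout: a 4-byte big-endian length, then that many payload bytes. */
    public static byte[] writeRecords(String... payloads) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String p : payloads) {
                byte[] data = p.getBytes(StandardCharsets.UTF_8);
                out.writeInt(data.length);
                out.write(data);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads records until the data runs out; an EOFException inside a record means truncation. */
    public static ReadResult readAll(byte[] file) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        int count = 0;
        try {
            while (in.available() > 0) {      // exact for an in-memory stream
                int length = in.readInt();    // throws EOFException if the header is cut off
                if (length < 0) {
                    throw new IOException("corrupt record length " + length);
                }
                byte[] payload = new byte[length];
                in.readFully(payload);        // throws EOFException if the payload is cut off
                count++;
            }
            return new ReadResult(count, false);
        } catch (EOFException e) {
            return new ReadResult(count, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = writeRecords("put row1", "put row2", "put row3");
        byte[] cut = Arrays.copyOf(file, file.length - 3);  // drop the last 3 bytes
        System.out.println("intact:      " + readAll(file));
        System.out.println("truncated:   " + readAll(cut));
        System.out.println("zero-length: " + readAll(new byte[0]));
    }
}
```

This prints 3 complete records for the intact file, 2 complete records with `truncated=true` for the cut file, and 0 records with `truncated=false` for the empty one. What to do with a torn tail (skip it and lose those edits, or refuse to open the region) is a separate policy decision that the sketch does not make.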
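To check how fast the repeated block reads in the datanode clienttrace log actually were, the `bytes` and `duration` fields of a record can be turned into a rate. The Java sketch below assumes `duration` is in nanoseconds. That reading is consistent with the small reads quoted (914 bytes in 127413, about 0.13 ms), but the thread does not state the unit. `ClientTraceThroughput` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClientTraceThroughput {
    // The "bytes: N" and "duration: N" fields of a DataNode clienttrace record.
    private static final Pattern BYTES = Pattern.compile("bytes: (\\d+)");
    private static final Pattern DURATION = Pattern.compile("duration: (\\d+)");

    /** Throughput in MB/s (10^6 bytes per second), ASSUMING duration is in nanoseconds. */
    public static double megabytesPerSecond(String record) {
        Matcher b = BYTES.matcher(record);
        Matcher d = DURATION.matcher(record);
        if (!b.find() || !d.find()) {
            throw new IllegalArgumentException("not a clienttrace record: " + record);
        }
        long bytes = Long.parseLong(b.group(1));
        double seconds = Long.parseLong(d.group(1)) / 1e9;
        return bytes / seconds / 1e6;
    }

    public static void main(String[] args) {
        // Fields copied from the 16:38:44,510 record quoted in the thread.
        String record = "src: /10.101.6.2:50010, dest: /10.101.6.2:53054, bytes: 18021300, "
            + "op: HDFS_READ, offset: 0, blockid: blk_-3781179144642915580_1571141, "
            + "duration: 173548261";
        double rate = megabytesPerSecond(record);
        System.out.printf("%.2f MB/s (%.0f Mbit/s)%n", rate, rate * 8);
    }
}
```

For the 18,021,300-byte read at offset 0 this prints `103.84 MB/s (831 Mbit/s)`. That byte count is about 17.2 MiB, in line with the 17MB log size Jack mentions. Note that `src` and `dest` in these records are the same host (10.101.6.2), so this particular read stayed on that machine rather than crossing the network.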
