Re: Regions loading too fast

Stack Tue, 28 Sep 2010 20:32:52 -0700

Jack:  You might want to try applying HBASE-3038 (there are two patches
up there; you'll need both).  The thought is that it might be the cause
of the EOFException you were running into (even though your files seemed
smaller than the 2G that HBASE-3038 is about).
St.Ack
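
For anyone wanting to check the "same blocks fetched repeatedly" observation quoted further down, here is a minimal sketch, not anything from the HBase patches, that tallies HDFS_READ operations per block from datanode clienttrace lines. It assumes the field layout seen in the quoted log excerpt (`bytes:`, `op:`, `blockid:`), and it strips mail quote markers and rejoins wrapped lines first. The function names are made up for illustration.

```python
import re
from collections import Counter

# Leading '>' quote markers and whitespace, as found in quoted mail text.
QUOTE_PREFIX = re.compile(r"^[>\s]+")
# A new log record starts with a timestamp such as "2010-09-24 16:38:42,729".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
# Fields assumed from the clienttrace excerpt: byte count, operation, block id.
FIELDS = re.compile(r"bytes: (\d+), op: (\w+),.*?blockid: (blk_-?\d+_\d+)")


def records(lines):
    """Strip quote markers and join wrapped lines into one string per record."""
    current = []
    for raw in lines:
        line = QUOTE_PREFIX.sub("", raw).rstrip()
        if not line:
            continue
        # A timestamp marks the start of the next record; flush the previous one.
        if TIMESTAMP.match(line) and current:
            yield " ".join(current)
            current = []
        current.append(line)
    if current:
        yield " ".join(current)


def reads_per_block(lines):
    """Return (read count, total bytes read) per block id for HDFS_READ ops."""
    counts, total_bytes = Counter(), Counter()
    for record in records(lines):
        match = FIELDS.search(record)
        if match and match.group(2) == "HDFS_READ":
            block = match.group(3)
            counts[block] += 1
            total_bytes[block] += int(match.group(1))
    return counts, total_bytes
```

Feeding it the lines of a datanode log (for example `reads_per_block(open(path))`, where `path` is whatever your log file is) and looking at `counts.most_common()` should show which blocks are being re-read most often.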


On Tue, Sep 28, 2010 at 12:12 PM, Stack <[email protected]> wrote:
> I made https://issues.apache.org/jira/browse/HBASE-3046 for looking
> into this.  We thought we'd repro'd it here but it seems like we were
> running into hbase-3038... which was not your case, at least, not for
> the two files you made available to me.
>
> St.Ack
>
> On Fri, Sep 24, 2010 at 4:52 PM, Jack Levin <[email protected]> wrote:
>> It was an EOF exception, but now that I deleted the edits files:
>>
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1062260343/recovered.edits/0000000000617305532
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1321772129/recovered.edits/0000000000617328530
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img96/257974055/recovered.edits/0000000000617238642
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img97/117679080/recovered.edits/0000000000617306059
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img97/221569766/recovered.edits/0000000000617242019
>>
>> Like these.  All of the regions have loaded... What could that have
>> been?   I assume I lost some writes, but this is not a big deal to
>> me... question is how to avoid something like that, is that a bug?
>>
>> -Jack
>>
>>
>> On Fri, Sep 24, 2010 at 4:44 PM, Stack <[email protected]> wrote:
>>> What is the complaint in regionserver log when region load fails?
>>> St.Ack
>>>
>>> On Fri, Sep 24, 2010 at 4:40 PM, Jack Levin <[email protected]> wrote:
>>>> so, datanode log shows no errors whatsoever, however I do see same
>>>> blocks fetched repeatedly, and the network speed is quite high, but I
>>>> am unable to load _some_ regions, what could it be?
>>>>
>>>> 2010-09-24 16:38:42,729 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53038, bytes: 914, op: HDFS_READ,
>>>> cliID: 
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 13803520, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_5556468858269577961_1550101, duration: 127413
>>>> 2010-09-24 16:38:44,317 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53048, bytes: 110, op: HDFS_READ,
>>>> cliID: 
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 32723968, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_364673737339632029_1347910, duration: 1140653
>>>> 2010-09-24 16:38:44,318 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53049, bytes: 38294, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 32686080, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_364673737339632029_1347910, duration: 691929
>>>> 2010-09-24 16:38:44,510 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53054, bytes: 18021300, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 0, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_-3781179144642915580_1571141, duration: 173548261
>>>> 2010-09-24 16:38:44,525 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53055, bytes: 506, op: HDFS_READ,
>>>> cliID: 
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 48700928, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_-176750251227749356_1535293, duration: 77045
>>>> 2010-09-24 16:38:44,526 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53056, bytes: 6182, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 48695296, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_-176750251227749356_1535293, duration: 128270
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Sep 24, 2010 at 4:32 PM, Stack <[email protected]> wrote:
>>>>> (Good one Ryan)
>>>>>
>>>>> Master is doing the assigning.  It needs to be restarted to see the
>>>>> config change.
>>>>>
>>>>> St.Ack
>>>>>
>>>>> On Fri, Sep 24, 2010 at 4:28 PM, Jack Levin <[email protected]> wrote:
>>>>>> Only regionserver, do I need to restart both?
>>>>>>
>>>>>> -jack
>>>>>>
>>>>>> On Fri, Sep 24, 2010 at 4:22 PM, Ryan Rawson <[email protected]> wrote:
>>>>>>> Did you restart the master and the regionserver? Or just one or the other?
>>>>>>>
>>>>>>> -ryan
>>>>>>>
>>>>>>> On Fri, Sep 24, 2010 at 4:21 PM, Jack Levin <[email protected]> wrote:
>>>>>>>> Also, even with '1' value, I see:
>>>>>>>>
>>>>>>>> 2010-09-24 16:20:29,983 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img834,1000351n.jpg,1285251664421.d09510a16c0cfd0d8a251a36229125e0.
>>>>>>>> 2010-09-24 16:20:29,984 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img651,pict1408.jpg,1285018965749.110871465
>>>>>>>> 2010-09-24 16:20:29,984 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img806,sam0084a.jpg,1285324613056.82a1e8ba8d2a37a591a847fb36803c45.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img535,screenshot1bt.png,1285323376435.fae5f3ab474196c99f10b8a936fb9ead.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img838,123468.jpg,1285223690165.a2903008621d1a6b6ca02441bf3b68ea.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img839,yug.jpg,1285230318537.c09323dbaf54130671df2a14d671fe25.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img821,vlcsnap78737.png,1285283076812.ea4973ce6e43d7f974613c5989647278.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img805,njt30scbkdmb.gif,1285322429401.f9aacdafd8064bfbcc8cd4f6930febbe.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img94,img1711m.jpg,1285016850260.1424182007
>>>>>>>> 2010-09-24 16:20:29,986 DEBUG
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegion: Creating region
>>>>>>>> img840,kitbarca2.png,1285189312696.1ce170ec09384fca51297a5fe7aeb4af.
>>>>>>>>
>>>>>>>> Which is pretty close to concurrent.
>>>>>>>>
>>>>>>>> -Jack
>>>>>>>>
>>>>>>>> On Fri, Sep 24, 2010 at 4:16 PM, Jack Levin <[email protected]> wrote:
>>>>>>>>> Still having a problem:
>>>>>>>>>
>>>>>>>>> 2010-09-24 16:15:02,572 ERROR
>>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening
>>>>>>>>> img695,p1908101232.jpg,1285288492084.d451f05024b42f71a115951c62cdcccf.
>>>>>>>>> java.io.EOFException
>>>>>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>>>>>        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>>>>>>>>>        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>>>>>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1937)
>>>>>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1837)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I changed the value to '1', and restarted the regionserver... Note
>>>>>>>>> that my hdfs is not having a problem.
>>>>>>>>>
>>>>>>>>> -Jack
>>>>>>>>>
>>>>>>>>> On Fri, Sep 24, 2010 at 4:01 PM, Stack <[email protected]> wrote:
>>>>>>>>>> Try
>>>>>>>>>>
>>>>>>>>>>  <property>
>>>>>>>>>>    <name>hbase.regions.percheckin</name>
>>>>>>>>>>    <value>10</value>
>>>>>>>>>>    <description>Maximum number of regions that can be assigned in a single go
>>>>>>>>>>    to a region server.
>>>>>>>>>>    </description>
>>>>>>>>>>  </property>
>>>>>>>>>>
>>>>>>>>>> What do you have now?  Whatever it is, go down from there.
>>>>>>>>>>
>>>>>>>>>> St.Ack
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 24, 2010 at 3:07 PM, Jack Levin <[email protected]> wrote:
>>>>>>>>>>> My regions are 1 GB in size, and when I cold start the cluster I
>>>>>>>>>>> oversaturate my network links (1000 Mbps) and get client DFS
>>>>>>>>>>> timeouts. Any way to slow them down?
>>>>>>>>>>>
>>>>>>>>>>> -Jack
