[ 
https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836478#action_12836478
 ] 

Kannan Muthukkaruppan commented on HBASE-2244:
----------------------------------------------

@stack, jdcryans:  The exact implication of the "references" -- i.e. the 
daughter regions still sharing HFile with the parent for a period of 
time -- hadn't sunk in.  But it makes sense now. I think we can close 
this issue as not a bug.

@stack: If you found other opportunities (as part of your recent code walk 
through) for making the general handling of splits more bullet proof,
perhaps we should file that as a separate issue so that it doesn't get
muddled with this non-issue.

@ What was happening during the time the client failed? 

In this particular case the RS was shutting down. The DNs were getting 
overwhelmed in my small cluster test case, and not able to keep up. It started 
with a lot of:

{code}
2010-02-19 08:48:57,462 WARN org.apache.hadoop.hdfs.DFSClient: 
NotReplicatedYetException sleeping 
/hbase-kannan1/test1/580635726/actions/133921297096924993\
7 retries left 1
{code}

Followed by:
{code}
2010-02-19 08:49:07,102 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
for block blk_9144926768183088527_186431 bad datanode[0] nodes == null
2010-02-19 08:49:07,102 WARN org.apache.hadoop.hdfs.DFSClient: Could not get 
block locations. Source file "/hbase-kannan1/test1/580635726/actions/133921297\
0969249937" - Aborting...
2010-02-19 08:49:07,117 FATAL 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog required. 
Forcing server shutdown
{code}





> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
>                 Key: HBASE-2244
>                 URL: https://issues.apache.org/jira/browse/HBASE-2244
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region 
> server deaths) etc., we notice that META can get inconsistent. This primarily 
> happens for regions which are in the process of being split. Manually running 
> add_table.rb seems to fix the tables meta data just fine. 
> But it would be good to do automatic cleansing (as part of META scanners 
> work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries:
> {code}
> test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, 
> value=REGION => {NAME => 'test1,
>                              1204765,1266569946560', STARTKEY => '1204765', 
> ENDKEY => '1441091', ENCODED => 18
>                              19368969, OFFLINE => true, SPLIT => true, TABLE 
> => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', 
> COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', 
> BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, 
> value=10.129.68.212:60020
>  test1,1204765,1266569946560 column=info:serverstartcode, 
> timestamp=1266570029133, value=1266562597546
>  test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, 
> value=\x00\x071441091\x00\x00\x00\x0
>                              
> 1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x
>                              
> 05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x
>                              
> 00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00
>                              
> \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI
>                              
> ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT
>                              
> L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00
>                              
> \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t
>                              rueh\x0FQ\xCF
>  test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, 
> value=REGION => {NAME => 'test1,
>                              1204765,1266581233447', STARTKEY => '1204765', 
> ENDKEY => '1290703', ENCODED => 13
>                              73493090, OFFLINE => true, SPLIT => true, TABLE 
> => {{NAME => 'test1', FAMILIES =>
>                               [{NAME => 'actions', VERSIONS => '3', 
> COMPRESSION => 'NONE', TTL => '2147483647'
>                              , BLOCKSIZE => '65536', IN_MEMORY => 'false', 
> BLOCKCACHE => 'true'}]}}
>  test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, 
> value=10.129.68.213:60020
>  test1,1204765,1266581233447 column=info:serverstartcode, 
> timestamp=1266604768670, value=1266562597511
>  test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, 
> value=\x00\x071226169\x00\x00\x00\x0
>                              
> 1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\
>                              
> x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              
> x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              
> 0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              
> x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              
> 00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              
> 0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xB9\xBD\xFEO
>  test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, 
> value=\x00\x071290703\x00\x00\x00\x0
>                              
> 1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\
>                              
> x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\
>                              
> x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0
>                              
> 0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\
>                              
> x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x
>                              
> 00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0
>                              
> 0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true
>                              \xE1\xDF\xF8p
>  test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, 
> value=REGION => {NAME => 'test1,
>                              1204765,1266609171581', STARTKEY => '1204765', 
> ENDKEY => '1226169', ENCODED => 21
>                              34878372, TABLE => {{NAME => 'test1', FAMILIES 
> => [{NAME => 'actions', VERSIONS =
>                              > '3', COMPRESSION => 'NONE', TTL => 
> '2147483647', BLOCKSIZE => '65536', IN_MEMOR
>                              Y => 'false', BLOCKCACHE => 'true'}]}}
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to