[ https://issues.apache.org/jira/browse/HBASE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836322#action_12836322 ]
stack commented on HBASE-2244:
------------------------------

Within the scope of this issue we should do a couple of things for the 0.20 branch specifically.

First, add a fixup to the metascanner for the case where the parent is offlined but no daughters are present, either because the HRS crashed and didn't add the daughters, or because the HRS carrying .META. crashed and we recovered only the parent offlining edit but not the daughter additions. This inconsistency is clean, so it is easy to recognize. Also, the data needed to do the repair is in the offlined parent, in columns splitA and splitB.

Here are a few notes.

+ On split, the HRS does three updates: 1. offline the parent and add splitA and splitB columns that hold the HRegionInfo of the daughter split regions, 2. add daughter A, and 3. add daughter B. The updates are not done atomically. Before we send the messages, the HRS has offlined (closed) the parent and created two new daughters. The parent is already unavailable.

++ Reading the code, there are issues to address in here. If we crash after parent close, that's ok. The parent will be assigned to a new HRS. But subsequently, as the split goes forward, we do an open of the new daughter regions BEFORE we add them to .META. This seems like it could be avoided (speeding the split); only open once assigned in the new location (moving the location of where we do the split work should be all that is needed). Also, if there is already a daughter region of the same name in the FS, we'll fail the split rather than overwrite, as it seems we should do (the only reason for a pre-existing daughter is a split that failed mid-way). I can add a check of .META. If the daughter is not there, it's for sure a failed split. Let me see if I can improve stuff in here in general for 0.20 as part of this patch. I need to study some more.

+ The HRS, after making updates in .META., then sends a message to the master telling it about the split. The master adds the new daughters to its assignment list and they are assigned out on the next report-in by a cluster member.
If this message is missed, the daughters are assigned the next time the metascanner runs.

In the .META. listing posted above, there are some interesting issues. We still have a reference to a daughter, splitB, in the first offlined region (row), yet the next row is a daughter that has itself been offlined. There may be a race in here if we're splitting fast. Let me check it out and see if there is a fix. The other inconsistency is that there seems to be a row missing off the end, the splitB from test1,1204765,1266581233447. Is that possible?

> META gets inconsistent in a number of crash scenarios
> -----------------------------------------------------
>
> Key: HBASE-2244
> URL: https://issues.apache.org/jira/browse/HBASE-2244
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: Kannan Muthukkaruppan
> Assignee: stack
> Priority: Critical
> Fix For: 0.20.4
>
>
> (Forking this issue off from HBASE-2235).
> During load testing, in a number of failure scenarios (unexpected region
> server deaths) etc., we notice that META can get inconsistent. This primarily
> happens for regions which are in the process of being split. Manually running
> add_table.rb seems to fix the tables meta data just fine.
> But it would be good to do automatic cleansing (as part of META scanners
> work) and/or avoid these inconsistent states altogether.
> For example, for a particular startkey, I see all these entries: > {code} > test1,1204765,1266569946560 column=info:regioninfo, timestamp=1266581302018, > value=REGION => {NAME => 'test1, > 1204765,1266569946560', STARTKEY => '1204765', > ENDKEY => '1441091', ENCODED => 18 > 19368969, OFFLINE => true, SPLIT => true, TABLE > => {{NAME => 'test1', FAMILIES => > [{NAME => 'actions', VERSIONS => '3', > COMPRESSION => 'NONE', TTL => '2147483647' > , BLOCKSIZE => '65536', IN_MEMORY => 'false', > BLOCKCACHE => 'true'}]}} > test1,1204765,1266569946560 column=info:server, timestamp=1266570029133, > value=10.129.68.212:60020 > test1,1204765,1266569946560 column=info:serverstartcode, > timestamp=1266570029133, value=1266562597546 > test1,1204765,1266569946560 column=info:splitB, timestamp=1266581302018, > value=\x00\x071441091\x00\x00\x00\x0 > > 1\x26\xE6\x1F\xDF\x27\x1Btest1,1290703,1266581233447\x00\x071290703\x00\x00\x00\x > > 05\x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x > > 00\x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00 > > \x00\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSI > > ON\x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TT > > L\x00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00 > > \x00\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04t > rueh\x0FQ\xCF > test1,1204765,1266581233447 column=info:regioninfo, timestamp=1266609172177, > value=REGION => {NAME => 'test1, > 1204765,1266581233447', STARTKEY => '1204765', > ENDKEY => '1290703', ENCODED => 13 > 73493090, OFFLINE => true, SPLIT => true, TABLE > => {{NAME => 'test1', FAMILIES => > [{NAME => 'actions', VERSIONS => '3', > COMPRESSION => 'NONE', TTL => '2147483647' > , BLOCKSIZE => '65536', IN_MEMORY => 'false', > BLOCKCACHE => 'true'}]}} > test1,1204765,1266581233447 column=info:server, timestamp=1266604768670, > 
value=10.129.68.213:60020 > test1,1204765,1266581233447 column=info:serverstartcode, > timestamp=1266604768670, value=1266562597511 > test1,1204765,1266581233447 column=info:splitA, timestamp=1266609172177, > value=\x00\x071226169\x00\x00\x00\x0 > > 1\x26\xE7\xCA,\x7D\x1Btest1,1204765,1266609171581\x00\x071204765\x00\x00\x00\x05\ > > x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\ > > x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0 > > 0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\ > > x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x > > 00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0 > > 0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true > \xB9\xBD\xFEO > test1,1204765,1266581233447 column=info:splitB, timestamp=1266609172177, > value=\x00\x071290703\x00\x00\x00\x0 > > 1\x26\xE7\xCA,\x7D\x1Btest1,1226169,1266609171581\x00\x071226169\x00\x00\x00\x05\ > > x05test1\x00\x00\x00\x00\x00\x02\x00\x00\x00\x07IS_ROOT\x00\x00\x00\x05false\x00\ > > x00\x00\x07IS_META\x00\x00\x00\x05false\x00\x00\x00\x01\x07\x07actions\x00\x00\x0 > > 0\x07\x00\x00\x00\x0BBLOOMFILTER\x00\x00\x00\x05false\x00\x00\x00\x0BCOMPRESSION\ > > x00\x00\x00\x04NONE\x00\x00\x00\x08VERSIONS\x00\x00\x00\x013\x00\x00\x00\x03TTL\x > > 00\x00\x00\x0A2147483647\x00\x00\x00\x09BLOCKSIZE\x00\x00\x00\x0565536\x00\x00\x0 > > 0\x09IN_MEMORY\x00\x00\x00\x05false\x00\x00\x00\x0ABLOCKCACHE\x00\x00\x00\x04true > \xE1\xDF\xF8p > test1,1204765,1266609171581 column=info:regioninfo, timestamp=1266609172212, > value=REGION => {NAME => 'test1, > 1204765,1266609171581', STARTKEY => '1204765', > ENDKEY => '1226169', ENCODED => 21 > 34878372, TABLE => {{NAME => 'test1', FAMILIES > => [{NAME => 'actions', VERSIONS = > > '3', COMPRESSION => 'NONE', TTL => > '2147483647', BLOCKSIZE => '65536', IN_MEMOR > Y => 'false', BLOCKCACHE => 
'true'}]}}
> {code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
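
For illustration only, here is a minimal Python sketch of the metascanner fixup discussed in the comment: find offlined, split parents whose splitA/splitB columns reference a daughter that has no row of its own, then add the missing rows. It is a toy model under stated assumptions, not HBase code. RegionRow, find_missing_daughters, repair and sample_meta are hypothetical names, the splitA/splitB values are plain region names rather than serialized HRegionInfo, and the sample rows treat the start-key-filtered listing from the issue as if it were the whole table, which it is not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RegionRow:
    """Toy stand-in for one .META. row (not the real HBase schema)."""
    name: str
    offline: bool = False
    split: bool = False
    split_a: Optional[str] = None  # stands in for info:splitA
    split_b: Optional[str] = None  # stands in for info:splitB


def find_missing_daughters(meta: Dict[str, RegionRow]) -> List[Tuple[str, str]]:
    """Return (parent, daughter) pairs where an offlined, split parent
    references a daughter that has no row in the table."""
    missing = []
    for name in sorted(meta):  # scan in row-key order, like a table scan
        row = meta[name]
        if not (row.offline and row.split):
            continue
        for daughter in (row.split_a, row.split_b):
            if daughter is not None and daughter not in meta:
                missing.append((row.name, daughter))
    return missing


def repair(meta: Dict[str, RegionRow]) -> List[str]:
    """Add a row for each missing daughter; return the names added."""
    added = []
    for _parent, daughter in find_missing_daughters(meta):
        meta[daughter] = RegionRow(name=daughter)
        added.append(daughter)
    return added


def sample_meta() -> Dict[str, RegionRow]:
    """Rows modeled on the listing in the issue (assumption: nothing else exists)."""
    rows = [
        RegionRow("test1,1204765,1266569946560", offline=True, split=True,
                  split_b="test1,1290703,1266581233447"),
        RegionRow("test1,1204765,1266581233447", offline=True, split=True,
                  split_a="test1,1204765,1266609171581",
                  split_b="test1,1226169,1266609171581"),
        RegionRow("test1,1204765,1266609171581"),
    ]
    return {r.name: r for r in rows}
```

Calling repair(sample_meta()) adds a row for each referenced daughter that is absent. A real fixup would also need the .META. check mentioned in the comment to tell a failed split apart from one still in progress, which this sketch does not attempt.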