[ 
https://issues.apache.org/jira/browse/HBASE-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582750#action_12582750
 ] 

stack commented on HBASE-549:
-----------------------------

In response to a query from Jim, I actually cannot explicitly explain how the 
region went offline.  While this is a patched 0.16.0, it does have the patch 
that stops us reading offline flag from regioninfo that is passed over from the 
remote regionserver.  Below I include more logs from around the failing, logs 
that include the meta scan that show the region going from online to offline:

{code}
...
2008-03-25 19:19:49,849 DEBUG org.apache.hadoop.hbase.HMaster: Received 
MSG_REPORT_PROCESS_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 
from XX.XX.45.221:60020
2008-03-25 19:19:56,883 DEBUG org.apache.hadoop.hbase.HMaster: Received 
MSG_REPORT_CLOSE : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from 
XX.XX.45.220:60020
2008-03-25 19:19:56,883 INFO org.apache.hadoop.hbase.HMaster: 
XX.XX.45.220:60020 no longer serving regionname: 
enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, startKey: 
<iLStZ0yTnfVUziYcNVVxWV==>, endKey: <jLB27Q4hKls4tSvp64rJfF==>, encodedName: 
1857033608, tableDesc: {name: enwiki_080103, families: {alternate_title:={name: 
alternate_title, max versions: 3, compression: NONE, in memory: false, max 
length: 2147483647, bloom filter: none}, alternate_url:={name: alternate_url, 
max versions: 3, compression: NONE, in memory: false, max length: 2147483647, 
bloom filter: none}, anchor:={name: anchor, max versions: 3, compression: NONE, 
in memory: false, max length: 2147483647, bloom filter: none}, misc:={name: 
misc, max versions: 3, compression: NONE, in memory: false, max length: 
2147483647, bloom filter: none}, page:={name: page, max versions: 3, 
compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
none}, redirect:={name: redirect, max versions: 3, compression: NONE, in 
memory: false, max length: 2147483647, bloom filter: none}}}
2008-03-25 19:19:56,885 DEBUG org.apache.hadoop.hbase.HMaster: Main processing 
loop: ProcessRegionClose of 
enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, true, false
2008-03-25 19:19:56,885 INFO org.apache.hadoop.hbase.HMaster: region closed: 
enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 
2008-03-25 19:19:56,887 INFO org.apache.hadoop.hbase.HMaster: reassign region: 
enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
2008-03-25 19:19:57,288 INFO org.apache.hadoop.hbase.HMaster: assigning region 
enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server 
XX.XX.45.189:60020
2008-03-25 19:20:00,296 DEBUG org.apache.hadoop.hbase.HMaster: Received 
MSG_REPORT_PROCESS_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 
from XX.XX.45.189:60020
2008-03-25 19:20:14,575 DEBUG org.apache.hadoop.hbase.HMaster: 
HMaster.metaScanner regioninfo: {regionname: 
enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, startKey: 
<iLStZ0yTnfVUziYcNVVxWV==>, endKey: <jLB27Q4hKls4tSvp64rJfF==>, encodedName: 
1857033608, offline: true, tableDesc: {name: enwiki_080103, families: 
{alternate_title:={name: alternate_title, max versions: 3, compression: NONE, 
in memory: false, max length: 2147483647, bloom filter: none}, 
alternate_url:={name: alternate_url, max versions: 3, compression: NONE, in 
memory: false, max length: 2147483647, bloom filter: none}, anchor:={name: 
anchor, max versions: 3, compression: NONE, in memory: false, max length: 
2147483647, bloom filter: none}, misc:={name: misc, max versions: 3, 
compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
none}, page:={name: page, max versions: 3, compression: NONE, in memory: false, 
max length: 2147483647, bloom filter: none}, redirect:={name: redirect, max 
versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom 
filter: none}}}}, server: , startCode: -1
2008-03-25 19:20:16,885 DEBUG org.apache.hadoop.hbase.HMaster: Received 
MSG_REPORT_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from 
XX.XX.45.221:60020
2008-03-25 19:20:16,885 DEBUG org.apache.hadoop.hbase.HMaster: region server 
XX.XX.45.221:60020 should not have opened region 
enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
2008-03-25 19:21:16,118 DEBUG org.apache.hadoop.hbase.HMaster: 
HMaster.metaScanner regioninfo: {regionname: 
enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, startKey: 
<iLStZ0yTnfVUziYcNVVxWV==>, endKey: <jLB27Q4hKls4tSvp64rJfF==>, encodedName: 
1857033608, offline: true, tableDesc: {name: enwiki_080103, families: 
{alternate_title:={name: alternate_title, max versions: 3, compression: NONE, 
in memory: false, max length: 2147483647, bloom filter: none}, 
alternate_url:={name: alternate_url, max versions: 3, compression: NONE, in 
memory: false, max length: 2147483647, bloom filter: none}, anchor:={name: 
anchor, max versions: 3, compression: NONE, in memory: false, max length: 
2147483647, bloom filter: none}, misc:={name: misc, max versions: 3, 
compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
none}, page:={name: page, max versions: 3, compression: NONE, in memory: false, 
max length: 2147483647, bloom filter: none}, redirect:={name: redirect, max 
versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom 
filter: none}}}}, server: , startCode: -1
{code}

The most interesting line is this one: '2008-03-25 19:19:56,885 DEBUG 
org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessRegionClose of 
enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, true, false'

That first 'true' is saying that we're doing a close but the region is to be 
reassigned... 

> Don't CLOSE region if message is not from server that opened it or is opening 
> it
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-549
>                 URL: https://issues.apache.org/jira/browse/HBASE-549
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.16.0, 0.2.0, 0.1.1, 0.1.0
>            Reporter: stack
>
> We assign a region to a server.  It takes too long to open (HBASE-505).  
> Region gets assigned to another server.  Meantime original host returns a 
> MSG_REPORT_CLOSE (because other regions opening messes it up moving files on 
> disk out from under it).  We queue a shutdown which marks the region as 
> needing reassignment.  Second server reports in that it successfully opened 
> the region.  Master tells it it should not have opened it.  Churn ensues.
> Fix is to ignore the CLOSE if its reported server/startcode does not match 
> that of the server currently trying to open region.  Fix is not easy because 
> currently we don't keep list of server info in unassigned regions.
> Here's master log snippet showing problem:
> {code}
> ...
> 2008-03-25 19:16:43,711 INFO org.apache.hadoop.hbase.HMaster: assigning 
> region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server 
> XX.XX.XX.220:60020
> 2008-03-25 19:16:46,725 DEBUG org.apache.hadoop.hbase.HMaster: Received 
> MSG_REPORT_PROCESS_OPEN : 
> enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.220:60020
> 2008-03-25 19:18:06,411 DEBUG org.apache.hadoop.hbase.HMaster: shutdown 
> scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:18:06,811 DEBUG org.apache.hadoop.hbase.HMaster: shutdown 
> scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:19:46,841 INFO org.apache.hadoop.hbase.HMaster: assigning 
> region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server 
> XX.XX.XX.221:60020
> 2008-03-25 19:19:49,849 DEBUG org.apache.hadoop.hbase.HMaster: Received 
> MSG_REPORT_PROCESS_OPEN : 
> enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.221:60020
> 2008-03-25 19:19:56,883 DEBUG org.apache.hadoop.hbase.HMaster: Received 
> MSG_REPORT_CLOSE : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from 
> XX.XX.XX.220:60020
> 2008-03-25 19:19:56,883 INFO org.apache.hadoop.hbase.HMaster: 
> XX.XX.XX.220:60020 no longer serving regionname: 
> enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, startKey: 
> <iLStZ0yTnfVUziYcNVVxWV==>, endKey: <jLB27Q4hKls4tSvp64rJfF==
> >, encodedName: 1857033608, tableDesc: {name: enwiki_080103, families: 
> >{alternate_title:={name: alternate_title, max versions: 3, compression: 
> >NONE, in memory: false, max length: 2147483647, bloom filter: none}, 
> >alternate_url:={name: al
> ternate_url, max versions: 3, compression: NONE, in memory: false, max 
> length: 2147483647, bloom filter: none}, anchor:={name: anchor, max versions: 
> 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
> none}, mi
> sc:={name: misc, max versions: 3, compression: NONE, in memory: false, max 
> length: 2147483647, bloom filter: none}, page:={name: page, max versions: 3, 
> compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
> none}, re
> direct:={name: redirect, max versions: 3, compression: NONE, in memory: 
> false, max length: 2147483647, bloom filter: none}}}
> 2008-03-25 19:19:56,885 DEBUG org.apache.hadoop.hbase.HMaster: Main 
> processing loop: ProcessRegionClose of 
> enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482, true, false
> 2008-03-25 19:19:56,885 INFO org.apache.hadoop.hbase.HMaster: region closed: 
> enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:19:56,887 INFO org.apache.hadoop.hbase.HMaster: reassign 
> region: enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:19:57,288 INFO org.apache.hadoop.hbase.HMaster: assigning 
> region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server 
> XX.XX.XX.189:60020
> 2008-03-25 19:20:00,296 DEBUG org.apache.hadoop.hbase.HMaster: Received 
> MSG_REPORT_PROCESS_OPEN : 
> enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from XX.XX.XX.189:60020
> 2008-03-25 19:20:16,885 DEBUG org.apache.hadoop.hbase.HMaster: Received 
> MSG_REPORT_OPEN : enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 from 
> XX.XX.XX.221:60020
> 2008-03-25 19:20:16,885 DEBUG org.apache.hadoop.hbase.HMaster: region server 
> XX.XX.XX.221:60020 should not have opened region 
> enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:23:51,707 DEBUG org.apache.hadoop.hbase.HMaster: shutdown 
> scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:23:51,834 DEBUG org.apache.hadoop.hbase.HMaster: shutdown 
> scanner looking at enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482
> 2008-03-25 19:23:53,947 INFO org.apache.hadoop.hbase.HMaster: assigning 
> region enwiki_080103,iLStZ0yTnfVUziYcNVVxWV==,1205393076482 to server 
> XX.XX.XX.97:60020
> ...
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to