[ 
https://issues.apache.org/jira/browse/HADOOP-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546440
 ] 

Jim Kellerman commented on HADOOP-2295:
---------------------------------------

When the master assigned a region twice to the same server, what happened was 
that on the second assignment, the region server returned two 'region is 
opened' messages in the same message to the master (code bug in HRegionServer). 
The first message was processed properly and the master queued a task to update 
the meta. The master then looped to process the second message, and now the 
region was no longer in the 'unassigned regions' list so the master told the 
region server to close it. In the mean time, the update of the meta completes, 
clients start trying to access the region but the region server has closed the 
region so the client gets a NotServingRegion exception.

The master was doing the 'right thing' by reassigning the region because the 
region server took longer than 'hbase.hbasemaster.maxregionopen' (30 seconds) 
to open the region.

Fixing the aforementioned bug will prevent this from occurring, but if the 
master should assign the same region to two different region servers, this 
could easily result in corrupting the region beyond repair as there is a lot of 
file i/o associated with bringing a region on-line. To avoid this, we propose 
that a region server acknowledge that it is opening the region, and when the 
master receives the ack, it extends the region reassignment timeout. If the 
region server subsequently fails to open the region, the master can reassign 
the region after the additional grace period and we can be pretty certain that 
there will be no interference between multiple region servers opening the same 
region.

> [hbase] Still multi-assigning regions
> -------------------------------------
>
>                 Key: HADOOP-2295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2295
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: Jim Kellerman
>
> Start key is img216/1725/22236401lvh6.gif.  We doubly assign.  In between 
> assignments, master rules 'should not have opened region' and sends a 
> shutdown region to remote server -- which it obeys. Meantime, master thinks 
> regions are deployed.
> Here is master log:
> {code}
> ...
> 2007-11-27 04:26:21,818 INFO  hbase.HMaster - region 
> postlog,img216/1725/22236401lvh6.gif,1196164441230 split. New regions are: 
> postlog,img216/1725/22236401lvh6.gif,1196166373235, 
> postlog,img216/3961/aasaxonlivene6.jpg,1196166373241
> 2007-11-27 04:26:23,261 INFO  hbase.HMaster - assigning region 
> postlog,img216/1725/22236401lvh6.gif,1196166373235 to server XX.XX.XX.33:60020
> 2007-11-27 04:26:27,462 DEBUG hbase.HMaster - HMaster.metaScanner scanner: 
> -400573094926830852 regioninfo: {regionname: 
> postlog,img216/1725/22236401lvh6.gif,1196164441230, startKey: 
> <img216/1725/22236401lvh6.gif>, offline: true, split: true, tableDesc: {name: 
> postlog, families: {cookie:={name: cookie, max versions: 1, compression: 
> NONE, in memory: false, max length: 2147483647, bloom filter: none}, 
> ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max 
> length: 2147483647, bloom filter: none}}}}, server: XX.XX.XX.25:60020, 
> startCode: 1196155489036
> 2007-11-27 04:26:27,464 DEBUG hbase.HMaster - HMaster.metaScanner scanner: 
> -400573094926830852 regioninfo: {regionname: 
> postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: 
> <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: 
> {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
> max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
> compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
> none}}}}, server: , startCode: -1
> 2007-11-27 04:26:27,464 DEBUG hbase.HMaster - Checking 
> postlog,img216/1725/22236401lvh6.gif,1196166373235 is assigned
> 2007-11-27 04:26:27,671 DEBUG hbase.HMaster - Checked 
> postlog,img216/1725/22236401lvh6.gif,1196164441230 for references: splitA: 
> true, splitB: true
> 2007-11-27 04:26:53,328 INFO  hbase.HMaster - assigning region 
> postlog,img216/1725/22236401lvh6.gif,1196166373235 to server XX.XX.XX.33:60020
> 2007-11-27 04:27:05,364 INFO  hbase.HMaster - XX.XX.XX.33:60020 serving 
> postlog,img216/1725/22236401lvh6.gif,1196166373235
> 2007-11-27 04:27:05,364 DEBUG hbase.HMaster - region server XX.XX.XX.33:60020 
> should not have opened region 
> postlog,img216/1725/22236401lvh6.gif,1196166373235
> 2007-11-27 04:27:05,367 INFO  hbase.HMaster - regionname: 
> postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: 
> <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: 
> {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
> max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
> compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
> none}}} open on XX.XX.XX.33:60020
> 2007-11-27 04:27:05,368 INFO  hbase.HMaster - updating row 
> postlog,img216/1725/22236401lvh6.gif,1196166373235 in table .META.,,1 with 
> startcode 1196155493271 and server XX.XX.XX.33:60020
> 2007-11-27 04:27:26,380 DEBUG hbase.HMaster - HMaster.metaScanner scanner: 
> -4876467680083821318 regioninfo: {regionname: 
> postlog,img216/1725/22236401lvh6.gif,1196164441230, startKey: 
> <img216/1725/22236401lvh6.gif>, offline: true, split: true, tableDesc: {name: 
> postlog, families: {cookie:={name: cookie, max versions: 1, compression: 
> NONE, in memory: false, max length: 2147483647, bloom filter: none}, 
> ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max 
> length: 2147483647, bloom filter: none}}}}, server: XX.XX.XX.25:60020, 
> startCode: 1196155489036
> 2007-11-27 04:27:26,381 DEBUG hbase.HMaster - HMaster.metaScanner scanner: 
> -4876467680083821318 regioninfo: {regionname: 
> postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: 
> <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: 
> {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
> max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
> compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
> none}}}}, server: XX.XX.XX.33:60020, startCode: 1196155493271
> 2007-11-27 04:27:26,381 DEBUG hbase.HMaster - Checking 
> postlog,img216/1725/22236401lvh6.gif,1196166373235 is assigned
> 2007-11-27 04:27:26,511 DEBUG hbase.HMaster - Checked 
> postlog,img216/1725/22236401lvh6.gif,1196164441230 for references: splitA: 
> true, splitB: true
> 2007-11-27 04:28:26,444 DEBUG hbase.HMaster - HMaster.metaScanner scanner: 
> -7699450087241029374 regioninfo: {regionname: 
> postlog,img216/1725/22236401lvh6.gif,1196164441230, startKey: 
> <img216/1725/22236401lvh6.gif>, offline: true, split: true, tableDesc: {name: 
> postlog, families: {cookie:={name: cookie, max versions: 1, compression: 
> NONE, in memory: false, max length: 2147483647, bloom filter: none}, 
> ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max 
> length: 2147483647, bloom filter: none}}}}, server: XX.XX.XX.25:60020, 
> startCode: 1196155489036
> 2007-11-27 04:28:26,446 DEBUG hbase.HMaster - HMaster.metaScanner scanner: 
> -7699450087241029374 regioninfo: {regionname: 
> postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: 
> <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: 
> {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
> max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
> compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
> none}}}}, server: XX.XX.XX.33:60020, startCode: 1196155493271
> 2007-11-27 04:28:26,446 DEBUG hbase.HMaster - Checking 
> postlog,img216/1725/22236401lvh6.gif,1196166373235 is assigned
> 2007-11-27 04:28:26,590 DEBUG hbase.HMaster - Checked 
> postlog,img216/1725/22236401lvh6.gif,1196164441230 for references: splitA: 
> true, splitB: true
> 2007-11-27 04:29:26,459 DEBUG hbase.HMaster - HMaster.metaScanner scanner: 
> 5846860608957952540 regioninfo: {regionname: 
> postlog,img216/1725/22236401lvh6.gif,1196164441230, startKey: 
> <img216/1401lvh6.gif>, offline: true, split: true, tableDesc: {name: po: 1, 
> compression: NONE, in memory: false, max length: 21474836s: 1, compression: 
> NONE, in memory: false, max length: 21474836er: none}}}}, server: 
> XX.XX.XX.25:60020, startCode: 11961554890ase.HMaster - HMaster.metaScanner 
> scanner: 5846860608957952540 /1725/22236401lvh6.gif,1196166373235, startKey: 
> <img216/1725/22okie:={name: cookie, max versions: 1, compression: NONE, in 
> mem NONE, in memory: false, max length: 2147483647, bloom filter: artCode: 
> 1196155493271
> 2007-11-27 04:29:26,461 DEBUG hbase.HMas235 is assigned
> 2007-11-27 04:29:26,622 DEBUG hbase.HMaster - Cces: splitA: true, splitB: true
> ...
> {code}
> Here is regionserver log on node XX.XX.XX.33:
> {code}
> ...
> 2007-11-27 04:26:29,318 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
> regionname: postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: 
> <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: 
> {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
> max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
> compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
> none}}}
> 2007-11-27 04:26:29,325 DEBUG hbase.HStore - starting 51396220/cookie (no 
> reconstruction log)
> 2007-11-27 04:26:30,420 DEBUG hbase.HRegionServer - flushing region 
> postlog,img100/941/brandyfn6.gif,1196158363481
> 2007-11-27 04:26:30,420 DEBUG hbase.HRegion - Not flushing cache for region 
> postlog,img100/941/brandyfn6.gif,1196158363481: snapshotMemcaches() 
> determined that there was nothing to do
> 2007-11-27 04:26:31,752 DEBUG hbase.HRegionServer - flushing region 
> postlog,img149/2019/liyanaum6.gif,1196164996321
> 2007-11-27 04:26:31,753 DEBUG hbase.HRegion - Not flushing cache for region 
> postlog,img149/2019/liyanaum6.gif,1196164996321: snapshotMemcaches() 
> determined that there was nothing to do
> 2007-11-27 04:26:31,753 DEBUG hbase.HRegionServer - flushing region 
> postlog,img148/6048/p9290339kb0.jpg,1196162613173
> 2007-11-27 04:26:31,753 DEBUG hbase.HRegion - Not flushing cache for region 
> postlog,img148/6048/p9290339kb0.jpg,1196162613173: snapshotMemcaches() 
> determined that there was nothing to do
> 2007-11-27 04:26:31,754 DEBUG hbase.HRegionServer - flushing region 
> postlog,img132/5512/timeline8bx.png,1196159744555
> 2007-11-27 04:26:31,754 DEBUG hbase.HRegion - Not flushing cache for region 
> postlog,img132/5512/timeline8bx.png,1196159744555: snapshotMemcaches() 
> determined that there was nothing to do
> 2007-11-27 04:26:37,915 DEBUG hbase.HRegionServer - flushing region 
> postlog,img149/3724/whosherne4.jpg,1196164996322
> 2007-11-27 04:26:37,915 DEBUG hbase.HRegion - Not flushing cache for region 
> postlog,img149/3724/whosherne4.jpg,1196164996322: snapshotMemcaches() 
> determined that there was nothing to do
> 2007-11-27 04:26:57,059 DEBUG hbase.HRegionServer - flushing region 
> postlog,img148/8339/361de8578d356flip0hd.png,1196165445439
> 2007-11-27 04:26:57,060 DEBUG hbase.HRegion - Not flushing cache for region 
> postlog,img148/8339/361de8578d356flip0hd.png,1196165445439: 
> snapshotMemcaches() determined that there was nothing to do
> 2007-11-27 04:27:03,604 DEBUG hbase.HStore - maximum sequence id for hstore 
> 51396220/cookie is 30625781
> 2007-11-27 04:27:03,761 DEBUG hbase.HStore - starting 51396220/ip (no 
> reconstruction log)
> 2007-11-27 04:27:03,859 DEBUG hbase.HStore - maximum sequence id for hstore 
> 51396220/ip is 30625781
> 2007-11-27 04:27:03,965 DEBUG hbase.HRegion - Next sequence id for region 
> postlog,img216/1725/22236401lvh6.gif,1196166373235 is 30625782
> 2007-11-27 04:27:03,969 INFO  hbase.HRegion - region 
> postlog,img216/1725/22236401lvh6.gif,1196166373235 available
> 2007-11-27 04:27:03,969 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
> regionname: postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: 
> <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: 
> {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
> max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
> compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
> none}}}
> 2007-11-27 04:27:04,578 INFO  hbase.HRegionServer - 
> MSG_REGION_CLOSE_WITHOUT_REPORT : regionname: 
> postlog,img216/1725/22236401lvh6.gif,1196166373235, startKey: 
> <img216/1725/22236401lvh6.gif>, tableDesc: {name: postlog, families: 
> {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
> max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
> compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
> none}}}
> 2007-11-27 04:27:04,578 DEBUG hbase.HRegion - Not flushing cache for region 
> postlog,img216/1725/22236401lvh6.gif,1196166373235: snapshotMemcaches() 
> determined that there was nothing to do
> 2007-11-27 04:27:04,579 DEBUG hbase.HStore - closed 51396220/ip
> 2007-11-27 04:27:04,580 DEBUG hbase.HStore - closed 51396220/cookie
> 2007-11-27 04:27:04,580 INFO  hbase.HRegion - closed 
> postlog,img216/1725/22236401lvh6.gif,1196166373235
> ..
> {code}
> It looks like the assigned region hasn't been cleared from the unassigned 
> regions list when the report comes in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to