[ 
https://issues.apache.org/jira/browse/HBASE-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935590#action_12935590
 ] 

HBase Review Board commented on HBASE-3267:
-------------------------------------------

Message from: [email protected]


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > This is great.  I like this much better than hacking up the master 
transition code.
bq.  > 
bq.  > My main concern is around the exact semantics of assign/unassign (and 
close).  I think we need to do good javadoc on the HBA methods to describe how 
you would use these or at least a bit about their behavior.  assign() just does 
an assign, but unassign() actually clears stuff out.  It seems doing a close() 
behind the masters back, then asking the master to assign that region, should 
not work... but it does?

Well, my notion is that users shouldn't be doing this manual fixing any more.
Fixup is now hbck's job.

Yes, a region close is done w/o the master's involvement.  You'd rarely do it.
Yes, an assign will assign a region EVEN IF it is ALREADY assigned.  Messing in
here can get you in trouble.  I was able to manufacture some ugly conditions -- a
stuck region trying to assign to the same server over and over -- but unassign
with a force now clears out RIT and does the right thing, i.e. we have
enough tools to hang ourselves on the new master but also the tools to undo it.


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 980
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17648#file17648line980>
bq.  >
bq.  >     Is there an open_region?  This assign() goes through the master, so 
what is the opposite of close_region which doesn't go through the master?
bq.  >     
bq.  >     Doesn't close_region now put the master in a bad state, so it won't 
expect an assignment to be done on a region which it thinks is already 
assigned?  There is a force on unassign() but not on assign().
bq.  >     
bq.  >     In the old master, for HBCK, I added a hook in to the master to 
clear the in-memory state for a region.  To deal with dupe assignment, I did 
silent close_regions and then cleared the in-memory state.  Then I triggered a 
new assignment.

No open_region.  Someone can add that later if wanted.  Otherwise, use move to
place a region on a specific server.

On close_region: yes, it puts the master in a bad state, but you'd only do
close_region when fixing up some whacked-out condition.  I was tempted to just
remove these commands, but since we don't know what states the new master could
put us in, I'll leave them in for now.

I'll add force to assign so it behaves the same as unassign.


Regarding what you did for old master hbck: you could call close_region, and then
an unassign with a force would clear the in-memory state and get the region
assigned elsewhere.

But hbck should be doing this.  Not a user manually, not unless things are 
really hosed.


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java, line 138
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17649#file17649line138>
bq.  >
bq.  >     this is awesome javadoc.  is there somewhere else we can put this 
rather than in just the move() API?  Maybe in the HBA class comment or 
something?  Somewhere we can reference in other javadocs about what a 
regionname is

I moved the interface doc out to HBA as per your suggestion.


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 709
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17651#file17651line709>
bq.  >
bq.  >     So you're supposed to call move instead of open_region?  Or why the 
change in move() though this looks good.

Just added it as something you might want to do.  unassign does the same thing
really.  I could back it out.


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 994
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17651#file17651line994>
bq.  >
bq.  >     Why META and not in-memory state?  Once you hit assign() you rely on 
the in-memory state anyways?

I only have a region server name, not an HRI, which is what the in-memory state
is keyed by.  I could iterate the Map I suppose, but then I'm thinking it may
have been cleared from in-memory state.


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 996
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17651#file17651line996>
bq.  >
bq.  >     on assign we just do the assignment, but below on unassign() we 
first clear existing plans and clear from RIT.  why the difference.

I made it so we only clear state if force is added to the unassign.


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1011
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17651#file17651line1011>
bq.  >
bq.  >     is this necessary?  should the unassign method taking force deal 
with anything needed to "force" it?

It's needed for the case where you called assign multiple times.  The first
assign works.  The second one will be stuck in a loop where you ask for the
region to be opened but the regionserver will say it's already open and abort
the open.  You are stuck in this loop unless RIT gets cleared.
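
To make the loop concrete, here is a minimal toy model in Java.  It is not
HBase code: the class ToyMaster and all its methods are hypothetical, there is
a single simulated regionserver, and a forced unassign here only clears RIT and
closes the region (the real master would also go on to reassign it).  It only
sketches the behavior described above: a second assign leaves the region in
RIT, a plain unassign does not help, and a forced unassign clears it.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only; every name here is hypothetical, not the HBase API. */
class ToyMaster {
    // Regions the single simulated regionserver currently has open.
    private final Set<String> openOnServer = new HashSet<>();
    // Regions the master believes are in transition (RIT).
    private final Set<String> regionsInTransition = new HashSet<>();

    /** Asks the server to open the region; returns true if the open completed. */
    boolean assign(String region) {
        regionsInTransition.add(region);
        if (openOnServer.contains(region)) {
            // Server reports "already open" and aborts; the region stays in RIT.
            return false;
        }
        openOnServer.add(region);
        regionsInTransition.remove(region);
        return true;
    }

    /** Without force, a region in RIT is left alone; with force, RIT is cleared first. */
    void unassign(String region, boolean force) {
        if (regionsInTransition.contains(region)) {
            if (!force) {
                return;
            }
            regionsInTransition.remove(region);
        }
        openOnServer.remove(region);
    }

    boolean isStuck(String region) {
        return regionsInTransition.contains(region);
    }
}
```

In this model, assign("r1") twice leaves "r1" stuck, unassign("r1", false) changes
nothing, and unassign("r1", true) clears it so a later assign succeeds.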


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/ruby/hbase/admin.rb, line 390
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17652#file17652line390>
bq.  >
bq.  >     zk didn't work?  why is this removed?

Because we have ./bin/hbase zkcli now, which is a better way of doing this zk
interaction.


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/ruby/shell/commands/assign.rb, line 26
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17654#file17654line26>
bq.  >
bq.  >     whitespace.  and what exactly are the semantics of this?  what if 
region is already assigned?
bq.  >     
bq.  >     we should document somewhere more specifically what the behavior is 
of these methods if we're going to expose them to the client and the shell.  
neither place really describes what this means and i can imagine users will be 
doing lots of foot shooting with tools like this.
bq.  >     
bq.  >     more importantly, though, i'm trying to understand the use cases for 
these.  if it's to unbreak stuff, it's not clear to me how exactly you would 
use it given that the master will reject certain operations in the wrong order.

If already assigned it will reassign regardless.  This is a fix up facility for 
expert use only (I updated the help to say this more explicitly).

Well, using these new commands you can break things and then unbreak them too.


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/ruby/shell/commands/close_region.rb, line 28
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17657#file17657line28>
bq.  >
bq.  >     Why would you use close and not unassign/assign/move?  It's because 
close is done silently?  Should say that if that's the distinction.
bq.  >     
bq.  >     Is this comment saying you can use unassign or move after you issue 
close?  or instead of?

I updated the help.  Added 'caution' and 'for experts only'.

Like I say, I wanted to remove these things altogether, but my guess is that
one day we'll need them -- at least while hbck is lacking and while all failure
modes of the new master are not yet known.


bq.  On 2010-11-24 15:45:32, Jonathan Gray wrote:
bq.  > trunk/src/main/ruby/shell/commands/unassign.rb, line 30
bq.  > <http://review.cloudera.org/r/1250/diff/1/?file=17661#file17661line30>
bq.  >
bq.  >     this doesn't use encoded region name?
bq.  >     
bq.  >     is move then different from the other methods?

Yes, move is different from the others.  It tries to make this clear in its
documentation.  The Map that move uses is keyed by encoded region name.  Assign
and unassign go get the HRI from .META. by using the passed regionname.
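
As a rough illustration of the difference between the two kinds of name, the
sketch below pulls the encoded name out of a full region name.  It assumes the
full name has the form "table,startkey,id.encodedname." with a trailing dot;
that layout is an assumption here, not something stated in this thread.  The
class RegionNames and the sample name in the usage note are hypothetical.

```java
/** Hypothetical helper; not part of HBase. */
class RegionNames {
    /**
     * Returns the text between the last two dots of a region name,
     * assuming the name ends with ".encodedname." (an assumed layout).
     */
    static String encodedName(String regionName) {
        if (!regionName.endsWith(".")) {
            throw new IllegalArgumentException("no trailing '.': " + regionName);
        }
        // Search for the dot that precedes the encoded suffix, skipping the final dot.
        int dot = regionName.lastIndexOf('.', regionName.length() - 2);
        if (dot < 0) {
            throw new IllegalArgumentException("no encoded suffix: " + regionName);
        }
        return regionName.substring(dot + 1, regionName.length() - 1);
    }
}
```

For a made-up name such as "t1,row0,1290000000000.0123456789abcdef0123456789abcdef."
this returns the 32-character suffix, which is the kind of value move expects,
while assign and unassign take the full name.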


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1250/#review1975
-----------------------------------------------------------





> close_region shell command breaks region
> ----------------------------------------
>
>                 Key: HBASE-3267
>                 URL: https://issues.apache.org/jira/browse/HBASE-3267
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver, shell
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.0
>
>
> It used to be that you could use the close_region command from the shell to 
> close a region on one server and have the master reassign it elsewhere. Now 
> if you close a region, you get the following errors in the master log:
> 2010-11-23 00:46:34,090 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSING for region 
> ffaa7999e909dbd6544688cc8ab303bd from server 
> haus01.sf.cloudera.com,12020,1290501789693 but region was in  the state null 
> and not in expected PENDI
> 2010-11-23 00:46:34,530 DEBUG 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: 
> master:60000-0x12c537d84e10062 Received ZooKeeper Event, 
> type=NodeDataChanged, state=SyncConnected, 
> path=/hbase/unassigned/ffaa7999e909dbd6544688cc8ab303bd
> 2010-11-23 00:46:34,531 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:60000-0x12c537d84e10062 Retrieved 128 byte(s) of data from znode 
> /hbase/unassigned/ffaa7999e909dbd6544688cc8ab303bd and set watcher; 
> region=usertable,user1951957302,1290501969
> 2010-11-23 00:46:34,531 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, 
> server=haus01.sf.cloudera.com,12020,1290501789693, 
> region=ffaa7999e909dbd6544688cc8ab303bd
> 2010-11-23 00:46:34,531 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> ffaa7999e909dbd6544688cc8ab303bd from server 
> haus01.sf.cloudera.com,12020,1290501789693 but region was in  the state null 
> and not in expected PENDIN
> and the region just gets stuck closed

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
