Re: Review Request: hbase-3362 If .META. offline between OPENING and OPENED, then wrong server location in .META. is possible

Jonathan Gray Thu, 16 Dec 2010 10:02:22 -0800


> On 2010-12-16 00:14:36, Jonathan Gray wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java,
> >  line 96
> > <http://review.cloudera.org/r/1298/diff/1/?file=18309#file18309line96>
> >
> >     We were not previously but we should probably log this condition
> 
> stack wrote:
>     We do in the method?


yeah you're right, thought that was only on the exception.  nvm.


> On 2010-12-16 00:14:36, Jonathan Gray wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java,
> >  line 173
> > <http://review.cloudera.org/r/1298/diff/1/?file=18309#file18309line173>
> >
> >     This is a busy wait loop?
> >     
> >     Should we add a wait/notify on something passed to the thread and w/ a 
> > timeout of the period?
> >     
> >     And then we should probably also have some kind of max timeout.  Even 
> > if minutes, there could be weird cluster state where the RS misses META 
> > availability but someone else might handle it properly, so max timeout 
> > might be good?
> 
> stack wrote:
>     I need to add a small sleep.  I'd rather do this than wait/notify.  
> t.isAlive should be enough.  Regards max timeout, I should add check if 
> server is stopped ... and for max timeout, what you think?  Ten minutes?  
> Then abort?

I was thinking 5 minutes.

How long you going to sleep for?  That seems like an unideal way to do this.  I 
would prefer wait/notify and have timeout on wait be this 1/3 period, but small 
sleep could work.  If really small, we're in busy loop again.  If too big, we 
increase how long we have to wait.  This is on critical path of every single 
region open.

If we go down path of threads doing work, I don't see why we don't want to use 
wait/notify to let the blocked thread know when it's done.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1298/#review2083
-----------------------------------------------------------


On 2010-12-15 16:14:27, stack wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1298/
> -----------------------------------------------------------
> 
> (Updated 2010-12-15 16:14:27)
> 
> 
> Review request for hbase and Jonathan Gray.
> 
> 
> Summary
> -------
> 
> M 
> src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
>  Removed stale comments and TODOs.
> 
>  Added a 'version' datamenber, the znode edit version which we keep across 
> open process.
> 
>  Refactored the setting of OPENING out into a method that is used in multiple 
> places 
>  now rather than repeat code.  Did this in new tickleOpening method.
> 
>  Added new PostOpenDeployTasksThread which we run to do the 
> postOpenDeployTasks.
>  While its running we update OPENING state if its running a while.
> 
> 
> This addresses bug hbase-3362.
>     http://issues.apache.org/jira/browse/hbase-3362
> 
> 
> Diffs
> -----
> 
>   
> trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
>  1049707 
> 
> Diff: http://review.cloudera.org/r/1298/diff
> 
> 
> Testing
> -------
> 
> Ran it on my cluster. Seems to work as the old code did.
> 
> 
> Thanks,
> 
> stack
> 
>

Re: Review Request: hbase-3362 If .META. offline between OPENING and OPENED, then wrong server location in .META. is possible

Reply via email to