[jira] [Commented] (HBASE-7247) Assignment performances decreased by 50% because of regionserver.OpenRegionHandler#tickleOpening

stack (JIRA) Wed, 05 Dec 2012 09:42:01 -0800

    [ https://issues.apache.org/jira/browse/HBASE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510622#comment-13510622 ]


stack commented on HBASE-7247:
------------------------------

Your approach sounds good, nkeywal.  No harm in writing less but adding the 
read in case we have indeed lost ownership.  It might mean it takes a bit 
longer to realize we've lost ownership but that should be fine.
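
To make the trade-off concrete, here is a minimal sketch of one possible 
reading of "write less, add the read".  It is not taken from the patch: the 
class name, the interval parameter and the two callbacks are hypothetical 
stand-ins for the znode rewrite and the ownership read.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch only; none of these names exist in HBase. */
final class ThrottledTickle {
    private final long minWriteIntervalMs;
    private final LongSupplier clock;             // current time in ms
    private final BooleanSupplier writeOpening;   // assumed: rewrites the OPENING znode, false if ownership is lost
    private final BooleanSupplier readStillOwner; // assumed: read-only ownership check
    private long lastWriteMs;
    private boolean wroteOnce;
    private int writes;
    private int reads;

    ThrottledTickle(long minWriteIntervalMs, LongSupplier clock,
                    BooleanSupplier writeOpening, BooleanSupplier readStillOwner) {
        this.minWriteIntervalMs = minWriteIntervalMs;
        this.clock = clock;
        this.writeOpening = writeOpening;
        this.readStillOwner = readStillOwner;
    }

    /** Returns false when we find out that we no longer own the region. */
    boolean tickle() {
        long now = clock.getAsLong();
        if (!wroteOnce || now - lastWriteMs >= minWriteIntervalMs) {
            // Expensive path: a write, taken at most once per interval.
            wroteOnce = true;
            lastWriteMs = now;
            writes++;
            return writeOpening.getAsBoolean();
        }
        // Cheap path: only read, so a lost ownership may be noticed a bit later.
        reads++;
        return readStillOwner.getAsBoolean();
    }

    int writes() { return writes; }
    int reads() { return reads; }
}
```

With a 1000 ms interval, calls at 0, 10 and 1500 ms would cost two writes 
and one read instead of three writes.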

On the 'post_region_open', IIRC, a bunch of these tickleOpenings were added 
because we saw issues... in this case, an update of .META. that went in even 
though we'd lost ownership of the region.

Stepping back (after looking at code), could we drop the notion that a master 
can intercede and assign a region elsewhere because it is proceeding too 
slowly on a particular region, in the name of simplifying the region open 
handling interaction?  There would be less noise in the logs and fewer states 
to deal with.

If we did this, I'd think that we'd want the regionserver to do the initial 
move of the znode from OPENING to OPENING to establish ownership (elsewhere I 
have petitioned that the regionserver should set the OPENING state, and not the 
master -- master should set the state to PENDING_OPEN in the znode rather than 
just in master memory -- as a means of cleanly denoting the regionservers' 
assumption of region ownership).  Then the next transition would be from 
OPENING to OPEN or to FAILED_OPEN.  And that would be it.  Master would just 
presume that the only reason to intercede is when the regionserver loses its 
lease in zk.  We'd drop tickling OPENING so master knows we are progressing on 
a region open -- it would just presume regionserver is making progress and that 
it will kill itself if it can't get to HDFS, etc.  We'd also be dropping 
regionserver-side checks that it still owns a region just before it goes to 
update meta (Could change the meta operation to be a check and put so we didn't 
have to go to zk just before meta edit -- the bit of code you quote above 
nkeywal?).
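
A rough sketch of what that reduced set of transitions could look like, 
using an in-memory compare-and-set as a stand-in for a versioned znode 
update.  Everything here is hypothetical and for illustration only (class, 
method and owner names are made up); it is not HBase or ZooKeeper API.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory stand-in for a region znode. */
final class RegionOpenSketch {
    enum State { PENDING_OPEN, OPENING, OPEN, FAILED_OPEN }

    /** Immutable snapshot: the state plus whoever set it. */
    static final class Node {
        final State state;
        final String owner;
        Node(State state, String owner) { this.state = state; this.owner = owner; }
    }

    private final AtomicReference<Node> node =
        new AtomicReference<>(new Node(State.PENDING_OPEN, "master"));

    /** Regionserver establishes ownership: PENDING_OPEN -> OPENING. */
    boolean claim(String server) {
        Node cur = node.get();
        return cur.state == State.PENDING_OPEN
            && node.compareAndSet(cur, new Node(State.OPENING, server));
    }

    /** The only other regionserver transition: OPENING -> OPEN or FAILED_OPEN. */
    boolean finish(String server, boolean opened) {
        Node cur = node.get();
        if (cur.state != State.OPENING || !server.equals(cur.owner)) {
            return false; // not ours (any more); no tickling in between
        }
        State next = opened ? State.OPEN : State.FAILED_OPEN;
        return node.compareAndSet(cur, new Node(next, server));
    }

    /** Master intercedes only after the owner has lost its lease. */
    boolean reassignAfterLeaseLoss(String deadServer) {
        Node cur = node.get();
        if (cur.state != State.OPENING || !deadServer.equals(cur.owner)) {
            return false;
        }
        return node.compareAndSet(cur, new Node(State.PENDING_OPEN, "master"));
    }

    State state() { return node.get().state; }

    public static void main(String[] args) {
        RegionOpenSketch region = new RegionOpenSketch();
        System.out.println("rs1 claims: " + region.claim("rs1"));
        System.out.println("rs2 claims: " + region.claim("rs2"));
        System.out.println("rs1 finishes: " + region.finish("rs1", true));
        System.out.println("final state: " + region.state());
    }
}
```

The conditional update is what lets a late regionserver notice that it no 
longer owns the region without an extra check beforehand, which is the same 
idea as making the meta edit a check and put.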

Just putting it out there.

On patch:


Do we fail the open if the following break happens?

{code}
         tickleOpening = tickleOpening("post_open_deploy");
+        if (!tickleOpening) {
+          break;
+        }
{code}

Do we have to call zkw.sync(node) (the line the patch adds)?  Did we always 
do that?  Are we doing the sync just to read the old znode value?  Do we have 
to?  Could we operate w/ a stale read?

Are you removing this:

-      HRegion region = this.rsServices.getFromOnlineRegions(encodedName);

If so, why?  Or have you moved the check later?

                
> Assignment performances decreased by 50% because of 
> regionserver.OpenRegionHandler#tickleOpening
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7247
>                 URL: https://issues.apache.org/jira/browse/HBASE-7247
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7247.v1.patch
>
>
> The regionserver.OpenRegionHandler#tickleOpening updates the region znode, 
> with the stated purpose "Do this so master doesn't timeout this 
> region-in-transition."
> However, on the usual test, this makes the assignment time of 1500 regions 
> go from 70s to 100s, that is, we're 50% slower because of this.
> More generally, ZooKeeper commits every data update to disk, and this takes 
> time. Using it to provide a keep-alive seems overkill. At the very least, it 
> could be made asynchronous.
> I'm not sure how necessary these updates are (I need to go deeper into 
> the internals, feedback welcome), but it seems very important to optimize 
> this... The trivial fix would be to make this optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
