[jira] [Commented] (HBASE-3638) If a FS bootstrap, need to also ensure ZK is cleaned

Shrijeet Paliwal (Commented) (JIRA) Tue, 10 Jan 2012 21:33:27 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183875#comment-13183875
 ]


Shrijeet Paliwal commented on HBASE-3638:
-----------------------------------------

We hit this issue today in production. We did not do an FS bootstrap (I 
assume FS bootstrap means cleaning the /hbase directory from HDFS). It was a 
regular day: an RS was throwing not-serving exceptions, so I went ahead and 
restarted it. That RS was not serving META or ROOT. After the restart, hbck 
started reporting holes in regions. 

Later, for some inexplicable, crazy and panicky reason, I restarted the Master 
and all the other region servers. At that point the master started complaining 
that META is in OPENED state in ZK for a server which no longer exists. And, as 
Todd explained in the other Jira, the master went into an unending loop. 

The workaround was to clear all files from the ZK data directory. 

What do you think, Stack: can the master pick up a *stale* ZK state that is not 
a leftover from a previous HBase install, in other words a stale state created 
by itself?
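
To make the question concrete, here is a minimal sketch of the kind of 
staleness check being asked about. It is plain Python, not HBase code: the 
RegionTransition class, the find_stale_transitions function, the host names 
and the second region are all hypothetical, and the "host,port,startcode" 
server naming is assumed for illustration. A region-in-transition entry is 
treated as stale when the server that wrote it is no longer in the live set, 
which can happen within a single install once a region server restarts and 
comes back with a new start code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionTransition:
    """Hypothetical stand-in for the data kept in a region-in-transition znode."""
    region: str
    state: str   # e.g. "RS_ZK_REGION_OPENED"
    server: str  # server that wrote the entry, assumed "host,port,startcode"


def find_stale_transitions(transitions, live_servers):
    """Return the transitions whose owning server is not currently live."""
    live = set(live_servers)
    return [t for t in transitions if t.server not in live]


# Entries left in ZK (illustrative values, not taken from a real cluster).
transitions = [
    RegionTransition(".META.,,1", "RS_ZK_REGION_OPENED",
                     "rs1.example.com,60020,1299853933073"),
    RegionTransition("usertable,,2", "RS_ZK_REGION_OPENING",
                     "rs2.example.com,60020,1299854500000"),
]

# rs1 was restarted and came back with a new start code; rs2 kept running.
live_servers = [
    "rs1.example.com,60020,1299854600000",
    "rs2.example.com,60020,1299854500000",
]

stale = find_stale_transitions(transitions, live_servers)
for t in stale:
    print(f"stale: {t.region} in {t.state} on {t.server}")
```

In this sketch only the .META. entry is flagged, because its server name 
carries the old start code. Whether the master should delete such an entry, 
reassign the region, or wait for a timeout is exactly the open question.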
                
> If a FS bootstrap, need to also ensure ZK is cleaned
> ----------------------------------------------------
>
>                 Key: HBASE-3638
>                 URL: https://issues.apache.org/jira/browse/HBASE-3638
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>
> In a test environment running a cycle of start, operation, kill hbase (repeat), 
> noticed that we were doing a bootstrap on startup but then we were picking up 
> the previous cycle's zk state.  It made for a mess in the test.
> Last thing seen on previous cycle was:
> {code}
> 2011-03-11 06:33:36,708 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=X.X.X.60020,1299853933073, 
> region=1028785192/.META.
> {code}
> Then, in the messed up cycle I saw:
> {code}
> 2011-03-11 06:42:48,530 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
> BOOTSTRAP: creating ROOT and first META regions
> .....
> {code}
> Then after setting watcher on .META., we get a 
> {code}
> 2011-03-11 06:42:58,301 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
> .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
> 2011-03-11 06:42:58,302 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Region in transition 
> 1028785192 references a server no longer up X.X.X; letting RIT timeout so 
> will be assigned elsewhere
> {code}
> We're all confused.
> Should at least clear our zk if a bootstrap happened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
