[
https://issues.apache.org/jira/browse/HBASE-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Yu updated HBASE-3809:
--------------------------
Fix Version/s: 0.94.0 (was: 0.92.0)
> .META. may not come back online if > number of executors servers crash and
> one of those > number of executors was carrying meta
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-3809
> URL: https://issues.apache.org/jira/browse/HBASE-3809
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Critical
> Fix For: 0.94.0
>
>
> This is a duplicate of another issue but at the moment I cannot find the
> original.
> If you had a 700 node cluster and then you ran something on the cluster which
> killed 100 nodes, and .META. had been running on one of those downed nodes,
> well, you'll have all of your master executors processing ServerShutdowns and
> more than likely none of the currently processing executors will be servicing
> the shutdown of the server that was carrying .META.
> Well, for server shutdown to complete at the moment, an online .META. is
> required. So, in the above case, we'll be stuck. The current executors will
> not be able to clear to make space for the processing of the server carrying
> .META. because they need .META. to complete.
> We can make the master handler pool unbounded so it will expand to accommodate
> all crashed servers -- so it'll have the one .META. in its queue -- or we can
> change it so shutdown handling doesn't require .META. to be on-line (it's used
> to figure the regions the server was carrying); we could use the master's
> in-memory picture of the cluster (But IIRC, there may be holes ....TBD)
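
To make the starvation described above concrete, here is a toy, single-threaded
simulation. It is only a sketch under stated assumptions, not HBase code: the
function name, the server names and the FIFO hand-out of executor slots are all
made up for illustration, and it assumes that the handler for the server that
carried .META. is the one that gets .META. reassigned.

```python
from collections import deque


def simulate_shutdown_handling(crashed_servers, meta_server, num_executors):
    """Toy model of a bounded pool of master executors (hypothetical, not HBase API).

    Returns (completed, stuck): servers whose shutdown handling finished, and
    servers whose handling can never finish.
    """
    pending = deque(crashed_servers)   # ServerShutdown events waiting for a slot
    running = []                       # events currently holding an executor slot
    completed = []
    # .META. stays online unless the server carrying it is among the crashed ones.
    meta_online = meta_server not in crashed_servers

    while pending or running:
        # Hand out free executor slots in FIFO order.
        while pending and len(running) < num_executors:
            running.append(pending.popleft())
        # Assumption: the handler for the .META. carrier reassigns .META. first.
        if meta_server in running:
            meta_online = True
        if not meta_online:
            # Every slot is held by a handler waiting on .META., and the one
            # handler that could bring it back cannot get a slot: deadlock.
            break
        completed.extend(running)
        running = []

    stuck = running + list(pending)
    return completed, stuck


if __name__ == "__main__":
    crashed = ["rs%d" % i for i in range(100)]
    # Bounded pool, .META. carrier queued behind the others: nothing completes.
    print(simulate_shutdown_handling(crashed, "rs99", 3))
    # Pool large enough for every crashed server: everything completes.
    print(simulate_shutdown_handling(crashed, "rs99", 100))
```

With 3 executors and the .META. carrier at the back of the queue, every server
ends up in the stuck list. Sizing the pool to cover all crashed servers, which
is the first option in the description, lets the .META. carrier get a slot and
everything drains.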
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira