[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797409#comment-13797409
]
Enis Soztutar commented on HBASE-5487:
--------------------------------------
I think as a mental exercise to validate the new design, we should walk through
the following recently opened issues to make sure that these classes of
problems are eliminated:
- HBASE-9724 Failed region split is not handled correctly by AM
- HBASE-9721 meta assignment did not timeout
- HBASE-9696 Master recovery ignores online merge znode
- HBASE-9777 Two consecutive RS crashes could lead to their SSH stepping on
each other's toes and cause master abort
- HBASE-9773 Master aborted when hbck asked the master to assign a region that
was already online
- HBASE-9525 "Move" region right after a region split is dangerous
- HBASE-9514 Prevent region from assigning before log splitting is done
- HBASE-9480 Regions are unexpectedly made offline in certain failure conditions
- HBASE-9387 Region could get lost during assignment
bq. Can you please elaborate? Is it the same as modifying several regions'
state under multi-row lock?
The bulk loading requirement is there so that we can do multiple operations in
parallel, sending openRegions RPCs for multiple regions at the same time
instead of assigning regions one by one. That is all.
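As a rough illustration of what "in parallel" means here, a minimal sketch
follows. It is not the AssignmentManager code: the RegionOpener interface, the
bulkAssign method and the region -> server plan map are hypothetical names made
up for this example, standing in for whatever RPC the master actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BulkAssignSketch {
    /** Hypothetical stand-in for the RPC asking a region server to open a region. */
    interface RegionOpener {
        boolean openRegion(String server, String region) throws Exception;
    }

    /**
     * Sends one open request per region concurrently instead of one by one.
     * plan maps region name -> target server (an assumption of this sketch).
     * Returns region name -> whether the open request succeeded.
     */
    static Map<String, Boolean> bulkAssign(Map<String, String> plan,
                                           RegionOpener opener,
                                           int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, Boolean> results = new ConcurrentHashMap<>();
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (Map.Entry<String, String> e : plan.entrySet()) {
                tasks.add(() -> {
                    boolean ok;
                    try {
                        ok = opener.openRegion(e.getValue(), e.getKey());
                    } catch (Exception ex) {
                        ok = false; // a failed RPC is recorded, not rethrown
                    }
                    results.put(e.getKey(), ok);
                    return null;
                });
            }
            pool.invokeAll(tasks); // blocks until every request has finished
        } catch (InterruptedException ie) {
            // Results may be partial here; restore the interrupt flag.
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

The point of the sketch is only that one slow or failing open does not hold up
the others; a failure is recorded per region so the caller can decide what to
do with it.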
bq. That is dangerous. IIRC in my spec I only put master abort if somebody
changes table state under master; but in general, if region is in unknown state
it's better to make admin act, than to just silently "disappear" part of data -
that can lead to wrong results.
Quarantining the table or region is fine, but the master should not go down
because of this. For example, a region can fail to open, and you would want to
track how many times it has failed to open so that you can decide at some point
that the region should be put in a quarantined (or failed-open) state. I think
there was an issue with a region bouncing from server to server indefinitely.
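To make the failed-open tracking concrete, here is a minimal sketch, assuming a
simple per-region counter with a fixed threshold. The OpenFailureTracker class,
its method names and the RETRY/QUARANTINED outcomes are hypothetical and exist
only for this example; the real design may well keep this state elsewhere.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class OpenFailureTracker {
    /** What the master should do next with the region (hypothetical states). */
    enum Outcome { RETRY, QUARANTINED }

    private final int maxAttempts;
    private final Map<String, Integer> failures = new ConcurrentHashMap<>();

    OpenFailureTracker(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /** Records one failed open; quarantines once the threshold is reached. */
    Outcome recordFailedOpen(String region) {
        int count = failures.merge(region, 1, Integer::sum);
        return count >= maxAttempts ? Outcome.QUARANTINED : Outcome.RETRY;
    }

    /** A successful open clears the history for that region. */
    void recordSuccessfulOpen(String region) {
        failures.remove(region);
    }

    int failureCount(String region) {
        return failures.getOrDefault(region, 0);
    }
}
```

With a threshold of 3, the first two failures would return RETRY and the third
QUARANTINED, so the region stops bouncing between servers and the master keeps
running while an admin looks at it.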
For table operations intermixing with region operations, I'll have to read your
updated doc.
> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
> Issue Type: New Feature
> Components: master, regionserver, Zookeeper
> Affects Versions: 0.94.0
> Reporter: Mubarak Seyed
> Assignee: Sergey Shelukhin
> Priority: Critical
> Attachments: Region management in Master5.docx, Region management in
> Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant
> manner.
> Master-coordinated tasks such as online schema change and delete-range
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core
> components