[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

Enis Soztutar (JIRA) Wed, 16 Oct 2013 16:18:32 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797409#comment-13797409
 ]


Enis Soztutar commented on HBASE-5487:
--------------------------------------

I think as a mental exercise to validate the new design, we should walk through 
the following recently opened issues and make sure that the new design 
eliminates these classes of problems: 

- HBASE-9724 Failed region split is not handled correctly by AM
- HBASE-9721 meta assignment did not timeout
- HBASE-9696 Master recovery ignores online merge znode
- HBASE-9777 Two consecutive RS crashes could lead to their SSH stepping on 
each other's toes and cause master abort
- HBASE-9773 Master aborted when hbck asked the master to assign a region that 
was already online
- HBASE-9525 "Move" region right after a region split is dangerous
- HBASE-9514 Prevent region from assigning before log splitting is done
- HBASE-9480 Regions are unexpectedly made offline in certain failure conditions
- HBASE-9387 Region could get lost during assignment

bq. Can you please elaborate? Is it the same as modifying several regions' 
state under multi-row lock?
The bulk requirement is there so that we can do multiple operations in 
parallel, i.e. send openRegions RPCs for multiple regions at the same time 
instead of assigning regions one by one. That is all. 
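A minimal sketch of parallel versus one-by-one assignment, assuming a hypothetical `RegionOpener` interface that stands in for the openRegion RPC (it is not the real HBase API) and a plain `ExecutorService` for concurrency. All open requests are submitted together, the master waits for every reply, and the regions whose open was not confirmed are returned so they can be retried or handed to failure handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the openRegion RPC; not the real HBase API.
interface RegionOpener {
  boolean openRegion(String region, String server) throws Exception;
}

class BulkAssigner {
  private final RegionOpener opener;
  private final ExecutorService pool;

  BulkAssigner(RegionOpener opener, int parallelism) {
    this.opener = opener;
    this.pool = Executors.newFixedThreadPool(parallelism);
  }

  /** Sends every open request concurrently; returns the regions whose open was not confirmed. */
  List<String> assignAll(Map<String, String> regionToServer) {
    List<String> regions = new ArrayList<>(regionToServer.keySet());
    List<Callable<Boolean>> calls = new ArrayList<>();
    for (String region : regions) {
      String server = regionToServer.get(region);
      calls.add(() -> opener.openRegion(region, server));
    }
    List<Future<Boolean>> results;
    try {
      // invokeAll blocks until every task finishes; futures keep the order of `calls`.
      results = pool.invokeAll(calls);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return regions; // nothing confirmed; the caller should re-check and retry
    }
    List<String> failed = new ArrayList<>();
    for (int i = 0; i < regions.size(); i++) {
      try {
        if (!results.get(i).get()) {
          failed.add(regions.get(i)); // the server reported a failed open
        }
      } catch (ExecutionException e) {
        failed.add(regions.get(i)); // the RPC itself threw
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        failed.add(regions.get(i));
      }
    }
    return failed;
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

The fixed-size pool still bounds how many RPCs are in flight at once; per-RPC timeouts, which a real assignment path would need, are left out of this sketch.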

bq. That is dangerous. IIRC, in my spec I only have the master abort if somebody 
changes table state under the master; but in general, if a region is in an unknown 
state it's better to make the admin act than to silently make part of the data 
"disappear"; that can lead to wrong results.
Quarantining the table or region is fine, but the master should not go down 
because of this. For example, a region can fail to open, and you would want to 
track how many times it failed to open so that at some point you can decide that 
the region should be in a quarantined (or failed open) state. I think there was 
an issue about a region bouncing from server to server indefinitely. 
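A minimal sketch of counting failed opens, assuming hypothetical `FailedOpenTracker` and `RegionState` types (the `FAILED_OPEN` name only mirrors the "failed open state" mentioned above). After a configurable number of failures the region is parked in `FAILED_OPEN` for an operator to look at, instead of bouncing between servers forever or aborting the master.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical states for illustration; FAILED_OPEN mirrors the "failed open state" idea.
enum RegionState { PENDING_OPEN, OPEN, FAILED_OPEN }

class FailedOpenTracker {
  private final int maxAttempts;
  private final Map<String, Integer> failures = new HashMap<>();
  private final Map<String, RegionState> states = new HashMap<>();

  FailedOpenTracker(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  /** Records one failed open and returns the new state; never aborts the master. */
  synchronized RegionState recordFailure(String region) {
    int count = failures.merge(region, 1, Integer::sum);
    // Below the threshold the region stays eligible for another attempt;
    // at the threshold it is quarantined until an operator intervenes.
    RegionState next = count >= maxAttempts ? RegionState.FAILED_OPEN : RegionState.PENDING_OPEN;
    states.put(region, next);
    return next;
  }

  /** A successful open clears the failure history. */
  synchronized void recordSuccess(String region) {
    failures.remove(region);
    states.put(region, RegionState.OPEN);
  }

  /** Operator action (e.g. after fixing the cause): allow assignment attempts again. */
  synchronized void clearQuarantine(String region) {
    failures.remove(region);
    states.put(region, RegionState.PENDING_OPEN);
  }

  synchronized RegionState state(String region) {
    return states.getOrDefault(region, RegionState.PENDING_OPEN);
  }
}
```

The region stays visible as quarantined rather than silently disappearing, which keeps the admin in the loop without taking the whole master down.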

For table operations intermixing with region operations, I'll have to read your 
updated doc. 


> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>         Attachments: Region management in Master5.docx, Region management in 
> Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online schema change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plug in new master-coordinated tasks without adding code to core 
> components
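A minimal sketch of what a pluggable task contract could look like, assuming hypothetical `CoordinatedTask`, `TaskFramework`, and `ProgressStore` names (none of these are existing HBase classes). `ProgressStore` is an in-memory stand-in for the ZooKeeper znodes that would hold task progress; because progress is persisted after each idempotent step, a new master can resume a task where the previous one stopped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plugin contract; not an existing HBase interface.
interface CoordinatedTask {
  /** Runs one idempotent step from the last persisted progress; returns new progress, or null when done. */
  String runStep(String lastProgress);
}

/** In-memory stand-in for the ZooKeeper znodes that would hold task progress. */
class ProgressStore {
  private final Map<String, String> znodes = new HashMap<>();

  String get(String taskId) { return znodes.get(taskId); }
  void put(String taskId, String progress) { znodes.put(taskId, progress); }
  void delete(String taskId) { znodes.remove(taskId); }
}

class TaskFramework {
  private final Map<String, Supplier<CoordinatedTask>> registry = new HashMap<>();
  private final ProgressStore store;

  TaskFramework(ProgressStore store) {
    this.store = store;
  }

  /** New task types plug in here without touching core master code. */
  void register(String type, Supplier<CoordinatedTask> factory) {
    registry.put(type, factory);
  }

  /** Runs (or resumes) a task, persisting progress after every step. */
  void run(String type, String taskId) {
    Supplier<CoordinatedTask> factory = registry.get(type);
    if (factory == null) {
      throw new IllegalArgumentException("unknown task type: " + type);
    }
    CoordinatedTask task = factory.get();
    String progress = store.get(taskId); // null means the task is starting fresh
    while (true) {
      String next = task.runStep(progress);
      if (next == null) {
        store.delete(taskId); // finished: clean up the progress record
        return;
      }
      store.put(taskId, next); // persist before the next step so a failover can resume here
      progress = next;
    }
  }
}
```

Keeping the steps idempotent matters here: a master that dies after a step but before persisting its progress will have that step re-run by its successor.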



--
This message was sent by Atlassian JIRA
(v6.1#6144)
