[ 
https://issues.apache.org/jira/browse/ATLAS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated ATLAS-511:
-----------------------------------
    Attachment: ATLAS-511.patch

A WIP patch that implements most of the design document. I am attaching it to 
get early thoughts / review. I still need to do more testing and make some 
modifications, as listed below:

* All things Kafka need to be looked at carefully, including potentially 
turning off auto commit.
* I noticed a bug in ACTIVE->PASSIVE->ACTIVE transitions. These 
multi-transitions need to be checked.
* Not sure how POST redirects in the web server work.
* More documentation
* Lots more testing.

What the patch includes:
* Curator-based leader election. ACTIVE -> PASSIVE and PASSIVE -> ACTIVE 
scenarios are working with Solr and HBase backends.
* Object and service initialization split into initialization and activation 
stages. Services are activated when the server becomes active.
* A new web servlet filter is installed to redirect requests from passive to 
active instances. This required some state to be stored in ZooKeeper. Tested 
for GETs.
* Feature toggle for backwards compatibility (no HA).
* Unit tests for all new / modified code.
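
To make the activation split and the redirect decision concrete, here is a minimal sketch in plain Java. It is not taken from the patch: the names ActivationListener, HaStateSketch, becomeActive, becomePassive and redirectTarget are all hypothetical, the leader election itself (done through Curator in the patch) is assumed to call becomeActive / becomePassive, and the active server address is assumed to have been read from ZooKeeper by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical callback for services that do extra work only while the server is active. */
interface ActivationListener {
    void onActive();
    void onPassive();
}

/** Hypothetical holder of the HA state; every instance starts PASSIVE after initialization. */
final class HaStateSketch {
    enum State { PASSIVE, ACTIVE }

    private final List<ActivationListener> listeners = new ArrayList<>();
    private State state = State.PASSIVE;

    synchronized void register(ActivationListener listener) {
        listeners.add(listener);
    }

    synchronized State state() {
        return state;
    }

    /** Assumed to be invoked by the leader election when this instance wins leadership. */
    synchronized void becomeActive() {
        if (state == State.ACTIVE) {
            return; // duplicate notification, nothing to do
        }
        // Activate services first, so requests are accepted only once they are ready.
        for (ActivationListener listener : listeners) {
            listener.onActive();
        }
        state = State.ACTIVE;
    }

    /** Assumed to be invoked by the leader election when this instance loses leadership. */
    synchronized void becomePassive() {
        if (state == State.PASSIVE) {
            return;
        }
        // Stop accepting requests before deactivating services, in reverse order.
        state = State.PASSIVE;
        for (int i = listeners.size() - 1; i >= 0; i--) {
            listeners.get(i).onPassive();
        }
    }

    /**
     * Decision a redirect filter could make. An empty result means "serve locally".
     * Assumes activeServerAddress has no trailing slash and requestPath starts with "/".
     */
    synchronized Optional<String> redirectTarget(String activeServerAddress, String requestPath) {
        if (state == State.ACTIVE) {
            return Optional.empty();
        }
        return Optional.of(activeServerAddress + requestPath);
    }
}
```

Because each transition is a no-op when the instance is already in the target state, an ACTIVE -> PASSIVE -> ACTIVE sequence calls every listener exactly once per real transition, which is the property the multi-transition TODO above needs to verify. The sketch does not cover the case where no active server has been elected yet.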

Will continue working on TODOs. In the meantime, if there's any feedback please 
do let me know. I'll put up a review board request once I've finished a little 
more testing / tuning.

> Ability to run multiple instances of Atlas Server with automatic failover to 
> one active server
> ----------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-511
>                 URL: https://issues.apache.org/jira/browse/ATLAS-511
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>         Attachments: ATLAS-511.patch, HADesign.pdf
>
>
> One of the most important components that only supports active-standby mode 
> currently is the Atlas server which hosts the API / UI for Atlas. As 
> described in the [HA 
> Documentation|http://atlas.incubator.apache.org/0.6.0-incubating/HighAvailability.html],
>  we currently are limited to running only one instance of the Atlas server 
> behind a proxy service. If the running instance goes down, a manual process 
> is required to bring up another instance.
> In this JIRA, we propose to have an ability to run multiple Atlas server 
> instances. However, as a first step, only one of them will be actively 
> processing requests. To have a consistent terminology, let us call that 
> server the *master*. Any requests sent to the other servers will be 
> redirected to the master.
> When the master suffers a partition, one of the other servers must 
> automatically become the master and start processing requests. What this mode 
> brings us over the current system is the ability to automatically fail over 
> the Atlas server instance without any manual intervention. Note that this 
> can arguably be called an [active/passive 
> setup|https://en.wikipedia.org/wiki/High-availability_cluster].
> ATLAS-488 was raised to support multiple active Atlas server instances. While 
> that would be ideal, we have to learn more about the underlying system 
> behavior before we can get there, and hopefully we can take smaller steps to 
> improve the system incrementally. The method proposed here is similar to 
> what is adopted in many other Hadoop components including HDFS NameNode, 
> HBase HMaster etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
