[
https://issues.apache.org/jira/browse/ATLAS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hemanth Yamijala updated ATLAS-511:
-----------------------------------
Attachment: ATLAS-511.patch
A WIP patch that implements most of the design document. I am attaching it to
get initial thoughts / review. I still need to do more testing and make some
modifications, as listed below:
* Everything related to Kafka needs a careful look, including potentially
turning off auto commit.
* I noticed a bug in ACTIVE->PASSIVE->ACTIVE transitions. These
multi-transitions need to be checked.
* Not sure how POST redirects in the web server work.
* More documentation
* Lots more testing.
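On the POST redirect question in the list above: under HTTP semantics (RFC 7231), many clients switch a POST to a GET when following a 301 or 302, whereas a 307 Temporary Redirect requires the client to repeat the request with the same method and body. Below is a minimal sketch of how a redirect filter might choose the status code. The class and parameter names (RedirectPlanner, activeServerAddress, pathAndQuery) are hypothetical and not taken from the patch, and this is one possible approach rather than what the patch does.

```java
/** Hypothetical helper for illustration only; not part of ATLAS-511.patch. */
public final class RedirectPlanner {

    /** Status code and Location header value for a redirect to the active server. */
    public static final class Redirect {
        public final int status;
        public final String location;

        Redirect(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    private RedirectPlanner() {
    }

    /**
     * @param method              HTTP method of the incoming request
     * @param activeServerAddress assumed base address of the active server
     * @param pathAndQuery        request path plus query string, starting with "/"
     */
    public static Redirect plan(String method, String activeServerAddress, String pathAndQuery) {
        // GET and HEAD carry no body, so a plain 302 is enough for them.
        boolean bodyless = "GET".equalsIgnoreCase(method) || "HEAD".equalsIgnoreCase(method);
        // 307 asks the client to replay the same method and body at the new location.
        int status = bodyless ? 302 : 307;

        // Avoid a double slash when the configured address ends with "/".
        String base = activeServerAddress.endsWith("/")
                ? activeServerAddress.substring(0, activeServerAddress.length() - 1)
                : activeServerAddress;
        return new Redirect(status, base + pathAndQuery);
    }
}
```

Whether every client that talks to the server actually replays the body on a 307 would still need to be tested.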
What the patch includes:
* Curator-based leader election. ACTIVE -> PASSIVE and PASSIVE -> ACTIVE
scenarios are working with Solr and HBase backends.
* Object and service initialization split into initialization and activation
stages. Services are activated when the server becomes active.
* A new web servlet filter is installed to redirect requests from passive
servers to the active one. This required some state stored in ZooKeeper.
Tested for GETs.
* Feature toggle for backwards compatibility (no HA).
* Unit tests for all new / modified code.
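The split between initialization and activation described above can be pictured roughly as follows. This is only an illustrative sketch under stated assumptions: ServiceState, ActivatableService and the method names are hypothetical and not taken from the patch, and the leader election callbacks (Curator in the patch) are stood in for by direct calls to becomeActive() and becomePassive().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of initialize-then-activate; not code from the patch. */
public final class ServiceState {

    public enum State { PASSIVE, ACTIVE }

    /** Assumed hook that services implement to react to state changes. */
    public interface ActivatableService {
        void activate();
        void deactivate();
    }

    private final List<ActivatableService> services = new ArrayList<>();
    // Every instance starts out passive; services are constructed but not activated.
    private State state = State.PASSIVE;

    public synchronized void register(ActivatableService service) {
        services.add(service);
    }

    public synchronized State getState() {
        return state;
    }

    /** Would be invoked when this instance wins the leader election. */
    public synchronized void becomeActive() {
        if (state == State.ACTIVE) {
            return; // repeated notifications are ignored
        }
        for (ActivatableService service : services) {
            service.activate();
        }
        state = State.ACTIVE;
    }

    /** Would be invoked when this instance loses leadership. */
    public synchronized void becomePassive() {
        if (state == State.PASSIVE) {
            return;
        }
        // Deactivate in reverse registration order.
        for (int i = services.size() - 1; i >= 0; i--) {
            services.get(i).deactivate();
        }
        state = State.PASSIVE;
    }
}
```

Calling becomeActive(), becomePassive() and becomeActive() in sequence walks through the ACTIVE->PASSIVE->ACTIVE path flagged in the TODO list, so a unit test along those lines could check that each service is activated and deactivated exactly once per transition.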
Will continue working on TODOs. In the meantime, if there's any feedback please
do let me know. I'll put up a review board request once I've finished a little
more testing / tuning.
> Ability to run multiple instances of Atlas Server with automatic failover to
> one active server
> ----------------------------------------------------------------------------------------------
>
> Key: ATLAS-511
> URL: https://issues.apache.org/jira/browse/ATLAS-511
> Project: Atlas
> Issue Type: Sub-task
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Attachments: ATLAS-511.patch, HADesign.pdf
>
>
> One of the most important components that only supports active-standby mode
> currently is the Atlas server which hosts the API / UI for Atlas. As
> described in the [HA
> Documentation|http://atlas.incubator.apache.org/0.6.0-incubating/HighAvailability.html],
> we currently are limited to running only one instance of the Atlas server
> behind a proxy service. If the running instance goes down, a manual process
> is required to bring up another instance.
> In this JIRA, we propose to have an ability to run multiple Atlas server
> instances. However, as a first step, only one of them will be actively
> processing requests. To have a consistent terminology, let us call that
> server the *master*. Any requests sent to the other servers will be
> redirected to the master.
> When the master suffers a partition, one of the other servers must
> automatically become the master and start processing requests. What this mode
> brings us over the current system is the ability to automatically fail over
> the Atlas server instance without any manual intervention. Note that this
> can arguably be called an [active/passive
> setup|https://en.wikipedia.org/wiki/High-availability_cluster].
> ATLAS-488 was raised to support multiple active Atlas server instances. While
> that would be ideal, we have to learn more about the underlying system
> behavior before we can get there, and hopefully we can take smaller steps to
> improve the system systematically. The method proposed here is similar to
> what is adopted in many other Hadoop components including HDFS NameNode,
> HBase HMaster etc.