[ 
https://issues.apache.org/jira/browse/ATLAS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184726#comment-15184726
 ] 

venkata madugundu commented on ATLAS-511:
-----------------------------------------

Hemanth Yamijala, thanks for uploading your thought process for HA. I had a few
comments:
1. Type definitions for entities are predominantly seen as static, much like
application database schemas. Should Atlas consider a user-configurable feature
toggle that skips refreshing/reloading types when a passive instance becomes
active?

I know that when the consumer application upgrades its functionality there will
be type-level changes. In that case, maybe the application can send a
special-purpose Atlas request to refresh the type cache, coordinated by
ZooKeeper, on all Atlas instances.
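
To make the idea concrete, here is a rough sketch of how the toggle and the
coordinated refresh could interact. Everything in it is an assumption for
illustration: TypeCache, AtlasInstance, reload_types_on_activation and the
integer "type version" are hypothetical names, not Atlas APIs, and the
ZooKeeper watch is simulated by a plain method call.

```python
# Hypothetical sketch only; none of these names exist in Atlas.

class TypeCache:
    def __init__(self, load_types):
        # load_types: callable that reads all type definitions from the store
        self._load_types = load_types
        self._types = load_types()
        self.loaded_version = 0
        self.reload_count = 0  # reloads after the initial load

    def reload(self, version):
        self._types = self._load_types()
        self.loaded_version = version
        self.reload_count += 1

    def get(self, name):
        return self._types[name]


class AtlasInstance:
    def __init__(self, cache, reload_types_on_activation=True):
        self.cache = cache
        # The proposed user-facing toggle (assumed name).
        self.reload_types_on_activation = reload_types_on_activation
        self.active = False

    def become_active(self, current_version):
        # Called when this passive instance takes over as the active one.
        if self.reload_types_on_activation:
            self.cache.reload(current_version)
        self.active = True

    def on_type_version_changed(self, new_version):
        # Stand-in for a ZooKeeper watch firing on a shared "type version"
        # node that the special-purpose refresh request would bump.
        if new_version > self.cache.loaded_version:
            self.cache.reload(new_version)
```

With the toggle off, failover skips the reload entirely, and instances only
reload when the shared version moves past the one they last loaded.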

2. How critical is the type cache in the context of a purely SCRUD API? That
is, what would the performance hit be if types were not cached but requested
from the backend store each time they are needed? I think metadata repositories
tend to be read-intensive, so the performance of 'Search' is very important.
With the Atlas DSL query language, query validation (and even translation)
needs to consult the types and their supertype hierarchy. Does it make sense to
evaluate query performance with and without the type cache, and to measure how
quickly Atlas can fetch a given set of types (the ones a query needs) from the
backend store? If that is quick enough (at least for some classes of
applications), then maybe Atlas should provide a way to turn off the type
cache.
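
The comparison I have in mind could be sketched roughly as below. This is not
Atlas code: BackendStore is an in-memory stand-in for the real backend, the
"superTypes" field and all function names are assumptions, and a real
evaluation would run actual DSL queries against a real store.

```python
import time


class BackendStore:
    """Stand-in for the backend store; counts how often it is read."""

    def __init__(self, types):
        self._types = types
        self.reads = 0

    def read_type(self, name):
        self.reads += 1
        return self._types[name]


def resolve_hierarchy(name, read_type):
    """Return the type name followed by all of its super types."""
    seen, stack = [], [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(read_type(current)["superTypes"])
    return seen


def cached(read_type):
    """Wrap a reader so each type is fetched from the store at most once."""
    cache = {}

    def lookup(name):
        if name not in cache:
            cache[name] = read_type(name)
        return cache[name]

    return lookup


def time_queries(read_type, type_name, repeats):
    """Time repeated hierarchy lookups, as query validation would do."""
    start = time.perf_counter()
    for _ in range(repeats):
        resolve_hierarchy(type_name, read_type)
    return time.perf_counter() - start
```

Calling time_queries once with store.read_type and once with
cached(store.read_type) gives the two numbers to compare, and store.reads
shows how many backend round trips each mode costs.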

As HA (and multiple active instances) is important for our Atlas adoption, can
I be of any help with specific child tasks? Please let me know, as you are in
the best position to decide which ones can be delegated to other contributors.
I have been using the Atlas REST API for about 2-3 months now. We consume it
with a standard HTTP client rather than the AtlasClient API. I have a fair
understanding of the DSL (having written a query rewriter for our multi-tenancy
evaluation).

> Ability to run multiple instances of Atlas Server with automatic failover to 
> one active server
> ----------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-511
>                 URL: https://issues.apache.org/jira/browse/ATLAS-511
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>         Attachments: HADesign.pdf
>
>
> One of the most important components that only supports active-standby mode 
> currently is the Atlas server which hosts the API / UI for Atlas. As 
> described in the [HA 
> Documentation|http://atlas.incubator.apache.org/0.6.0-incubating/HighAvailability.html],
>  we currently are limited to running only one instance of the Atlas server 
> behind a proxy service. If the running instance goes down, a manual process 
> is required to bring up another instance.
> In this JIRA, we propose to have an ability to run multiple Atlas server 
> instances. However, as a first step, only one of them will be actively 
> processing requests. To have a consistent terminology, let us call that 
> server the *master*. Any requests sent to the other servers will be 
> redirected to the master.
> When the master suffers a partition, one of the other servers must 
> automatically become the master and start processing requests. What this mode 
> brings us over the current system is the ability to automatically failover 
> the Atlas server instance without any  manual intervention. Note that this 
> can be arguably called an [active/active 
> setup|https://en.wikipedia.org/wiki/High-availability_cluster]
> ATLAS-488 raised to support multiple active Atlas server instances. While 
> that would be ideal, we have to learn more about the underlying system 
> behavior before we can get there, and hopefully we can take smaller steps to 
> improve the system systematically. The method proposed here is similar to 
> what is adopted in many other Hadoop components including HDFS NameNode, 
> HBase HMaster etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
