[ https://issues.apache.org/jira/browse/ATLAS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184726#comment-15184726 ]
venkata madugundu commented on ATLAS-511:
-----------------------------------------
Hemanth Yamijala, thanks for sharing your thought process on HA. I have a few
comments:
1. Type definitions for entities are, for the most part, seen as rarely
changing, much like application database schemas. Should Atlas consider a
user-configurable feature toggle to skip refreshing/reloading types when a
passive instance becomes active?
I know that when a consumer application upgrades its functionality there will
be type-level changes. In that case, maybe the application could send a
special-purpose Atlas request to refresh the type cache on all Atlas
instances, coordinated by ZooKeeper.
2. How critical is the type cache in the context of a purely SCRUD API? In
other words, what would the performance hit be if types were not cached but
fetched from the backend store each time they are needed? I think metadata
repositories tend to be read-intensive, so the performance of 'Search' is very
important. With the Atlas DSL query language, query validation (and even
translation) needs to consult the types (and their supertype hierarchy). Does
it make sense to evaluate query performance with and without the type cache?
How quickly can Atlas fetch a given set of types (the ones a query needs) from
the backend store? If that is quick enough (at least for some classes of
applications), then maybe Atlas should provide a way to turn off the type
cache.
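This is a minimal Python sketch of the two ideas above, and it rests on stated assumptions. `TypeRegistry`, `CountingBackend`, `RefreshCoordinator` and the `cache_enabled` / `reload_on_activation` toggles are hypothetical names. They are not Atlas classes or configuration properties. The backend is an in-memory dict that maps each type to its direct supertypes, and the type names are illustrative only. `RefreshCoordinator` is an in-process stand-in for a ZooKeeper-coordinated refresh, not a ZooKeeper client. The sketch counts the reads that reach the backend, which is one way to frame the with/without-cache comparison.

```python
from typing import Dict, List


class CountingBackend:
    """Hypothetical stand-in for the backend store: type name -> direct supertypes.

    It counts every read that reaches it, which is the cost a cache would save.
    """

    def __init__(self, defs: Dict[str, List[str]]) -> None:
        self.defs = defs
        self.fetches = 0

    def fetch(self, name: str) -> List[str]:
        self.fetches += 1
        return list(self.defs[name])

    def fetch_all(self) -> Dict[str, List[str]]:
        self.fetches += 1  # one bulk read of every type definition
        return {name: list(parents) for name, parents in self.defs.items()}


class TypeRegistry:
    """Hypothetical per-instance type lookup with the two proposed toggles."""

    def __init__(self, backend: CountingBackend, cache_enabled: bool = True,
                 reload_on_activation: bool = True) -> None:
        self.backend = backend
        self.cache_enabled = cache_enabled                # idea 2: run with no type cache
        self.reload_on_activation = reload_on_activation  # idea 1: skip reload on failover
        self._cache: Dict[str, List[str]] = {}

    def refresh(self) -> None:
        """Reload all types in one bulk read; a no-op when caching is off."""
        if self.cache_enabled:
            self._cache = self.backend.fetch_all()

    def on_become_active(self) -> None:
        # Reload on passive -> active only if the toggle asks for it or nothing is loaded yet.
        if self.reload_on_activation or not self._cache:
            self.refresh()

    def supertypes(self, name: str) -> List[str]:
        if self.cache_enabled and name in self._cache:
            return self._cache[name]
        parents = self.backend.fetch(name)  # cache off or cache miss: go to the backend
        if self.cache_enabled:
            self._cache[name] = parents
        return parents

    def hierarchy(self, name: str) -> List[str]:
        """All transitive supertypes of `name`, as DSL query validation would need."""
        seen: List[str] = []
        stack = [name]
        while stack:
            for parent in self.supertypes(stack.pop()):
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
        return seen


class RefreshCoordinator:
    """In-process stand-in for a ZooKeeper-coordinated refresh (not a ZooKeeper client):
    one request makes every registered instance reload its types."""

    def __init__(self) -> None:
        self._registries: List[TypeRegistry] = []

    def register(self, registry: TypeRegistry) -> None:
        self._registries.append(registry)

    def request_refresh(self) -> None:
        for registry in self._registries:
            registry.refresh()


if __name__ == "__main__":
    # Illustrative type names and hierarchy only.
    defs = {"hive_table": ["DataSet"], "DataSet": ["Referenceable", "Asset"],
            "Referenceable": [], "Asset": []}
    for enabled in (True, False):
        backend = CountingBackend(defs)
        registry = TypeRegistry(backend, cache_enabled=enabled)
        registry.refresh()
        for _ in range(10):  # e.g. validating ten DSL queries over hive_table
            registry.hierarchy("hive_table")
        print(f"cache_enabled={enabled}: {backend.fetches} backend reads")
```

With the illustrative hierarchy in the demo, ten validations of `hive_table` cost one bulk backend read with the cache on and 40 single-type reads with it off. Whether that gap matters is what a with/without-cache evaluation against a real backend would have to measure.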
As HA (and multiple active instances) is important for our Atlas adoption, can
I help by taking on specific child tasks? Please let me know, as you are in the
best position to decide which ones can be delegated to other contributors. I
have been using the Atlas REST API for some time now, around 2-3 months. We
consume the REST API using a standard HTTP client rather than the AtlasClient
API. I have a fair understanding of the DSL (having written a query rewriter
for our multi-tenancy evaluation).
> Ability to run multiple instances of Atlas Server with automatic failover to
> one active server
> ----------------------------------------------------------------------------------------------
>
> Key: ATLAS-511
> URL: https://issues.apache.org/jira/browse/ATLAS-511
> Project: Atlas
> Issue Type: Sub-task
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Attachments: HADesign.pdf
>
>
> One of the most important components that only supports active-standby mode
> currently is the Atlas server which hosts the API / UI for Atlas. As
> described in the [HA
> Documentation|http://atlas.incubator.apache.org/0.6.0-incubating/HighAvailability.html],
> we currently are limited to running only one instance of the Atlas server
> behind a proxy service. If the running instance goes down, a manual process
> is required to bring up another instance.
> In this JIRA, we propose the ability to run multiple Atlas server
> instances. However, as a first step, only one of them will be actively
> processing requests. To have a consistent terminology, let us call that
> server the *master*. Any requests sent to the other servers will be
> redirected to the master.
> When the master suffers a partition, one of the other servers must
> automatically become the master and start processing requests. What this mode
> brings us over the current system is the ability to automatically failover
> the Atlas server instance without any manual intervention. Note that this
> can arguably be called an [active/active
> setup|https://en.wikipedia.org/wiki/High-availability_cluster].
> ATLAS-488 was raised to support multiple active Atlas server instances. While
> that would be ideal, we have to learn more about the underlying system
> behavior before we can get there, and hopefully we can take smaller steps to
> improve the system systematically. The method proposed here is similar to
> what is adopted in many other Hadoop components including HDFS NameNode,
> HBase HMaster, etc.
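The description's master/redirect/failover behavior can be sketched as a toy in-process model. Several parts are assumptions and do not come from the design:

* `Server`, `Cluster` and the addresses are hypothetical.
* The "lowest live id becomes master" rule is a simplified stand-in for the leader election a coordination service such as ZooKeeper would provide.
* The model does not cover network partitions, session timeouts, or split-brain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Server:
    server_id: int
    address: str
    alive: bool = True
    handled: List[str] = field(default_factory=list)  # requests this server served itself


class Cluster:
    """Toy model: one master serves requests, every other live server redirects to it."""

    def __init__(self, servers: List[Server]) -> None:
        self.servers: Dict[int, Server] = {s.server_id: s for s in servers}
        self.master_id: Optional[int] = None
        self.elect()

    def elect(self) -> None:
        # Simplified election: the live server with the lowest id becomes master.
        live = [sid for sid, s in self.servers.items() if s.alive]
        self.master_id = min(live) if live else None

    def fail(self, server_id: int) -> None:
        # Losing the master triggers a new election with no manual intervention.
        self.servers[server_id].alive = False
        if server_id == self.master_id:
            self.elect()

    def handle(self, server_id: int, request: str) -> Tuple[str, str]:
        """Return ("ok", address) if served here, or ("redirect", master_address)."""
        server = self.servers[server_id]
        if not server.alive:
            raise ConnectionError(f"{server.address} is down")
        assert self.master_id is not None  # a live server exists, so a master was elected
        if server_id == self.master_id:
            server.handled.append(request)
            return ("ok", server.address)
        return ("redirect", self.servers[self.master_id].address)


if __name__ == "__main__":
    cluster = Cluster([Server(1, "atlas-1:21000"), Server(2, "atlas-2:21000"),
                       Server(3, "atlas-3:21000")])
    print(cluster.handle(2, "request-1"))  # passive server redirects to atlas-1
    cluster.fail(1)                        # the master goes away
    print(cluster.handle(3, "request-2"))  # now redirected to atlas-2
    print(cluster.handle(2, "request-3"))  # atlas-2 serves it as the new master
```

Passive servers stay up and answer with a redirect instead of rejecting requests. In this sketch, that is what lets a client or proxy reach the current master through any live instance.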
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)