[
https://issues.apache.org/jira/browse/ATLAS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186402#comment-15186402
]
Hemanth Yamijala commented on ATLAS-511:
----------------------------------------
[~vmadugun] / [~cassiodossantos], From my (possibly incomplete) understanding
of the core backend, a few points stand out in consideration of the TypeSystem
cache:
* Currently Atlas relies on it *completely* for all reads. As Venkat mentioned
in his comments, DSL query translation to Gremlin query relies on this
information. Since the volume of reads is expected to be high, I intuitively
feel that the cache is of value. Possibly not in the aggressive manner in which
it is currently relied upon, but at least as a significant performance
optimization. Completely turning off the cache in that sense seems to me a bit
too extreme. If we are modeling this, we could possibly model it as a strategy
of which no caching is one alternative, and read with fall through could be
another. I am convinced by Cassio's point that letting the types grow unbounded
(the current implementation) feels a little too extreme as well.
* You mention that types will be relatively unchanging. I am assuming that you
are saying this based on the usage pattern you have seen (or expect to
see). I have a question on this. Seeing that trait definitions are also types
and are also cached in the TypeSystem and that all lookups of traits happen
from here, how frequently are these CRUD'ed in your case? Of course, this can
be solved by a programmatic refresh (or using a dirty read mechanism) as you
both have suggested.
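
To make the "strategy" idea above a little more concrete, here is a minimal
sketch of how the alternatives might be modeled. Everything in it is an
assumption made for illustration: the names TypeLookupStrategy, NoCacheStrategy
and ReadThroughStrategy are hypothetical and do not exist in Atlas, a type
definition is stood in for by a plain String, and the backing store is just a
function from type name to definition. The bounded variant uses a standard
access-ordered LinkedHashMap for LRU eviction, which is one possible way to keep
the cache from growing unbounded, not a claim about what Atlas should adopt.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical strategy interface; not part of the Atlas code base. */
interface TypeLookupStrategy {
    /** Returns the definition for the given type name, if one exists. */
    Optional<String> lookup(String typeName);

    /** Programmatic refresh hook: drop any cached copy of the given type. */
    void invalidate(String typeName);
}

/** Alternative 1: no caching, every read goes to the backing store. */
final class NoCacheStrategy implements TypeLookupStrategy {
    private final Function<String, Optional<String>> store;

    NoCacheStrategy(Function<String, Optional<String>> store) {
        this.store = store;
    }

    @Override
    public Optional<String> lookup(String typeName) {
        return store.apply(typeName);
    }

    @Override
    public void invalidate(String typeName) {
        // Nothing is cached, so there is nothing to drop.
    }
}

/** Alternative 2: bounded LRU cache that falls through to the store on a miss. */
final class ReadThroughStrategy implements TypeLookupStrategy {
    private final Function<String, Optional<String>> store;
    private final Map<String, String> cache;

    ReadThroughStrategy(Function<String, Optional<String>> store, final int maxEntries) {
        this.store = store;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    public synchronized Optional<String> lookup(String typeName) {
        String cached = cache.get(typeName);
        if (cached != null) {
            return Optional.of(cached);
        }
        // Cache miss: fall through to the store and remember the result.
        Optional<String> loaded = store.apply(typeName);
        loaded.ifPresent(definition -> cache.put(typeName, definition));
        return loaded;
    }

    @Override
    public synchronized void invalidate(String typeName) {
        cache.remove(typeName);
    }

    /** Number of entries currently cached; exposed only for inspection. */
    synchronized int cachedCount() {
        return cache.size();
    }
}
```

With something along these lines, the choice between no cache and a bounded
read-through cache becomes a configuration decision, and a create / update /
delete of a trait could call invalidate() so the next read picks up the change.
Whether the extra store reads are acceptable is exactly what the performance
measurements would need to tell us.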
I am happy that we are aligned on basing any of these decisions on concrete
measurements. We have been working to set up some very basic test suites that
will help us get started with performance measurement. I will open JIRAs to
spell out more details on this.
Venkat, thanks for your offer to help with this task. At this stage, since you
have a specific interest in improving the cache behavior, it may be good if you
can spend some energy on this and see what you find. Please feel free to open
JIRAs and propose your approach / solutions.
Needless to say, there are folks more experienced in this area than I am. I am
hoping they will chime in with thoughts (in particular, if we're going down the
wrong track).
> Ability to run multiple instances of Atlas Server with automatic failover to
> one active server
> ----------------------------------------------------------------------------------------------
>
> Key: ATLAS-511
> URL: https://issues.apache.org/jira/browse/ATLAS-511
> Project: Atlas
> Issue Type: Sub-task
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Attachments: HADesign.pdf
>
>
> One of the most important components that only supports active-standby mode
> currently is the Atlas server which hosts the API / UI for Atlas. As
> described in the [HA
> Documentation|http://atlas.incubator.apache.org/0.6.0-incubating/HighAvailability.html],
> we currently are limited to running only one instance of the Atlas server
> behind a proxy service. If the running instance goes down, a manual process
> is required to bring up another instance.
> In this JIRA, we propose to have an ability to run multiple Atlas server
> instances. However, as a first step, only one of them will be actively
> processing requests. To have a consistent terminology, let us call that
> server the *master*. Any requests sent to the other servers will be
> redirected to the master.
> When the master suffers a partition, one of the other servers must
> automatically become the master and start processing requests. What this mode
> brings us over the current system is the ability to automatically failover
> the Atlas server instance without any manual intervention. Note that this
> can be arguably called an [active/active
> setup|https://en.wikipedia.org/wiki/High-availability_cluster]
> ATLAS-488 raised to support multiple active Atlas server instances. While
> that would be ideal, we have to learn more about the underlying system
> behavior before we can get there, and hopefully we can take smaller steps to
> improve the system systematically. The method proposed here is similar to
> what is adopted in many other Hadoop components including HDFS NameNode,
> HBase HMaster etc.