[
https://issues.apache.org/jira/browse/ATLAS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311964#comment-15311964
]
Hemanth Yamijala commented on ATLAS-801:
----------------------------------------
To summarize the above comments, this problem seems particularly difficult to
solve in Atlas's case for the following reasons:
* We are constrained by host environments. We must always be careful not to
cause unintended side effects on the hosts; doing so would automatically make
Atlas less 'trusted'.
* We are distributed in nature. Hence, we can make no assumptions about the
permanence / availability of locally stored data, and losing it would cause the
same issues we face today when Kafka is down.
* We have ordering dependencies that could cause consistency issues.
Specifically, we need the ability to handle missing data, as described
[here|http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html].
IMO, a good solution to this problem, if still desired, would need to improve
the product in other ways that make it easier to handle this narrower case.
Probably, we could:
h5. Short term
* Ensure multiple replicas for Kafka (ATLAS-515, or through documentation)
* Add retries when sending messages to Kafka (a rough producer-configuration
sketch follows this list).
* Build some way of alerting admins about errors in communicating with Kafka.
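As a minimal sketch of the retry idea, assuming the hook uses the standard Kafka Java producer: the property names below ({{retries}}, {{retry.backoff.ms}}, {{acks}}) are from the Kafka producer API, while the broker addresses, topic payload and callback handling are placeholders, not actual hook code.
{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RetryingHookProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker list; a real hook would read this from its configuration.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Retry transient send failures a few times before giving up.
        props.put(ProducerConfig.RETRIES_CONFIG, "3");
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "1000");
        // Wait for acknowledgement from all in-sync replicas, so a single
        // broker failure does not silently drop the message.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("ATLAS_HOOK", "{\"entity\": \"...\"}"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // Retries exhausted: this is where admin alerting
                            // (or the local spool discussed below) could plug in.
                            System.err.println("Failed to send hook message: " + exception);
                        }
                    });
        }
    }
}
{code}
The send callback is also a natural place to wire in the admin alerting mentioned above.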
h5. Long term
* Enable out-of-order correction of data, including the ability to correct
inconsistent data.
* Ensure import can be done for more of the components we support, and make it
more functional (e.g. incremental imports), AND/OR
* Implement local / remote storage options, along with tools to pick up this
data and import it into Atlas (see the store-and-forward sketch after this list).
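For the local storage option, here is a hypothetical store-and-forward sketch: messages that could not be sent to Kafka are spooled to a local directory in send order and replayed later, either by the hook itself or by a separate import tool. None of these class or method names exist in Atlas today; they are illustrative only.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalSpool {
    private final Path spoolDir;

    public LocalSpool(Path spoolDir) throws IOException {
        this.spoolDir = Files.createDirectories(spoolDir);
    }

    /** Persist a failed message; the sequence number preserves send order. */
    public void store(long sequence, String message) throws IOException {
        Path target = spoolDir.resolve(String.format("msg-%020d.json", sequence));
        // Write to a temp file first, then move into place, so replay never
        // picks up a partially written file.
        Path tmp = Files.createTempFile(spoolDir, "msg-", ".tmp");
        Files.write(tmp, message.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Replay spooled messages in sequence order once Kafka is reachable again. */
    public void replay(MessageSender sender) throws IOException {
        List<Path> ordered;
        try (Stream<Path> files = Files.list(spoolDir)) {
            ordered = files
                    .filter(p -> p.getFileName().toString().endsWith(".json"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        for (Path file : ordered) {
            sender.send(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            Files.delete(file);
        }
    }

    /** Thin abstraction over the real Kafka producer used by the hook. */
    public interface MessageSender {
        void send(String message);
    }
}
{code}
Ordering is preserved by encoding a sequence number in the file name, which ties back to the ordering / consistency concern above; a real implementation would also need to bound the spool size so it does not cause side effects on the host environment.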
Thoughts? Comments?
> Atlas hooks would lose messages if Kafka is down for extended period of time
> ----------------------------------------------------------------------------
>
> Key: ATLAS-801
> URL: https://issues.apache.org/jira/browse/ATLAS-801
> Project: Atlas
> Issue Type: Improvement
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
>
> All integration hooks in Atlas write messages to Kafka which are picked up by
> the Atlas server. If communication to Kafka breaks, then this results in loss
> of metadata messages. This can be mitigated to some extent using multiple
> replicas for Kafka topics (see ATLAS-515). This JIRA is to see if we can make
> this even more robust and have some form of store and forward mechanism for
> increased fault tolerance.