[ https://issues.apache.org/jira/browse/ATLAS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311964#comment-15311964 ]

Hemanth Yamijala commented on ATLAS-801:
----------------------------------------

To summarize the above comments, this problem seems particularly difficult to 
solve in Atlas's case for the following reasons:

* We are constrained by host environments. We must always be careful not to 
cause unintended side effects on the hosts; doing so would automatically make 
Atlas less 'trusted'.
* We are distributed in nature. Hence, we can make no assumptions about the 
permanence or availability of locally stored data, and losing it would cause 
the same problems we face today when Kafka is down.
* We have ordering dependencies that could cause consistency issues. 
Specifically, we need the ability to handle missing data, as described 
[here|http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html].

IMO, a good solution to this problem, if still desired, would need to improve 
the product in other ways that make this narrower case easier to handle.

Probably, we could:

h5. Short term

* Ensure multiple replicas for Kafka (ATLAS-515, or through documentation)
* Add retries when sending messages to Kafka.
* Build some way of alerting admins of errors in communicating to Kafka.
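To illustrate the retry idea, here is a minimal sketch of the producer settings a hook could use. The class name {{HookProducerConfig}} and the specific values are my own illustration, not existing Atlas code; the keys ({{retries}}, {{retry.backoff.ms}}, {{acks}}) are standard Kafka producer configuration properties:

```java
import java.util.Properties;

// Illustrative sketch: producer settings that make hook sends more resilient
// to short Kafka outages. Class name and values are hypothetical.
public class HookProducerConfig {

    public static Properties buildProps() {
        Properties props = new Properties();
        // Retry a failed send a few times before reporting an error.
        props.put("retries", "3");
        // Back off between attempts to ride out brief broker unavailability.
        props.put("retry.backoff.ms", "1000");
        // Require acknowledgement from all in-sync replicas; this pairs with
        // running multiple replicas for the topic (ATLAS-515).
        props.put("acks", "all");
        return props;
    }
}
```

Retries only help with transient failures; an extended outage would still exhaust them, which is why the alerting and long-term items below remain necessary.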

h5. Long term

* Enable out of order correction of data, including ability to correct 
inconsistent data.
* Ensure import can be done for more of the components we support, and make it 
more functional (e.g. incremental imports), AND/OR
* Implement local / remote storage options, along with tools to pick up this 
data and import into Atlas.
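The local-storage option above could be as simple as a spool file: when a send fails after retries are exhausted, the hook appends the message locally, and a separate tool later drains the file and re-imports into Atlas. A minimal sketch, assuming a hypothetical {{MessageSpool}} class (not existing Atlas code), and subject to the caveat above that locally stored data is not guaranteed to survive:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

// Illustrative store-and-forward sketch: failed messages are appended to a
// local spool file, and a replay tool later drains them for re-import.
public class MessageSpool {
    private final Path spoolFile;

    public MessageSpool(Path spoolFile) {
        this.spoolFile = spoolFile;
    }

    // Called when a send to Kafka fails after retries are exhausted.
    public void store(String message) throws IOException {
        Files.write(spoolFile,
                (message + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Replay tool: read spooled messages in arrival order, then clear the
    // spool so messages are not imported twice.
    public List<String> drain() throws IOException {
        if (!Files.exists(spoolFile)) {
            return Collections.emptyList();
        }
        List<String> messages = Files.readAllLines(spoolFile, StandardCharsets.UTF_8);
        Files.delete(spoolFile);
        return messages;
    }
}
```

Note this preserves per-hook ordering only; reconciling ordering across hooks would still need the out-of-order correction work listed above.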

Thoughts? Comments?

> Atlas hooks would lose messages if Kafka is down for extended period of time
> ----------------------------------------------------------------------------
>
>                 Key: ATLAS-801
>                 URL: https://issues.apache.org/jira/browse/ATLAS-801
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>
> All integration hooks in Atlas write messages to Kafka which are picked up by 
> the Atlas server. If communication to Kafka breaks, then this results in loss 
> of metadata messages. This can be mitigated to some extent using multiple 
> replicas for Kafka topics (see ATLAS-515). This JIRA is to see if we can make 
> this even more robust and have some form of store and forward mechanism for 
> increased fault tolerance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
