[
https://issues.apache.org/jira/browse/ATLAS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329424#comment-15329424
]
Hemanth Yamijala commented on ATLAS-801:
----------------------------------------
Update on this issue:
* ATLAS-515 is now committed and provides the ability to set up Kafka topics
with multiple replicas, specified in configuration (a standalone sketch of what
a replicated topic means follows this list).
* Adding retries to Kafka is not necessary. The code in
{{AtlasHook.notifyEntities}} already has the ability to retry messages on
failure (a simplified sketch of that retry pattern also follows the list).
* There is no easy way to alert admins of errors, because the hooks run in the
context of host components like Hive. The best we can do for now is to ensure
failures are logged properly at an appropriate level.
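On the first point, ATLAS-515 drives the replication factor purely from
configuration at topic-setup time; its property names are not repeated here.
Purely as an illustration of what a replicated notification topic looks like,
a standalone sketch using the Kafka {{AdminClient}} API (available in newer
Kafka clients than the one Atlas bundles today), with placeholder broker list
and replication factor, could be:
{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicSetupSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker list - in Atlas the Kafka connection details come from configuration.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 1 partition, replication factor 3: the topic stays writable if a single broker dies.
            NewTopic hookTopic = new NewTopic("ATLAS_HOOK", 1, (short) 3);
            admin.createTopics(Collections.singleton(hookTopic)).all().get();
        }
    }
}
{code}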
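On the second point, the existing behaviour in {{AtlasHook.notifyEntities}} is
essentially a bounded retry loop around the Kafka send. A simplified sketch of
that pattern, with illustrative names and configuration rather than the actual
Atlas classes:
{code:java}
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class NotificationRetrySketch {
    private static final Logger LOG = LoggerFactory.getLogger(NotificationRetrySketch.class);

    /** Hypothetical abstraction over the Kafka-backed notification channel. */
    public interface NotificationSender {
        void send(List<String> messages) throws Exception;
    }

    private final NotificationSender sender;
    private final int maxRetries;   // illustrative; the real value comes from hook configuration

    public NotificationRetrySketch(NotificationSender sender, int maxRetries) {
        this.sender = sender;
        this.maxRetries = maxRetries;
    }

    public void notifyEntities(List<String> messages) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                sender.send(messages);   // publish to the ATLAS_HOOK topic
                return;                  // success, nothing more to do
            } catch (Exception e) {
                if (attempt == maxRetries) {
                    // Once retries are exhausted the messages are effectively lost today;
                    // logging them properly is the "store only" idea discussed below.
                    LOG.error("Failed to notify Atlas after {} attempts, giving up", attempt, e);
                } else {
                    LOG.warn("Attempt {} to notify Atlas failed, retrying", attempt, e);
                }
            }
        }
    }
}
{code}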
If we extend the logging option a bit, the log file can act as a *store only*
solution for now, and it can at least help for server-side components like the
Hive hook, Falcon hook, etc. I will create a new JIRA targeted for the 0.7
release to make sure this logging is in place. This JIRA can be kept open to
explore other options for a more involved fix later.
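As a rough sketch of what that logging could look like (names here are
illustrative, not the change that will go into the new JIRA): messages that
could not be delivered after all retries would be written, payload only, to a
dedicated logger whose appender points at its own file, so an admin can inspect
or replay them later:
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FailedMessageLogSketch {
    // Illustrative logger name; a real implementation would pick a stable, documented
    // name so each hook host (Hive, Falcon, ...) can route it to its own log file.
    private static final Logger FAILED_LOG = LoggerFactory.getLogger("org.apache.atlas.hook.FailedMessages");

    /**
     * Write the serialized notification payload, and nothing else, to the
     * failed-messages log so the file can later be replayed mechanically if needed.
     */
    public void store(String serializedMessage) {
        FAILED_LOG.error(serializedMessage);
    }
}
{code}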
> Atlas hooks would lose messages if Kafka is down for extended period of time
> ----------------------------------------------------------------------------
>
> Key: ATLAS-801
> URL: https://issues.apache.org/jira/browse/ATLAS-801
> Project: Atlas
> Issue Type: Improvement
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
>
> All integration hooks in Atlas write messages to Kafka which are picked up by
> the Atlas server. If communication to Kafka breaks, then this results in loss
> of metadata messages. This can be mitigated to some extent using multiple
> replicas for Kafka topics (see ATLAS-515). This JIRA is to see if we can make
> this even more robust and have some form of store and forward mechanism for
> increased fault tolerance.