[ https://issues.apache.org/jira/browse/ATLAS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272262#comment-15272262 ]
Hemanth Yamijala commented on ATLAS-629:
----------------------------------------

Started looking at an approach to fix this problem. With Kafka's (old) high-level consumer, we only get *at-most-once* delivery, because the offsets read from the partitions are auto-committed by default. So if a message is read and its offset auto-committed, but the server reboots before the metadata ingest completes, that message is lost for processing.

To fix this, I am looking at *at-least-once* delivery semantics with Kafka, under the assumption that *message processing can be idempotent on the server*. Given that we use transactions in Titan and also have create-or-update semantics, this may be mostly true - but I am not really sure. Will need to test.

To move to at-least-once processing, the predominant approach people follow seems to be to:
* disable auto commit
* create one ConsumerConnector per partition of a topic

The latter is needed because the old high-level consumer does not provide a per-partition commit; it can only commit the offsets read by all the partitions it is connected to [(Reference 1)|http://grokbase.com/t/kafka/users/144b80h269/consumerconnector-commitoffsets]. One consumer connector per partition has been suggested by Kafka experts in many threads [(Reference 2)|http://mail-archives.apache.org/mod_mbox/kafka-users/201409.mbox/%3CCAHBV8WeYj8ce6G5J0k3a1hGgdNskGv3bsaP8JXSM=kwbnuj...@mail.gmail.com%3E].

The other option would be to move to the newer consumer API in Kafka (0.9+), which (I think) provides better options for handling a per-partition commit. However, the new consumer is still marked beta, so I am not really sure; I can check with some Kafka committers internally.

For now, I will try out the first approach and see. In the meantime, happy to hear feedback from others.

> Kafka messages in ATLAS_HOOK might be lost in HA mode at the instant of
> failover.
> ---------------------------------------------------------------------------------
>
>                 Key: ATLAS-629
>                 URL: https://issues.apache.org/jira/browse/ATLAS-629
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 0.7-incubating
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>            Priority: Critical
>             Fix For: 0.7-incubating
>
> Write data to Kafka continuously from the Hive hook - this can be done by
> writing a script that constantly creates tables. Bring down the active
> instance with kill -9. Ensure writes continue after the passive instance
> becomes active. The expectation is that the number of tables created and the
> number of tables in Atlas match.
> In one test, 180 tables were written and the instances were switched over 6
> times. 1 table of the lot was lost, i.e. 179 tables were created and 1 did
> not get in.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
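The loss scenario and the proposed fix described in the comment above can be sketched with a small, self-contained simulation (plain Python, no Kafka or Atlas code; all names are hypothetical). It contrasts committing an offset *before* processing (the auto-commit, at-most-once case) with committing *after* processing (at-least-once), and shows why idempotent create-or-update processing makes the resulting redelivery harmless:

```python
# Hypothetical sketch: models offset commit ordering across a crash.
messages = ["create table_%d" % i for i in range(5)]

def run_once(commit_first, crash_at, store, committed):
    """One consumer lifetime. `committed` is a single-element list holding
    the last committed offset; `crash_at` is the offset at which the
    simulated kill -9 happens (-1 for no crash)."""
    offset = committed[0]
    while offset < len(messages):
        if commit_first:
            committed[0] = offset + 1   # auto-commit: offset saved up front
            if offset == crash_at:
                return                  # reboot before the ingest completes
        store.add(messages[offset])     # idempotent create-or-update
        if not commit_first:
            if offset == crash_at:
                return                  # reboot after ingest, before commit
            committed[0] = offset + 1   # manual commit, only after ingest
        offset += 1

def run_to_completion(commit_first):
    """Crash once mid-stream, restart from the committed offset, finish."""
    store, committed = set(), [0]
    run_once(commit_first, crash_at=2, store=store, committed=committed)
    run_once(commit_first, crash_at=-1, store=store, committed=committed)
    return store

# At-most-once (commit before processing): the in-flight message is lost.
print(sorted(set(messages) - run_to_completion(commit_first=True)))
# → ['create table_2']

# At-least-once (commit after processing): the in-flight message is
# redelivered and re-applied; idempotency makes the replay a no-op.
print(run_to_completion(commit_first=False) == set(messages))
# → True
```

The same ordering argument is what the one-ConsumerConnector-per-partition workaround buys in the real consumer: it makes the "commit only after ingest" step possible without also committing offsets for unprocessed messages on other partitions.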