[ https://issues.apache.org/jira/browse/ATLAS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272262#comment-15272262 ]
Hemanth Yamijala commented on ATLAS-629:
----------------------------------------

Started looking at an approach to fix this problem. With Kafka's (old) high-level consumer, we only get *at-most-once* delivery, because the offsets read from the partitions are auto-committed by default. So if a message is read and its offset auto-committed, but the server reboots before the metadata ingest completes, that message is lost for processing.

To fix this, I am looking at *at-least-once* delivery semantics with Kafka, under the assumption that *message processing can be idempotent on the server*. Given that we use transactions in Titan and also have create-or-update semantics, this may be mostly true - but I am not really sure. Will need to test.

To move to at-least-once processing, the predominant approach people follow seems to be to:
* disable auto commit
* create one ConsumerConnector per partition of a topic

The latter is needed because the old high-level consumer does not provide a per-partition commit; it can only commit the offsets read by all the partitions it is connected to [(Reference 1)|http://grokbase.com/t/kafka/users/144b80h269/consumerconnector-commitoffsets]. One consumer connector per partition has been suggested by Kafka experts in many threads [(Reference 2)|http://mail-archives.apache.org/mod_mbox/kafka-users/201409.mbox/%3CCAHBV8WeYj8ce6G5J0k3a1hGgdNskGv3bsaP8JXSM=kwbnuj...@mail.gmail.com%3E].

The other option would be to move to the newer consumer API in Kafka (0.9+), which (I think) provides better options for handling a per-partition commit. However, the new consumer is still marked beta, so I am not really sure; I can check with some Kafka committers internally.

For now, I will try out the first approach and see. In the meantime, happy to hear feedback from others.

> Kafka messages in ATLAS_HOOK might be lost in HA mode at the instant of
> failover.
> ---------------------------------------------------------------------------------
>
>                 Key: ATLAS-629
>                 URL: https://issues.apache.org/jira/browse/ATLAS-629
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 0.7-incubating
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>            Priority: Critical
>             Fix For: 0.7-incubating
>
> Write data to Kafka continuously from the Hive hook - this can be done by
> writing a script that constantly creates tables. Bring down the active
> instance with kill -9. Ensure writes continue after the passive instance
> becomes active. The expectation is that the number of tables created and the
> number of tables in Atlas match.
> In one test, 180 tables were written and the instances were switched over 6
> times. 1 table of the lot was lost, i.e. 179 tables were created and 1 did
> not get in.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
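The loss scenario and the proposed fix described in the comment above can be sketched with a small, self-contained simulation (plain Python, no Kafka or Atlas code; all names are hypothetical). It contrasts committing an offset *before* processing (the auto-commit, at-most-once case) with committing *after* processing (at-least-once), and shows why idempotent create-or-update processing makes the resulting redelivery harmless:

```python
# Hypothetical sketch: models offset commit ordering across a crash.
messages = ["create table_%d" % i for i in range(5)]

def run_once(commit_first, crash_at, store, committed):
    """One consumer lifetime. `committed` is a single-element list holding
    the last committed offset; `crash_at` is the offset at which the
    simulated kill -9 happens (-1 for no crash)."""
    offset = committed[0]
    while offset < len(messages):
        if commit_first:
            committed[0] = offset + 1   # auto-commit: offset saved up front
            if offset == crash_at:
                return                  # reboot before the ingest completes
        store.add(messages[offset])     # idempotent create-or-update
        if not commit_first:
            if offset == crash_at:
                return                  # reboot after ingest, before commit
            committed[0] = offset + 1   # manual commit, only after ingest
        offset += 1

def run_to_completion(commit_first):
    """Crash once mid-stream, restart from the committed offset, finish."""
    store, committed = set(), [0]
    run_once(commit_first, crash_at=2, store=store, committed=committed)
    run_once(commit_first, crash_at=-1, store=store, committed=committed)
    return store

# At-most-once (commit before processing): the in-flight message is lost.
print(sorted(set(messages) - run_to_completion(commit_first=True)))
# → ['create table_2']

# At-least-once (commit after processing): the in-flight message is
# redelivered and re-applied; idempotency makes the replay a no-op.
print(run_to_completion(commit_first=False) == set(messages))
# → True
```

The same ordering argument is what the one-ConsumerConnector-per-partition workaround buys in the real consumer: it makes the "commit only after ingest" step possible without also committing offsets for unprocessed messages on other partitions.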