[ https://issues.apache.org/jira/browse/KAFKA-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750549#comment-13750549 ]
Tejas Patil commented on KAFKA-1012:
------------------------------------

Re transactionality: In the current patch (with the embedded producer), a commit request (i.e., a produce request for the offsets topic) gets written to two places: the logs of the offsets topic and the offset manager's backend storage. If there is an error writing any offset message to the logs, it is indicated in the response to the produce request. The embedded producer internally retries the request (with the failed messages only) after checking the error status in the response. Only those messages which made it to the logs are passed to the second part (the offset manager's backend storage). Since the backend would basically be a hash table or ZooKeeper, it is assumed that the offset manager won't fail to write data to the backend. To sum up, there is no notion of transactions: brokers would "greedily" try to commit as many messages as they can, and if some offset messages fail, the embedded producer re-sends a request for just the failed ones.

Re "per-topic max message size": Does Kafka support a per-topic max message size? I could not find such a config. What does "server impact" include: the volume of offsets data stored in the logs, or holding large metadata in memory? The log cleaner would dedupe the logs of this topic frequently, so the size of the logs would be pruned from time to time. As for holding this metadata in an in-memory hash table, I think it's a nice thing to have a cap on message size to prevent the in-memory table from consuming too much memory. I will include that in the coming patch. It would be helpful even for the next phase, when we move off the embedded producer and start using the offset commit request.

Re "partitioning the offsets topic": It is recommended to have #(partitions for offsets topic) >= #brokers, so that all brokers get a somewhat similar[*] amount of offset commit traffic. The replication factor of the offsets topic should be higher than that of any normal Kafka topic, to achieve high availability of the offset information.
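As a minimal sketch of the partitioning idea above: commits for a given consumer group can be routed to one partition of the offsets topic by hashing the group name, so different groups spread across partitions (and hence across partition leaders / brokers). The class name and key scheme here are hypothetical, not from the patch.

```java
// Hypothetical sketch: route a consumer group's offset commits to one
// partition of the offsets topic by hashing the group name.
class OffsetsPartitioner {
    private final int numPartitions;

    OffsetsPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Mask off the sign bit instead of Math.abs() so that
    // Integer.MIN_VALUE hash codes cannot yield a negative partition.
    int partitionFor(String consumerGroup) {
        return (consumerGroup.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

Because the mapping is deterministic, all commits for one group always land in the same partition, whose leader broker then owns that group's offsets; with #partitions >= #brokers and enough groups, the load is similar (not equal) across brokers, as noted below.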
[*]: There can be an imbalance in server load if some consumer groups have a lot of consumers, or if some consumers have a shorter offset commit interval than others. So we cannot guarantee "equal" load across all brokers; we only expect the load to be "similar" across all brokers.

Thanks for all your comments [~criccomini] !!!

> Implement an Offset Manager and hook offset requests to it
> ----------------------------------------------------------
>
>                 Key: KAFKA-1012
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1012
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Tejas Patil
>            Assignee: Tejas Patil
>            Priority: Minor
>         Attachments: KAFKA-1012.patch, KAFKA-1012-v2.patch
>
>
> After KAFKA-657, we have a protocol for consumers to commit and fetch offsets
> from brokers. Currently, consumers are not using this API and are directly
> talking to Zookeeper.
> This Jira will involve the following:
> 1. Add a special topic in Kafka for storing offsets
> 2. Add an OffsetManager interface which would handle storing, accessing,
> loading and maintaining consumer offsets
> 3. Implement offset managers for both of these 2 choices: existing ZK-based
> storage or inbuilt storage for offsets
> 4. Leader brokers would now maintain an additional hash table of offsets for
> the group-topic-partitions that they lead
> 5. Consumers should now use the OffsetCommit and OffsetFetch APIs
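Point 4 of the issue description (leader brokers maintaining an in-memory hash table of offsets for the group-topic-partitions they lead) could look roughly like the sketch below. The class name, the string key format, and the -1 "unknown offset" sentinel are assumptions for illustration, not taken from the patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the per-broker in-memory offsets table: the leader
// caches the latest committed offset for each (group, topic, partition).
class OffsetsTable {
    // Keying by a "group/topic/partition" string is an assumption here.
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    void commit(String group, String topic, int partition, long offset) {
        offsets.put(key(group, topic, partition), offset);
    }

    // Returns -1 when no offset has been committed for this key
    // (an "unknown offset" sentinel assumed for this sketch).
    long fetch(String group, String topic, int partition) {
        return offsets.getOrDefault(key(group, topic, partition), -1L);
    }
}
```

An OffsetFetch request would then be served straight from this table on the partition leader, while OffsetCommit both appends to the offsets topic log and updates the table, as described in the comment above.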