Hi, I would like to partition messages so that each consumer ideally receives a fixed set of messages. This was discussed in a previous thread, where one solution was to notify a consumer of a partitioning change so that it could discard its previous state: http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201205.mbox/%3CCAM%2BbZhi8YOwnE3RsTu5R7dt7J5REWhuV3zaN2qB9Lk0xTjJvMw%40mail.gmail.com%3E
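To make the idea from that thread concrete, here is a minimal sketch of a consumer-side cache that throws away its state when told the partitioning has changed. The class and callback names are mine, not Kafka's - Kafka provides no such callback today - and an in-memory map stands in for the on-disk state:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: Kafka does not expose this interface.
// An in-memory map stands in for the on-disk local cache.
public class RebalanceAwareCache {
    private final Map<String, byte[]> localState = new HashMap<>();

    public void put(String key, byte[] value) { localState.put(key, value); }

    public byte[] get(String key) { return localState.get(key); }

    public int size() { return localState.size(); }

    // Would be invoked when the partition assignment changes: the cheap
    // local copy is dropped and rebuilt lazily from the (slow) remote copy.
    public void onPartitionsReassigned() {
        localState.clear();
    }
}
```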
I have a scenario where a consumer's state is held on local disk for performance reasons (essentially a local cache). If the consumer machine fails, I am happy enough for the local data to be discarded - there is a remote copy, but reading it carries a high latency cost. In this scenario we want messages to keep going to a particular consumer until there is an explicit rebalance. I know this is not how rebalancing works with the ZooKeeper-based producer/consumer, but I wanted to explore other ways it could be solved.

Both the producer and the consumer would need control of the partitioning scheme: the producer needs to know the [broker:partition] to write each message into, and the consumer needs to know the set of [broker:partition] to read from. I could not see anything in the code that allows that sort of control. If I use Producer with ZooKeeper, the Partitioner API lets me partition messages, but Kafka code in the Producer then controls the allocation of the Partitioner result to a physical [broker:partition]. If I use Producer with a fixed broker list, messages are allocated to random partitions. Equivalently on the consumer side, Kafka code in ZookeeperConsumerConnector controls the allocation of [broker:partition] to the available consumers (when using the high-level consumer).

1. Is there anything I have missed in Kafka that would allow control of the partition a message is sent to / received from when using ZooKeeper? (I could not see it from a scan of the code.)
2. To achieve the desired functionality, I assume I have to use SyncProducer/AsyncProducer for producing messages and SimpleConsumer for consuming them, and then write the code that coordinates producers/brokers/consumers myself. Is this correct, or is there a better way?
3. Are SyncProducer, AsyncProducer, and SimpleConsumer considered part of the "public" API of Kafka, so that they will not change too frequently or disappear?
4. Is this requirement general enough that it is worth attempting to create plugin functionality for Kafka? (Otherwise I will just write code specific to my case.)

Any suggestions / feedback would be appreciated.

Thanks,
Ross
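P.S. To make question 2 concrete, here is a rough sketch of the sticky partition-to-consumer bookkeeping I imagine having to write around SyncProducer/SimpleConsumer. All names here (StickyAssignment, rebalance, partitionsFor) are mine, not Kafka's; the point is that the mapping is computed once and only changes on an explicit rebalance() call, so a consumer's local cache stays valid between rebalances:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a fixed [broker:partition] -> consumer mapping.
// Nothing here is a Kafka API; it is the coordination code one would
// have to build on top of SyncProducer/SimpleConsumer.
public class StickyAssignment {
    // consumerId -> list of "broker:partition" strings it owns
    private final Map<String, List<String>> assignment = new HashMap<>();

    // Deterministic round-robin assignment, run only on startup or when a
    // rebalance is explicitly requested - never behind the consumer's back.
    public void rebalance(List<String> partitions, List<String> consumers) {
        assignment.clear();
        for (String c : consumers) {
            assignment.put(c, new ArrayList<>());
        }
        for (int i = 0; i < partitions.size(); i++) {
            String owner = consumers.get(i % consumers.size());
            assignment.get(owner).add(partitions.get(i));
        }
    }

    // Between rebalances the mapping is fixed, so each consumer always
    // fetches from the same set of partitions.
    public List<String> partitionsFor(String consumerId) {
        return assignment.getOrDefault(consumerId, Collections.emptyList());
    }
}
```

The producer side would use the same deterministic rule to pick a [broker:partition] per message, so producer and consumer agree without going through Kafka's automatic rebalancing.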