Hi,

I would like to partition messages so that each consumer ideally receives
a fixed set of messages.
This was discussed in a previous thread, where a solution was to notify a
consumer of a partitioning change so that it could discard its previous
state.
http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201205.mbox/%3CCAM%2BbZhi8YOwnE3RsTu5R7dt7J5REWhuV3zaN2qB9Lk0xTjJvMw%40mail.gmail.com%3E

I have a scenario where state for a consumer is held on local disk for
performance reasons (essentially like a local cache).  If the consumer
machine fails, I am happy enough for the local data to be discarded; there
is a remote copy, but with a high latency cost.  In this scenario I want
messages to be sent to a particular consumer until there is an explicit
rebalance.  I know that this is not how rebalancing works when using the
Kafka ZooKeeper-based producer/consumer, but I wanted to explore other
possibilities for how this could be solved.

Both the producer and the consumer would need control of the partitioning
scheme.  The producer needs to know which [broker:partition] to write each
message into.  The consumer needs to know the set of [broker:partition] to
read from.  I could not see anything in the code that would give me that
sort of control over partitioning.
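
To make the requirement concrete, here is a rough sketch of the kind of
fixed, shared assignment I have in mind.  This is hypothetical code (the
names FixedAssignment, brokerPartitionFor, etc. are mine, not anything in
Kafka) that both the producer and consumer processes would consult:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical: a static, explicit assignment shared by producer and
    // consumer, encoded here as "host:port/partition" strings.
    public class FixedAssignment {
        static final List<String> BROKER_PARTITIONS = Arrays.asList(
            "broker1:9092/0", "broker1:9092/1", "broker2:9092/0");

        // Producer side: map a message key to a physical [broker:partition].
        static String brokerPartitionFor(String key) {
            int i = Math.abs(key.hashCode()) % BROKER_PARTITIONS.size();
            return BROKER_PARTITIONS.get(i);
        }

        // Consumer side: each consumer owns a fixed set of
        // [broker:partition] until an explicit rebalance rewrites this map.
        static final Map<String, List<String>> CONSUMER_ASSIGNMENT =
            new HashMap<String, List<String>>();
        static {
            CONSUMER_ASSIGNMENT.put("consumer-1",
                Arrays.asList("broker1:9092/0", "broker1:9092/1"));
            CONSUMER_ASSIGNMENT.put("consumer-2",
                Arrays.asList("broker2:9092/0"));
        }
    }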

If I use Producer with ZooKeeper, the Partitioner API allows me to
partition messages, but Kafka code in the Producer then controls allocation
of the Partitioner result to a physical [broker:partition].  If I use
Producer with a fixed broker list, messages are allocated to random
partitions.  Similarly on the consumer side, Kafka code in
ZookeeperConsumerConnector controls the allocation of [broker:partition] to
available consumers (if using the high-level consumer).
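
For reference, my understanding of the Partitioner hook (0.7-era API; the
signature is from memory, so treat it as approximate) is that it only
returns a logical partition index, which is exactly the limitation I am
describing:

    import kafka.producer.Partitioner;

    // A custom partitioner can pick a stable logical index per key, but
    // the Producer internally maps that index to a physical
    // [broker:partition], which is the step I cannot control.
    public class StickyPartitioner implements Partitioner<String> {
        public int partition(String key, int numPartitions) {
            return Math.abs(key.hashCode()) % numPartitions;
        }
    }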


1. Is there anything I have missed in Kafka that would allow control of the
partition that a message is sent to / received from when using ZooKeeper?
(I could not see it from a scan of the code.)
2. To achieve the desired functionality, I assume I have to use
SyncProducer/AsyncProducer for producing messages and SimpleConsumer for
consuming messages, and then write the code that coordinates
producers/brokers/consumers myself (see the sketch after this list).  Is
this correct, or is there a better way?
3. Are SyncProducer, AsyncProducer, and SimpleConsumer considered part of
the "public" API of Kafka, so that they will not change too frequently or
disappear?
4. Is this requirement general enough that it is worth attempting to create
plugin functionality for Kafka?  (Otherwise I will just write code specific
to my case.)
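
To illustrate what I have in mind for question 2, here is a minimal sketch
of pinning one explicit [broker:partition] with the low-level clients.  I
am assuming the 0.7-era javaapi classes; the constructors and config keys
are from memory and may be slightly off:

    import java.util.Arrays;
    import java.util.Properties;
    import kafka.api.FetchRequest;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.javaapi.message.ByteBufferMessageSet;
    import kafka.javaapi.producer.SyncProducer;
    import kafka.message.Message;
    import kafka.message.MessageAndOffset;
    import kafka.producer.SyncProducerConfig;

    public class PinnedPartitionExample {
        public static void main(String[] args) {
            // Producer side: connect to one known broker and write into an
            // explicit partition (here partition 0 of "mytopic").
            Properties props = new Properties();
            props.put("host", "broker1");
            props.put("port", "9092");
            SyncProducer producer =
                new SyncProducer(new SyncProducerConfig(props));
            producer.send("mytopic", 0, new ByteBufferMessageSet(
                kafka.message.NoCompressionCodec$.MODULE$,
                Arrays.asList(new Message("hello".getBytes()))));
            producer.close();

            // Consumer side: fetch from the same explicit
            // [broker:partition], managing the offset ourselves.
            SimpleConsumer consumer =
                new SimpleConsumer("broker1", 9092, 30000, 1024 * 1024);
            long offset = 0L;
            ByteBufferMessageSet messages = consumer.fetch(
                new FetchRequest("mytopic", 0, offset, 100000));
            for (MessageAndOffset mo : messages) {
                offset = mo.offset();  // the next fetch would start here
            }
            consumer.close();
        }
    }

The missing piece on top of this would be the coordination code: deciding
which process owns which [broker:partition], persisting offsets, and
triggering the explicit rebalance.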

Any suggestions / feedback would be appreciated.

Thanks,
Ross
