1,2. We don't provide guarantees outside the message boundary. You are
right that in the current implementation the nested message sets we use
for compression do happen to fulfill this purpose, but our intention with
them was really to allow batch compression; they have no advantage over
putting multiple records in a single Kafka message. Likewise, our delivery
guarantees are at the message boundary.
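
For what it's worth, a minimal sketch of the batch case against the 0.7
Java producer API (the property names and the ProducerData constructor are
from memory, so check them against your version):

    import java.util.Arrays;
    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.javaapi.producer.ProducerData;
    import kafka.producer.ProducerConfig;

    Properties props = new Properties();
    props.put("zk.connect", "localhost:2181");
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    // 0 = none, 1 = gzip; with compression on, the batch below is written
    // as one compressed message wrapping a nested message set
    props.put("compression.codec", "1");

    Producer<String, String> producer =
        new Producer<String, String>(new ProducerConfig(props));

    // One ProducerData carrying a List produces a single batch
    producer.send(new ProducerData<String, String>(
        "my-topic", Arrays.asList("record-1", "record-2", "record-3")));
    producer.close();

But again, the batching here is an implementation detail of compression,
not a contract we guarantee going forward.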

3. We would be interested in decoupling the offset tracking and also in
adding some facility to garbage collect unused groups (e.g. if no update to
a group for > 1 week, delete it). This feature doesn't exist though. If
anyone wants to add it, it would probably be pretty easy--just a background
task in the broker that periodically scanned the groups in zk.
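
To make that concrete, a rough sketch of what such a task might do, using
the raw ZooKeeper client (the /consumers layout details, the recursive
mtime walk, and all the names here are my assumptions; none of this exists
in the broker today):

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Hypothetical reaper pass: delete any consumer group whose znodes
    // haven't been modified in over a week. Not part of Kafka today.
    public class StaleGroupReaper {
        private static final long MAX_IDLE_MS = 7L * 24 * 60 * 60 * 1000;

        public static void reapOnce(ZooKeeper zk) throws Exception {
            long now = System.currentTimeMillis();
            for (String group : zk.getChildren("/consumers", false)) {
                String path = "/consumers/" + group;
                // A parent znode's mtime doesn't change when its children
                // are written, so take the newest mtime in the subtree
                if (now - newestMtime(zk, path) > MAX_IDLE_MS)
                    deleteRecursive(zk, path);
            }
        }

        private static long newestMtime(ZooKeeper zk, String path)
                throws Exception {
            Stat stat = zk.exists(path, false);
            long newest = (stat == null) ? 0L : stat.getMtime();
            for (String child : zk.getChildren(path, false))
                newest = Math.max(newest, newestMtime(zk, path + "/" + child));
            return newest;
        }

        private static void deleteRecursive(ZooKeeper zk, String path)
                throws Exception {
            for (String child : zk.getChildren(path, false))
                deleteRecursive(zk, path + "/" + child);
            zk.delete(path, -1); // -1 ignores the znode version
        }
    }

A real version would also want to skip groups with live consumers
registered under the group's ids path, and tolerate znodes disappearing
mid-scan.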

-Jay

On Sat, Jul 7, 2012 at 10:18 AM, graham sanderson <gra...@vast.com> wrote:

> 1) I would like to guarantee that a group of messages is always delivered
> in its entirety together (because there is contextual information in
> messages which precede other messages). I'm a little confused by the use of
> the term "nested message sets" since I don't see much of it in the code
> (though I don't really know Scala) - perhaps this refers to the fact that
> you can have a set of messages within a message set file on disk. Anyway, I
> was curious (I'm using the Java API now, but may move to the Scala API
> later) what I need to do to guarantee N messages are sent and delivered as
> a single message set. Is a single ProducerData with a List of messages
> always sent as a single message set? Does compression need to be turned on?
> How does this affect network limits etc. (i.e. does the entire message set
> have to fit)? I'm also assuming that once I have my message set containing
> all my messages it will be discarded in its entirety.
>
> 2) Related to 1), from the consumer side: can I tell the boundaries of a
> message set? That's perhaps not required for me, but I do want to make
> sure I receive the entire set in one go (again, do I have to set network
> limits accordingly?). The docs say that the entire message set is always
> delivered to the client when compressed, but I'm not sure whether it can
> be subdivided if not compressed. Note I'm happy to stick with compression
> if required.
>
> 3) I'm using the ZookeeperConsumerConnector, since I don't want to manage
> finding the brokers myself; however, I was wondering if there are any
> plans to decouple the consumer offset tracking from it. One of my use
> cases is that I'll have a lot of ad-hoc, one-off consumers that simply
> read a subset of data until they die - from looking at ConsoleConsumer,
> there is currently a hack that simply deletes the ZooKeeper info after
> the fact to get around this.
>
> Thanks,
>
> Graham.
