Sounds good to me
On 12/20/12 5:04 PM, Jay Kreps wrote:
Err, to clarify, I meant punt on persisting the metadata not punt on
persisting the offset. Basically that field would be in the protocol but
would be unused in this phase.
-Jay
On Thu, Dec 20, 2012 at 2:03 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
I actually recommend we just punt on implementing persistence in zk
entirely, otherwise we have to have an upgrade path to grandfather over
existing zk data to the new format. Let's just add it in the API and only
actually store it out when we redo the backend. We can handle the size
limit then too.
-Jay
On Thu, Dec 20, 2012 at 1:30 PM, David Arthur <mum...@gmail.com> wrote:
No particular objection, though in order to support atomic writes of
(offset, metadata), we will need to define a protocol for the ZooKeeper
payloads. Something like:
OffsetPayload => Offset [Metadata]
Metadata => length prefixed string
should suffice. Otherwise we would have to rely on the multi-write
mechanism to keep parallel znodes in sync (I generally don't like things
like this).
+1 for limiting the size (1kb sounds reasonable)
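
For illustration, a minimal sketch of what a single-znode payload along the lines of "OffsetPayload => Offset [Metadata]" might look like. The field widths (8-byte big-endian offset, 2-byte length prefix), the UTF-8 encoding, the exact 1024-byte cap, and the function names are all assumptions for the example; none of them is settled in this thread.

```python
import struct

# Assumed cap; the thread only suggests that "1kb sounds reasonable".
MAX_METADATA_BYTES = 1024


def encode_offset_payload(offset, metadata=None):
    """Pack Offset [Metadata] into one byte string so both land in a single znode write."""
    payload = struct.pack(">q", offset)  # assumed: 8-byte big-endian signed offset
    if metadata is not None:
        raw = metadata.encode("utf-8")
        if len(raw) > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds %d bytes" % MAX_METADATA_BYTES)
        # assumed: 2-byte big-endian length prefix followed by the string bytes
        payload += struct.pack(">h", len(raw)) + raw
    return payload


def decode_offset_payload(payload):
    """Inverse of encode_offset_payload; returns (offset, metadata or None)."""
    (offset,) = struct.unpack_from(">q", payload, 0)
    if len(payload) == 8:
        return offset, None  # optional metadata was omitted
    (length,) = struct.unpack_from(">h", payload, 8)
    raw = payload[10:10 + length]
    if len(raw) != length:
        raise ValueError("truncated metadata")
    return offset, raw.decode("utf-8")
```

Because the offset and metadata travel in one byte string, a single znode write updates both, which is the property that avoids keeping parallel znodes in sync.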
On 12/20/12 4:03 PM, Jay Kreps wrote:
Okay I did some assessment of use cases we have which aren't using the
default offset storage API and came up with one generalization. I would
like to propose adding a generic metadata field to the offset API on a
per-partition basis. That would leave us with the following:
OffsetCommitRequest => ConsumerGroup [TopicName [Partition Offset Metadata]]
OffsetFetchResponse => [TopicName [Partition Offset Metadata ErrorCode]]
Metadata => string
If you want to store a reference to any associated state (say an HDFS file
name) so that, if the consumption fails over, the new consumer can start up
with the same state, this would be a place to store that. It would not be
intended to support large stuff (we could enforce a 1k limit or something),
just something small or a reference on where to find the state (say a file
name).
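
As a rough sketch of the shape proposed above, here is how a commit request carrying per-partition metadata could be modeled. The class names, the `add` helper, the empty-string default, and the exact 1024-byte cap are hypothetical and only mirror the grammar in this message; this is not the actual Kafka implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed cap; the proposal only says "a 1k limit or something".
MAX_METADATA_BYTES = 1024


@dataclass
class PartitionOffset:
    """One [Partition Offset Metadata] entry."""
    partition: int
    offset: int
    metadata: str = ""  # small opaque string, e.g. a file name

    def __post_init__(self):
        # Reject large blobs; metadata is meant to be a reference, not the state itself.
        if len(self.metadata.encode("utf-8")) > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds %d bytes" % MAX_METADATA_BYTES)


@dataclass
class OffsetCommitRequest:
    """ConsumerGroup [TopicName [Partition Offset Metadata]]"""
    consumer_group: str
    topics: Dict[str, List[PartitionOffset]] = field(default_factory=dict)

    def add(self, topic, partition, offset, metadata=""):
        entry = PartitionOffset(partition, offset, metadata)
        self.topics.setdefault(topic, []).append(entry)


# Hypothetical usage: commit an offset plus a pointer to externally stored state.
request = OffsetCommitRequest("example-group")
request.add("example-topic", 0, 1234, "hdfs://example/path/part-0")
```

A consumer taking over the partition would read the offset and the metadata string back together and use the string to locate the associated state.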
Objections?
-Jay
On Mon, Dec 17, 2012 at 10:45 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
Hey Guys,
David has made a bunch of progress on the offset commit API implementation.
Since this is a public API it would be good to do as much thinking
up-front as possible to minimize future iterations.
It would be great if folks could do the following:
1. Read the wiki here:
https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
2. Check out the code David wrote here:
https://issues.apache.org/jira/browse/KAFKA-657
In particular our hope is that this API can act as the first step in
scaling the way we store offsets (ZK is not really very appropriate for
this). This of course requires having some plan in mind for offset
storage.
I have written (and then, after getting some initial feedback, rewritten) a
section in the above wiki on how this might work.
If no one says anything I will be taking a slightly modified patch that
adds this functionality on trunk as soon as David gets in a few minor
tweaks.
-Jay