So I'll just have to create one then, I guess, if I want to do this. I was planning on doing this:
prod#1 -> kafka#1 -> consumer -> prod#2 -> kafka#2 (central)

kafka#2 (the central cluster) will have long-lasting messages. The consumer that pulls off kafka#2 will filter messages, and then I can create an index that maps offset to messageId. Just wondering how fast random access to a Kafka file will be, i.e. will it be as fast as a db lookup? It's a memory-mapped file, so it should be fast in theory, but when the number of files grows things will degrade.

On Wed, Jun 13, 2012 at 10:01 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> There is no scanning; we compute the message location from the offset and
> begin fetching there.
>
> Sent from my iPhone
>
> On Jun 13, 2012, at 6:40 AM, S Ahmed <sahmed1...@gmail.com> wrote:
>
> > I was thinking of replicating messages to a central location, and having
> > a very long expiry on the messages (say 1 year).
> >
> > My requirement would be to not just stream messages, but to access
> > messages by key, similar to a "SELECT * FROM TABLE WHERE id=123".
> >
> > From what I understand, there is currently no index file that maps
> > messages to their exact location in a file, correct? i.e. Kafka streams
> > the messages: it goes to a .kafka file, starts from the beginning, and
> > streams the data to a consumer. If your offset happens to be in the
> > middle of the file, it will scan the file, start at the beginning of a
> > message, figure out the length of the message, and then jump to the
> > position of the next message until it finds the correct message offset.
> > Is this correct?
> >
> > i.e. I would have to create some sort of index that maps the offset to
> > the 'messageId' (where the messageId is stored in the body of the
> > message itself).
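For concreteness, here is a rough sketch in plain Java NIO of what "compute the message location from the offset and begin fetching there" could amount to, assuming the log is laid out as segment files named by their base offset. The SegmentReader class, the addSegment helper, and the naming scheme are my own assumptions for illustration, not Kafka's actual code:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.TreeMap;

public class SegmentReader {

    // Assumed layout: sorted map of segment base offset -> segment file,
    // e.g. 00000000000000000000.kafka, 00000000000536870912.kafka, ...
    private final TreeMap<Long, Path> segments = new TreeMap<>();

    public void addSegment(long baseOffset, Path file) {
        segments.put(baseOffset, file);
    }

    // Resolve a global offset to (segment, position) and read length bytes.
    // No scanning: one floorEntry lookup, then a positioned read.
    public byte[] read(long offset, int length) throws IOException {
        Map.Entry<Long, Path> seg = segments.floorEntry(offset);
        if (seg == null) {
            throw new IllegalArgumentException("offset precedes first segment");
        }
        long position = offset - seg.getKey();
        try (FileChannel ch = FileChannel.open(seg.getValue(), StandardOpenOption.READ)) {
            // Memory-map just the slice we need; the OS pages it in on
            // demand, so a warm read is close to a plain memory copy.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, position, length);
            byte[] out = new byte[length];
            buf.get(out);
            return out;
        }
    }
}

The messageId index the plan calls for would sit in front of this: the filtering consumer records (messageId, offset) pairs as it consumes, and a lookup by key is then one index probe plus one read like the above. So a warm random read should be in the same ballpark as a db point lookup; the cost of many segment files is mostly the in-memory sorted map and open file handles, not scanning.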