Marko,

I agree with Sharad. Instead of using Kafka for random message lookups, you could use it as the persistent message bus between the publishers of the messages and your indexing system. Using the low-level consumer API (SimpleConsumer), you could set up your indexer processes to pull from the broker partitions for a topic. You would have to checkpoint your Kafka offsets so that they match the data already indexed and flushed to disk, and re-fetch data from Kafka starting at the last checkpointed offset if/when the indexer fails.
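To make the checkpointing idea concrete, here is a minimal sketch in Python. It does not use a Kafka client at all: a plain list stands in for one partition and the list position stands in for the Kafka offset, so `CheckpointedIndexer`, `consume`, `flush`, and the JSON state file are all hypothetical names chosen for illustration. The point it tries to show is that the index and the offset are written together in one atomic step, so after a crash the indexer resumes from an offset that matches what is actually on disk.

```python
import json
import os
import tempfile


class CheckpointedIndexer:
    """Toy indexer: the index and the next offset to fetch are saved together."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.index = {}        # message key -> message body
        self.next_offset = 0   # first offset not yet indexed
        # On (re)start, recover the last flushed index and its offset.
        if os.path.exists(state_path):
            with open(state_path) as f:
                state = json.load(f)
            self.index = state["index"]
            self.next_offset = state["next_offset"]

    def consume(self, log, max_messages):
        """Index up to max_messages from `log`, in memory only.

        `log` is a list of (key, body) pairs standing in for a partition.
        """
        batch = log[self.next_offset:self.next_offset + max_messages]
        for key, body in batch:
            self.index[key] = body
        self.next_offset += len(batch)

    def flush(self):
        """Persist index and offset via one atomic rename so they never disagree."""
        state = {"index": self.index, "next_offset": self.next_offset}
        directory = os.path.dirname(os.path.abspath(self.state_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.state_path)


def demo():
    """Simulate a crash between flushes and show where the indexer resumes."""
    log = [("k%d" % i, "message %d" % i) for i in range(5)]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "indexer_state.json")
        indexer = CheckpointedIndexer(path)
        indexer.consume(log, 2)
        indexer.flush()            # offsets 0-1 are now durable
        indexer.consume(log, 2)    # offsets 2-3 indexed in memory only
        # Simulated crash: the unflushed work above is lost.
        restarted = CheckpointedIndexer(path)
        resumed_from = restarted.next_offset
        restarted.consume(log, 10)  # re-fetch everything after the checkpoint
        restarted.flush()
        return resumed_from, sorted(restarted.index)
```

In this sketch, calling `demo()` restarts the indexer at offset 2, the last flushed position, and the re-fetch picks up the messages that were indexed but never flushed. With a real broker the re-fetch only works while the data is still within Kafka's retention window, so the flush interval needs to be much shorter than that window.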
Thanks,
Neha

On Fri, Oct 21, 2011 at 9:47 AM, Sharad Agarwal <sharad.apa...@gmail.com> wrote:

> kafka is more suited for sequential message reads. Not really meant for
> random message lookups.
>
> Also using kafka as *long* term message store is not a good usecase.
>
> On Fri, Oct 21, 2011 at 9:32 PM, <ma...@modelcitizen.com> wrote:
>
>> I would like to use Kafka to process messages that need to be immutably
>> stored for a N-days, and during that period the msgs need to be indexed,
>> searched, as well as retrieval of msg data that is queried.
>>
>> One approach is to read messages from Kafka and store the messages in a
>> secondary db for query and data retrieval. Once the messages are read and
>> processed into the secondary db, then the messages can be discarded from
>> the Kafka queue.
>>
>> Another approach is to read the messages, build an external index for
>> searching that directly references the message data by Kafka-key in the
>> Kafka queue itself. In this case the Kafka becomes the message store for
>> the life of the message/data.
>>
>> The latter would be ideal for me if the performance of query-by-key and
>> message data retrieval is very good.
>>
>> Is random query of message+data good for Kafka? Is this an appropriate
>> usecase for Kafka?
>>
>> Thank you.
>>
>> Marko.
>
> --
> Thanks
> Sharad Agarwal
> Hadoop and Avro Committer
> Technology Platforms, InMobi
> *Disclaimer: Opinions expressed here are my own and do not represent past or
> present employers.*