Hadoop consumers are for when you need your data streams loaded incrementally into Hadoop for offline batch processing and/or ad-hoc querying, since some things cannot be computed in real time (or are expensive to compute that way). So you have a Hadoop job that consumes a Kafka stream, potentially formats the data, and saves it into HDFS.
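
To make the "incrementally loaded" part concrete, here is a minimal sketch in Python. It is not the Hadoop consumer that ships with Kafka and it uses no Kafka or HDFS API. As stated assumptions, a plain list stands in for one Kafka partition, a local directory stands in for HDFS, and a small JSON file holds the offset reached by the previous run. The names run_batch, checkpoint_path and out_dir are hypothetical.

```python
import json
import os


def run_batch(log, checkpoint_path, out_dir):
    """One incremental batch run (illustrative only).

    log             -- list standing in for a Kafka partition (assumption)
    checkpoint_path -- JSON file holding the next offset to read (assumption)
    out_dir         -- local directory standing in for HDFS (assumption)
    Returns the path of the file written, or None if there was nothing new.
    """
    # Resume from the offset recorded by the previous run (0 on the first run).
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_offset"]

    # Snapshot the end of the log so this run covers a fixed range.
    end = len(log)
    if end == start:
        return None  # nothing arrived since the last run

    # "Format" step: one JSON record per line, named after the offset range.
    out_path = os.path.join(out_dir, f"part-{start:08d}-{end:08d}.jsonl")
    with open(out_path, "w") as out:
        for offset in range(start, end):
            record = {"offset": offset, "value": log[offset]}
            out.write(json.dumps(record) + "\n")

    # Record the new offset only after the output is fully written.
    with open(checkpoint_path, "w") as f:
        json.dump({"next_offset": end}, f)
    return out_path
```

Each scheduled run picks up only the messages that arrived since the previous one, which is how a batch system can keep up with a stream without running continuously. Because the checkpoint is written last, a run that fails midway is simply repeated from the same offset and rewrites the same output file.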
On 30 October 2012 23:28, Hussein Baghdadi <hubaghd...@hotmail.com> wrote:

> Hi,
> Kafka comes with a support for Hadoop. I'm not sure what does this mean.
> Kafka is a publish-subscribe messaging system. What are some of the
> typical usage of Kafka-support for Hadoop producers and consumers?
> Well, producers are easy to digest. MapReduce job emitting data to Kafka.
> But what about Hadoop consumers?
> Hadoop is a batching system, not a continuous running system (as Storm or
> Dempsy). Say Kafka gets some data, what will happen?
> Thanks for help and time.

--
Michal Haris
Software Engineer
www.visualdna.com | t: +44 (0) 207 734 7033