Indeed, Hadoop is not the ideal platform for stream processing, but there are 
plenty of use cases for Kafka + Hadoop. I use it to consolidate log data from 
many different systems into HDFS. I have N systems using the log4j appender 
producing to a Kafka broker, and then in my Hadoop cluster I run a simple job 
that consumes that data and writes out an HDFS file. This, in effect, is what 
other log aggregators like Flume do - however, we have Kafka in our stack for 
other pub/sub stuff so it made sense to use it for log aggregation as well. 

To answer your question about consuming in Hadoop, the RecordReader will just 
continue to return records until the queue is exhausted. If you could manage to 
produce data faster than Hadoop was reading it out (very unlikely), the Hadoop 
job would run forever (or at least for quite a while). I believe you end up with 
one RecordReader per Kafka partition, so allocating more partitions would 
increase your throughput to Hadoop (at least until you saturate the network 
between the Kafka brokers and Hadoop).
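
As a rough illustration of the "read until exhausted" behaviour, here is a 
plain Python simulation. It does not use the Kafka or Hadoop APIs; the names 
PartitionReader and drain are made up for this sketch, and each partition is 
modelled as a simple in-memory list of messages. In a real job the readers 
would presumably run in separate map tasks in parallel; the loop below is 
sequential only to keep the example short.

```python
from collections import deque


class PartitionReader:
    """Hypothetical stand-in for a record reader bound to one partition."""

    def __init__(self, messages):
        # Messages that were already in the partition when the job started.
        self._pending = deque(messages)

    def next_record(self):
        """Return the next message, or None once the partition is exhausted."""
        return self._pending.popleft() if self._pending else None


def drain(partitions):
    """Read each partition with its own reader until it is exhausted."""
    records = []
    for messages in partitions:
        reader = PartitionReader(messages)
        # Keep pulling records until this reader reports nothing left.
        while (record := reader.next_record()) is not None:
            records.append(record)
    return records


if __name__ == "__main__":
    # Two partitions, so two readers; the job ends when both are drained.
    print(drain([["a1", "a2"], ["b1"]]))
```

If a producer kept appending to a partition faster than next_record() 
consumed it, the while loop would never see None, which is the "run forever" 
case described above.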

Hope this helps
-David

On Oct 30, 2012, at 8:40 PM, Michal Haris wrote:

> When you need your data streams to be incrementally loaded into hadoop for
> offline batch processing and/or ad-hoc querying - some things cannot (or
> are expensive to) be computed in real-time. So you have a hadoop job that
> consumes kafka stream, potentially formats the data and saves into hdfs.
> 
> On 30 October 2012 23:28, Hussein Baghdadi <hubaghd...@hotmail.com> wrote:
> 
>> Hi,
>>
>> Kafka comes with a support for Hadoop. I'm not sure what does this mean.
>> Kafka is a publish-subscribe messaging system. What are some of the
>> typical usage of Kafka-support for Hadoop producers and consumers?
>>
>> Well, producers are easy to digest. MapReduce job emitting data to Kafka.
>> But what about Hadoop consumers? Hadoop is a batching system, not a
>> continuous running system (as Storm or Dempsy). Say Kafka gets some data,
>> what will happen?
>>
>> Thanks for help and time.
>> 
> 
> -- 
> Michal Haris
> Software Engineer
> 
> www.visualdna.com | t: +44 (0) 207 734 7033