Hey, so I'm currently running one mapper per partition. I guess I didn't state this, but my code is based on the hadoop-consumer in the contrib/ project. I was really wondering whether anyone has tried multiple consumers per partition.
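For concreteness, the one-mapper-per-partition layout I'm running now looks roughly like the sketch below. It's illustrative only: TaskSpec and onePerPartition are made-up names for this example, not anything from the contrib code.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the current setup: one map task per partition.
// TaskSpec, onePerPartition, and the watermarks array are hypothetical
// names for this example, not from the contrib hadoop-consumer.
public class PartitionPlanner {

    static final class TaskSpec {
        final String topic;
        final int partition;
        final long startOffset; // resume from the previous run's watermark

        TaskSpec(String topic, int partition, long startOffset) {
            this.topic = topic;
            this.partition = partition;
            this.startOffset = startOffset;
        }
    }

    // Task i consumes partition i exclusively, so there are exactly
    // as many tasks as partitions and one watermark per partition.
    static List<TaskSpec> onePerPartition(String topic, long[] watermarks) {
        List<TaskSpec> tasks = new ArrayList<TaskSpec>();
        for (int p = 0; p < watermarks.length; p++) {
            tasks.add(new TaskSpec(topic, p, watermarks[p]));
        }
        return tasks;
    }
}

With multiple consumers per partition the planning step would be the same except each partition appears N times, which is what leads to the watermark question in the quoted thread below.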
On Mon, Sep 17, 2012 at 6:54 PM, Min Yu <mini...@gmail.com> wrote:
> If you want to run each Mapper job per partition,
>
> https://github.com/miniway/kafka-hadoop-consumer
>
> might help.
>
> Thanks
> Min
>
> On Sep 18, 2012, at 6:51 AM, Matthew Rathbone <matt...@foursquare.com> wrote:
>
> > Hey guys,
> >
> > I've been using the hadoop consumer a whole lot this week, but I'm seeing
> > pretty poor throughput with one task per partition. I figured a good
> > solution would be to have multiple tasks per partition, so I wanted to run
> > my assumptions by you all first:
> >
> > This should enable the broker to round-robin events between tasks, right?
> >
> > When I record the high watermark at the end of the MapReduce job there will
> > be N entries for each partition (one per task), so is it correct to just
> > take max(watermarks)?
> > -- my assumption is that as they're getting events round-robin, everything
> > should have been consumed up to the highest watermark found. Does this hold
> > true?
> >
> > Is anyone else using the consumer like this?
> >
> > --
> > Matthew Rathbone
> > Foursquare | Software Engineer | Server Engineering Team
> > matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> |
> > 4sq <http://foursquare.com/rathboma>

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> |
4sq <http://foursquare.com/rathboma>
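To make the max(watermarks) step from the quoted question concrete, here's a minimal sketch of the merge, assuming each task writes a single {partition, lastConsumedOffset} record when it finishes. The names are again illustrative, not from the contrib code.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Collapse N per-task watermarks down to one offset per partition by
// keeping the maximum. Input entries are {partitionId, offset} pairs;
// with N tasks per partition there are N entries per partition.
public class WatermarkMerge {

    static Map<Integer, Long> maxPerPartition(Iterable<long[]> entries) {
        Map<Integer, Long> highest = new HashMap<Integer, Long>();
        for (long[] e : entries) {
            int partition = (int) e[0];
            long offset = e[1];
            Long seen = highest.get(partition);
            // Keep only the largest offset recorded for this partition.
            if (seen == null || offset > seen) {
                highest.put(partition, offset);
            }
        }
        return highest;
    }

    public static void main(String[] args) {
        // Two tasks per partition, so two watermark entries for each.
        List<long[]> watermarks = Arrays.asList(
                new long[]{0, 1500}, new long[]{0, 1720},
                new long[]{1, 980}, new long[]{1, 1010});
        System.out.println(maxPerPartition(watermarks)); // e.g. {0=1720, 1=1010}
    }
}

The merge itself is trivial; whether max is the right aggregate hinges entirely on the round-robin assumption in the quoted message, i.e. that tasks on the same partition see interleaved events rather than disjoint ranges.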