Hey guys,

I've been using the Hadoop consumer a lot this week, but I'm seeing
pretty poor throughput with one task per partition. I figured a good
solution would be to have multiple tasks per partition, so I wanted to run
my assumptions by you all first:

This should enable the broker to round-robin events between the tasks, right?

When I record the high watermark at the end of the MapReduce job, there will
be N entries for each partition (one per task), so is it correct to just
take max(watermarks)?
My assumption is that since the tasks are getting events round-robin, everything
should have been consumed up to the highest watermark found. Does this hold
true?
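
To be concrete, here's roughly what I mean by taking max(watermarks) when
merging the per-task entries. This is just a sketch of my proposed merge step;
the WatermarkRecord type and the record list are placeholders for however the
offsets actually get persisted at the end of the job:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WatermarkMerger {

    // Hypothetical record of one task's final offset for one partition.
    public static class WatermarkRecord {
        public final String topic;
        public final int partition;
        public final long offset;

        public WatermarkRecord(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    // Collapse the N entries per partition down to one by taking the max,
    // on the assumption that every task saw events round-robin and so the
    // partition is fully consumed up to the highest recorded offset.
    public static Map<String, Long> mergeWatermarks(List<WatermarkRecord> records) {
        Map<String, Long> merged = new HashMap<>();
        for (WatermarkRecord r : records) {
            String key = r.topic + "-" + r.partition;
            merged.merge(key, r.offset, Math::max);
        }
        return merged;
    }
}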

Is anyone else using the consumer like this?



-- 
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matt...@foursquare.com | @rathboma <http://twitter.com/rathboma> | 4sq <http://foursquare.com/rathboma>