Can you say a bit about where your bottleneck is? Is there one reduce task
that's taking a very long time? If so, can you check the logs and see which
datatype that reducer is handling? There was some discussion of
this on JIRA recently; the consensus is that our current partitioner works
well if you have a wide variety of datatypes, none of which is too
big, and badly if you have one or two datatypes with lots of data in
each.
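To make the skew concrete, here is a toy Java simulation. It is not Chukwa code. It assumes that demux sends every record of a given datatype to one reducer, chosen by hashing the datatype name with the same formula as Hadoop's HashPartitioner. The class name, method names, datatype names and record counts are all hypothetical. With 40 equal-sized datatypes on 4 reducers the load stays fairly even. A single 1,000,000-record datatype, by contrast, puts at least that many records on one reducer, however many reducers you configure.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, NOT Chukwa's partitioner: assumes all records of a
// datatype go to the reducer chosen by hashing the datatype name.
public class PartitionSkewSketch {

    // HashPartitioner-style formula, keyed on the datatype name only.
    static int partitionFor(String datatype, int numReducers) {
        return (datatype.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Total number of records each reducer would receive.
    static long[] reducerLoad(Map<String, Long> recordsPerDatatype, int numReducers) {
        long[] load = new long[numReducers];
        for (Map.Entry<String, Long> e : recordsPerDatatype.entrySet()) {
            load[partitionFor(e.getKey(), numReducers)] += e.getValue();
        }
        return load;
    }

    // Many small datatypes of equal size: "Type0", "Type1", ...
    static Map<String, Long> manySmallTypes(int count, long recordsEach) {
        Map<String, Long> m = new HashMap<>();
        for (int i = 0; i < count; i++) {
            m.put("Type" + i, recordsEach);
        }
        return m;
    }

    // The same small datatypes plus one dominant (hypothetical) datatype.
    static Map<String, Long> withOneBigType(int count, long recordsEach, long bigRecords) {
        Map<String, Long> m = manySmallTypes(count, recordsEach);
        m.put("BigType", bigRecords);
        return m;
    }

    // Load on the busiest reducer.
    static long max(long[] values) {
        return Arrays.stream(values).max().orElse(0L);
    }

    public static void main(String[] args) {
        int numReducers = 4;
        long[] balanced = reducerLoad(manySmallTypes(40, 1_000L), numReducers);
        long[] skewed = reducerLoad(withOneBigType(10, 1_000L, 1_000_000L), numReducers);
        System.out.println("many small datatypes:  " + Arrays.toString(balanced));
        System.out.println("one dominant datatype: " + Arrays.toString(skewed));
    }
}
```

Because the partition key in this model is the datatype itself, adding reducers cannot split the dominant datatype's records; only a finer-grained key would.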

On Mon, May 10, 2010 at 3:07 PM, Corbin Hoenes <cor...@tynt.com> wrote:
> Is it possible to tune the time or size interval on demux to lower the amount
> of time it takes to get demuxed data into the Hadoop cluster?
> (Or some other way?) Currently there is about a 20-30 minute lag on our
> setup. Also wondering whether this is even a wise thing to try; maybe there
> are side effects?

-- 
Ari Rabkin asrab...@gmail.com
UC Berkeley Computer Science Department