Ari,

We currently process only a single DataType.  Almost all of our input is 
Apache log files, and we handle them with that one data type.

I believe we are more interested in case #1: we have lots of a single type of 
data coming in very quickly.
Longer term, though, I also agree with Jerome's comment that having it be 
pluggable, like the Processor class, is ideal.
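
To make the skew concrete, here is a rough sketch of why a single large
datatype hurts. It is not Chukwa's actual partitioner code. The function
names, the 8 reducers, the 50 made-up source names, and the idea of mixing
the source into the key are all assumptions for illustration only.

```python
import zlib
from collections import Counter

def partition_by_datatype(datatype, source, num_reducers):
    # Hypothetical: key on the datatype alone, so every record of one
    # datatype lands on the same reducer.
    return zlib.crc32(datatype.encode()) % num_reducers

def partition_by_datatype_and_source(datatype, source, num_reducers):
    # Hypothetical alternative: mix a secondary field (here the source
    # name) into the key so one datatype can spread over several reducers.
    key = f"{datatype}/{source}"
    return zlib.crc32(key.encode()) % num_reducers

def reducer_load(partition, records, num_reducers):
    # Count how many records each reducer would receive.
    counts = Counter(partition(dt, src, num_reducers) for dt, src in records)
    return [counts.get(r, 0) for r in range(num_reducers)]

# Made-up workload: 10,000 records of one datatype from 50 sources.
records = [("ApacheLog", f"web{i % 50:02d}") for i in range(10000)]
NUM_REDUCERS = 8

by_type = reducer_load(partition_by_datatype, records, NUM_REDUCERS)
by_type_and_source = reducer_load(
    partition_by_datatype_and_source, records, NUM_REDUCERS)

print("datatype only:     ", by_type)
print("datatype + source: ", by_type_and_source)
```

With the datatype-only key, one reducer gets all 10,000 records and the
rest sit idle. Whether a secondary key like the source is acceptable
depends on what the reducer needs to see together, which is one reason a
pluggable choice seems attractive.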

On May 10, 2010, at 5:40 PM, Ariel Rabkin wrote:

> On Mon, May 10, 2010 at 4:39 PM, Ariel Rabkin <asrab...@gmail.com> wrote:
>> Can you say a bit about where your bottleneck is?  Is there one reduce
>> that's taking a very long time? Can you check the logs and see which
>> datatype that reducer is dealing with?  There was some discussion of
>> this on JIRA recently; consensus is that our current partitioner works
>> well if you have a wide variety of datatypes, none of which is too
>> big, and badly if you have one or two datatypes with lots of data in
>> each.
> 
> I forgot to add -- the JIRA you should follow is
> https://issues.apache.org/jira/browse/CHUKWA-481
> 
> We'd love to get feedback on what a more sensible approach would be
> for handling your use case.
> 
> --Ari
> 
> -- 
> Ari Rabkin asrab...@gmail.com
> UC Berkeley Computer Science Department
