Ari,

We currently process only a single datatype. Almost all of our input is Apache log files, and we handle all of them with that one datatype.
I believe we are more interested in case #1: we have a large volume of a single type of data coming in very quickly. In the longer term, though, I also agree with Jerome's comment that making it pluggable, like the Processor class, would be ideal.

On May 10, 2010, at 5:40 PM, Ariel Rabkin wrote:

> On Mon, May 10, 2010 at 4:39 PM, Ariel Rabkin <asrab...@gmail.com> wrote:
>> Can you say a bit about where your bottleneck is? Is there one reduce
>> that's taking a very long time? Can you check the logs and see which
>> datatype that reducer is dealing with? There was some discussion of
>> this on JIRA recently; consensus is that our current partitioner works
>> well if you have a wide variety of datatypes, none of which is too
>> big, and badly if you have one or two datatypes with lots of data in
>> each.
>
> I forgot to add -- the JIRA you should follow is
> https://issues.apache.org/jira/browse/CHUKWA-481
>
> We'd love to get feedback on what a more sensible approach would be
> for handling your use case.
>
> --Ari
>
> --
> Ari Rabkin asrab...@gmail.com
> UC Berkeley Computer Science Department
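
To make the skew described in this thread concrete, here is a rough sketch of why partitioning on datatype alone sends a single-datatype workload to one reducer. It is a toy Python model, not Chukwa's partitioner: the function names, the "stream name" field, the record counts, and the reducer count are all assumptions made up for illustration.

```python
import zlib
from collections import Counter


def stable_hash(text):
    # CRC32 gives the same value on every run, unlike Python's built-in hash().
    return zlib.crc32(text.encode("utf-8"))


def partition_by_datatype(datatype, stream_name, num_reducers):
    # Hypothetical partitioner: every record of a datatype goes to one reducer.
    return stable_hash(datatype) % num_reducers


def partition_by_datatype_and_stream(datatype, stream_name, num_reducers):
    # Hypothetical alternative: also hash a per-source field (an assumed
    # "stream name") so one large datatype is spread over several reducers.
    return stable_hash(datatype + "/" + stream_name) % num_reducers


def reducer_load(records, partitioner, num_reducers):
    # Count how many records each reducer would receive.
    load = Counter()
    for datatype, stream_name in records:
        load[partitioner(datatype, stream_name, num_reducers)] += 1
    return load


# Made-up workload: 40 web servers, 25 Apache log records each, one datatype.
records = [("ApacheLog", "web%02d" % i) for i in range(40) for _ in range(25)]

by_type = reducer_load(records, partition_by_datatype, 8)
by_stream = reducer_load(records, partition_by_datatype_and_stream, 8)

print("by datatype only:      ", sorted(by_type.items()))
print("by datatype and stream:", sorted(by_stream.items()))
```

With only one datatype, the first partitioner puts every record on a single reducer, which matches the "one reduce taking a very long time" symptom. The second spreads the same records across reducers, at the cost that records of one datatype no longer all arrive at the same reducer, so it only makes sense if downstream processing does not depend on that grouping.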