We are processing Apache log files. The current scale is 70-80 GB per
day, but we'd like a story for scaling up to more. Checking my
collector logs, it appears the data rate still ranges from roughly 600 KB to
1.2 MB per second. This is all from one collector. Does your setup use
multiple collectors? My thought is that multiple collectors could be used to
scale out once we reach a data rate that causes issues for a single collector.
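For example, something like this in each agent's conf/collectors file (assuming
that's still how agents discover collectors; the hostnames below are just
placeholders), so an agent can fail over to, or spread load across, several
collectors:

    http://collector1.example.com:8080/
    http://collector2.example.com:8080/
    http://collector3.example.com:8080/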

Any chance you know roughly what that data rate is?

On May 10, 2010, at 5:37 PM, Ariel Rabkin wrote:

> That's how we use it at Berkeley, to process metrics from hundreds of
> machines; total data rate less than a megabyte per second, though.
> What scale of data are you looking at?
> 
> The intent of SocketTee was if you need some subset of the data now,
> while write-to-HDFS-and-process-with-Hadoop is still the default path.
> What sort of low-latency processing do you need?
> 
> --Ari
> 
> On Mon, May 10, 2010 at 4:28 PM, Corbin Hoenes <cor...@tynt.com> wrote:
>> Has anyone used the "Tee" in a larger scale deployment to try to get 
>> real-time/low latency data?  Interested in how feasible it would be to use 
>> it to pipe data into another system to handle these low latency requests and 
>> leave the long term analysis to hadoop.
>> 
>> 
> 
> 
> 
> -- 
> Ari Rabkin asrab...@gmail.com
> UC Berkeley Computer Science Department
