We are processing Apache log files. The current scale is 70-80 GB per day, but we'd like to have a story for scaling up to more. Checking my collector logs, it appears the data rate still ranges from 600 KB to 1.2 MB per second (which squares with 70-80 GB/day, roughly 0.8-0.95 MB/s averaged over the day), and it's all going through one collector. Does your setup use multiple collectors? My thought is that multiple collectors could be used to scale out once we reach a data rate that causes issues for a single collector. Any chance you know where that data rate is?
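
For reference, here's the kind of fan-out I'm picturing, assuming the stock agent setup where each agent reads conf/collectors and fails over down the list. Hostnames and port are made up for illustration; check your own collector config for the real values:

    # conf/collectors on every agent node: one collector URL per line.
    # The agent picks one and falls back to the next on failure, so adding
    # collectors here is how I'd expect the write path to scale out.
    http://collector1.example.com:8080/
    http://collector2.example.com:8080/
    http://collector3.example.com:8080/

My understanding is each collector writes its own sink files into HDFS, so throughput should grow roughly with the number of collectors until HDFS or the downstream processing becomes the bottleneck.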

On May 10, 2010, at 5:37 PM, Ariel Rabkin wrote:

> That's how we use it at Berkeley, to process metrics from hundreds of
> machines; total data rate less than a megabyte per second, though.
> What scale of data are you looking at?
>
> The intent of SocketTee was if you need some subset of the data now,
> while write-to-HDFS-and-process-with-Hadoop is still the default path.
> What sort of low-latency processing do you need?
>
> --Ari
>
> On Mon, May 10, 2010 at 4:28 PM, Corbin Hoenes <cor...@tynt.com> wrote:
>> Has anyone used the "Tee" in a larger scale deployment to try to get
>> real-time/low latency data? Interested in how feasible it would be to use
>> it to pipe data into another system to handle these low latency requests and
>> leave the long term analysis to hadoop.
>>
>
> --
> Ari Rabkin asrab...@gmail.com
> UC Berkeley Computer Science Department
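
For what it's worth, below is a rough sketch of the kind of client I'm picturing on our end for pulling a low-latency feed off the Tee and handing it to another system. The port (9094) and the "RAW all" handshake are assumptions on my part about SocketTeeWriter's protocol, and it treats the feed as line-oriented text since our data is Apache logs; treat it as an illustration rather than a working recipe.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;

    // Sketch: connect to a collector's Tee port and stream whatever it sends
    // to stdout.  Port number and the "RAW all" filter command are assumptions;
    // check SocketTeeWriter for the actual protocol.
    public class TeeTail {
      public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "collector1.example.com";
        try (Socket sock = new Socket(host, 9094);
             PrintWriter out = new PrintWriter(sock.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(sock.getInputStream()))) {
          out.println("RAW all");            // ask for every chunk, unfiltered (assumed command)
          String line;
          while ((line = in.readLine()) != null) {
            System.out.println(line);        // hand off to the low-latency system here
          }
        }
      }
    }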