Eric,

Thanks for the analysis. A couple of comments:

Kafka recently added an end-to-end compression feature, and we will be releasing it soon. Please see https://issues.apache.org/jira/browse/KAFKA-79 for details.

As for file rolling support, are you referring to the Kafka log? Kafka logs are rolled based on a preconfigured size.

Thanks,

Jun

On Thu, Sep 29, 2011 at 11:25 AM, Eric Hauser <ewhau...@gmail.com> wrote:

> Jeremy,
>
> I've used both Flume and Kafka, and I can provide some info for comparison:
>
> Flume
> - The current Flume release 0.9.4 has some pretty nasty bugs in it
> (most have been fixed in trunk).
> - Flume is more complex to maintain operations-wise (IMO) than Kafka
> since you have to set up masters and collectors (you don't necessarily
> need collectors if you aren't writing to HDFS)
> - Flume has a well-defined pattern for doing what you want:
>
> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
>
> Kafka
> - If you need multiple Kafka partitions for the logs, you will want to
> partition by host so the messages arrive in order for the same host
> - You can use the same piped technique as Flume to publish to Kafka,
> but you'll have to write a little code to publish and subscribe to the
> stream
> - Kafka does not provide any of the file rolling, compression, etc.
> that Flume provides
> - If you ever want to do anything more interesting with those log
> files than just send them to one location, publishing them to Kafka
> would allow you to add additional consumers later. Flume has a
> concept of fanout sinks, but I don't care for the way it works.
>
>
>
> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <jun...@gmail.com> wrote:
> > Jeremy,
> >
> > Yes, Kafka will be a good fit for that.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
> > <jeremy.hanna1...@gmail.com> wrote:
> >
> >> We have a number of web servers in EC2 and periodically we just blow them
> >> away and create new ones. That makes keeping logs problematic. We're
> >> looking for a way to stream the logs from those various sources directly to
> >> a central log server - either just a single server or HDFS or something like
> >> that.
> >>
> >> My question is whether Kafka is a good fit for that or should I be looking
> >> more along the lines of Flume or Scribe?
> >>
> >> Many thanks.
> >>
> >> Jeremy
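
To illustrate the point in Eric's message about partitioning by host, here is a minimal sketch in Python. It does not use the Kafka client API. The function names (`partition_for_host`, `route`), the host names, and the choice of CRC32 as the hash are all assumptions made for illustration. The idea is only that a stable hash of the host name always selects the same partition, so lines from one host stay in the order they were sent.

```python
import zlib


def partition_for_host(host: str, num_partitions: int) -> int:
    """Map a host name to a stable partition index (hypothetical helper)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 is deterministic across processes, unlike Python's built-in
    # hash() on strings, which is randomized per interpreter run.
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def route(events, num_partitions: int):
    """Append (host, line) pairs to per-partition lists in arrival order."""
    partitions = [[] for _ in range(num_partitions)]
    for host, line in events:
        partitions[partition_for_host(host, num_partitions)].append((host, line))
    return partitions


if __name__ == "__main__":
    # Interleaved log lines from three hypothetical web servers.
    events = [
        ("web-1", "GET /"),
        ("web-2", "GET /login"),
        ("web-1", "POST /login"),
        ("web-3", "GET /about"),
        ("web-2", "GET /logout"),
    ]
    for index, partition in enumerate(route(events, 4)):
        print(index, partition)
```

Two hosts may share a partition, but every line from a given host lands in a single partition, so a consumer reading that partition sees that host's lines in order.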