One more point to add to this thread: partitioning is really hard to do in Flume. If you need partitioning but don't want to deal with a set of central brokers, and don't need persistence, you can check out the new Storm project (github.com/nathanmarz).
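For what it's worth, the partition-by-host idea from Eric's message quoted below can be sketched roughly like this. This is only an illustration in plain Python, not Kafka's or Storm's actual partitioner; the names `partition_for_host` and `route` are made up, and using CRC32 as the hash is my own assumption.

```python
import zlib


def partition_for_host(host: str, num_partitions: int) -> int:
    """Map a host name to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is stable across processes and restarts, unlike Python's
    # built-in hash() for strings, so a host always maps to the same partition.
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def route(lines, num_partitions):
    """Group (host, line) pairs by partition, preserving arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for host, line in lines:
        partitions[partition_for_host(host, num_partitions)].append((host, line))
    return partitions
```

Since every line from a given host lands in the same partition, a consumer reading that partition sees that host's lines in the order they were published. Ordering across different hosts is not guaranteed.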
-Evan

On Thu, Sep 29, 2011 at 11:38 AM, Anurag <anurag.pha...@gmail.com> wrote:

> Eric/Jun,
> Can you throw some light on how to handle Apache log rotation? afaik,
> even if we write custom code to tail a file, the file handle is lost
> on rotation and might result in some loss of data.
>
>
> On Thu, Sep 29, 2011 at 11:35 AM, Jeremy Hanna
> <jeremy.hanna1...@gmail.com> wrote:
> > Thanks a lot for the comparison Eric. Really good to hear a perspective
> > from a user of both.
> >
> > On Sep 29, 2011, at 1:25 PM, Eric Hauser wrote:
> >
> >> Jeremy,
> >>
> >> I've used both Flume and Kafka, and I can provide some info for
> >> comparison:
> >>
> >> Flume
> >> - The current Flume release 0.9.4 has some pretty nasty bugs in it
> >> (most have been fixed in trunk).
> >> - Flume is more complex to maintain operations-wise (IMO) than Kafka
> >> since you have to set up masters and collectors (you don't necessarily
> >> need collectors if you aren't writing to HDFS)
> >> - Flume has a well defined pattern for doing what you want:
> >> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
> >>
> >> Kafka
> >> - If you need multiple Kafka partitions for the logs, you will want to
> >> partition by host so the messages arrive in order for the same host
> >> - You can use the same piped technique as Flume to publish to Kafka,
> >> but you'll have to write a little code to publish and subscribe to the
> >> stream
> >> - Kafka does not provide any of the file rolling, compression, etc.
> >> that Flume provides
> >> - If you ever want to do anything more interesting with those log
> >> files than just send them to one location, publishing them to Kafka
> >> would allow you to add additional consumers later. Flume has a
> >> concept of fanout sinks, but I don't care for the way it works.
> >>
> >>
> >>
> >> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <jun...@gmail.com> wrote:
> >>> Jeremy,
> >>>
> >>> Yes, Kafka will be a good fit for that.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>> On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
> >>> <jeremy.hanna1...@gmail.com> wrote:
> >>>
> >>>> We have a number of web servers in ec2 and periodically we just blow them
> >>>> away and create new ones. That makes keeping logs problematic. We're
> >>>> looking for a way to stream the logs from those various sources directly to
> >>>> a central log server - either just a single server or hdfs or something like
> >>>> that.
> >>>>
> >>>> My question is whether kafka is a good fit for that or should I be looking
> >>>> more along the lines of flume or scribe?
> >>>>
> >>>> Many thanks.
> >>>>
> >>>> Jeremy
> >>>
> >
> >

--
*Evan Chan*
Senior Software Engineer | e...@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> | @ooyala <http://www.twitter.com/ooyala>