Anurag,

I wouldn't tail the log files. Instead, I'd use Apache's facilities to pipe the logs to another program:
http://httpd.apache.org/docs/2.2/mod/core.html#errorlog
http://httpd.apache.org/docs/2.0/programs/rotatelogs.html

On Thu, Sep 29, 2011 at 2:38 PM, Anurag <anurag.pha...@gmail.com> wrote:
> Eric/Jun,
> Can you throw some light on how to handle apache log rotation? afaik,
> even if we write custom code to tail a file, the file handle is lost
> on rotation and might result in some loss of data.
>
>
> On Thu, Sep 29, 2011 at 11:35 AM, Jeremy Hanna
> <jeremy.hanna1...@gmail.com> wrote:
>> Thanks a lot for the comparison Eric. Really good to hear a perspective
>> from a user of both.
>>
>> On Sep 29, 2011, at 1:25 PM, Eric Hauser wrote:
>>
>>> Jeremy,
>>>
>>> I've used both Flume and Kafka, and I can provide some info for comparison:
>>>
>>> Flume
>>> - The current Flume release 0.9.4 has some pretty nasty bugs in it
>>> (most have been fixed in trunk).
>>> - Flume is more complex to maintain operations-wise (IMO) than Kafka
>>> since you have to set up masters and collectors (you don't necessarily
>>> need collectors if you aren't writing to HDFS)
>>> - Flume has a well defined pattern for doing what you want:
>>> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
>>>
>>> Kafka
>>> - If you need multiple Kafka partitions for the logs, you will want to
>>> partition by host so the messages arrive in order for the same host
>>> - You can use the same piped technique as Flume to publish to Kafka,
>>> but you'll have to write a little code to publish and subscribe to the
>>> stream
>>> - Kafka does not provide any of the file rolling, compression, etc.
>>> that Flume provides
>>> - If you ever want to do anything more interesting with those log
>>> files than just send them to one location, publishing them to Kafka
>>> would allow you to add additional consumers later. Flume has a
>>> concept of fanout sinks, but I don't care for the way it works.
>>>
>>>
>>>
>>> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <jun...@gmail.com> wrote:
>>>> Jeremy,
>>>>
>>>> Yes, Kafka will be a good fit for that.
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>> On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
>>>> <jeremy.hanna1...@gmail.com> wrote:
>>>>
>>>>> We have a number of web servers in ec2 and periodically we just blow them
>>>>> away and create new ones. That makes keeping logs problematic. We're
>>>>> looking for a way to stream the logs from those various sources directly
>>>>> to a central log server - either just a single server or hdfs or
>>>>> something like that.
>>>>>
>>>>> My question is whether kafka is a good fit for that or should I be looking
>>>>> more along the lines of flume or scribe?
>>>>>
>>>>> Many thanks.
>>>>>
>>>>> Jeremy
>>>>
>>
>>
>
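
For the piped-log route suggested at the top of this thread, here is a rough sketch in Python of the "little code" Eric mentions. It assumes Apache is configured with something like `CustomLog "|/usr/local/bin/log_to_kafka.py" combined` (the script path is made up), so each log line arrives on the program's stdin and no file handle is lost on rotation. The names `partition_for_host`, `pump`, `print_publish`, and `main` are hypothetical, the CRC32-based partition choice is only one way to keep a host's lines on a single partition, and `print_publish` is a stand-in for whatever Kafka producer call you actually use.

```python
import socket
import sys
import zlib


def partition_for_host(host, num_partitions):
    """Map a host name to a stable partition index in [0, num_partitions)."""
    # CRC32 is stable across processes and restarts, unlike Python's hash().
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def pump(stream, publish, host, num_partitions):
    """Forward each non-empty line from stream to publish(partition, line).

    Every line from this host goes to the same partition, so a consumer
    of that partition sees the host's lines in the order Apache wrote them.
    Returns the number of lines forwarded.
    """
    partition = partition_for_host(host, num_partitions)
    count = 0
    for line in stream:
        line = line.rstrip("\n")
        if line:
            publish(partition, line)
            count += 1
    return count


def print_publish(partition, line):
    # Placeholder only: a real deployment would hand the line to a
    # Kafka producer here instead of printing it.
    print(f"{partition}\t{line}")


def main():
    # Entry point for the script Apache launches; the partition count
    # of 4 is an arbitrary assumption.
    pump(sys.stdin, print_publish, socket.gethostname(), num_partitions=4)
```

The script Apache launches would simply call `main()`. Keeping the publish step as a callback means the stdin loop can be exercised without a broker, and the producer can be swapped in later.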