Anurag,

I wouldn't tail the log files. Instead, use Apache's piped-log facility,
which writes each log entry to another program's standard input, so
rotation never pulls a file out from under your reader:

http://httpd.apache.org/docs/2.2/mod/core.html#errorlog
http://httpd.apache.org/docs/2.0/programs/rotatelogs.html
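
A minimal sketch of such a program, assuming Apache is configured with
something like `CustomLog "|/usr/bin/python3 /opt/logpipe.py" combined`
(the interpreter and script paths are hypothetical; the same `|program`
form works for `ErrorLog`). Apache starts the program and feeds it one log
entry per line on stdin. `forward` is a hypothetical placeholder for
whatever actually ships the line (a Kafka publisher, a socket to a central
log server, etc.):

```python
import sys
from typing import Callable, Iterable


def forward(line: str) -> None:
    # Hypothetical sink: replace with code that ships the line to a
    # central log server, HDFS, or a Kafka publisher.
    sys.stdout.write(line + "\n")
    sys.stdout.flush()


def pump(stream: Iterable[str], sink: Callable[[str], None]) -> int:
    """Hand each non-blank line from `stream` to `sink` until EOF.

    Returns the number of lines forwarded."""
    count = 0
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line:  # skip empty lines
            sink(line)
            count += 1
    return count


def main() -> None:
    # Apache writes one log entry per line to this process's stdin;
    # EOF means Apache closed the pipe (for example, on shutdown).
    pump(sys.stdin, forward)
```

Saved as the script Apache runs, it would end with a call to `main()`.
Keeping `pump` separate from the sink makes it easy to swap the
placeholder for a real publisher without touching the read loop.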


On Thu, Sep 29, 2011 at 2:38 PM, Anurag <anurag.pha...@gmail.com> wrote:
> Eric/Jun,
> Can you shed some light on how to handle Apache log rotation? AFAIK,
> even if we write custom code to tail a file, the file handle is lost
> on rotation, which might result in some data loss.
>
>
> On Thu, Sep 29, 2011 at 11:35 AM, Jeremy Hanna
> <jeremy.hanna1...@gmail.com> wrote:
>> Thanks a lot for the comparison, Eric. Really good to hear a perspective
>> from a user of both.
>>
>> On Sep 29, 2011, at 1:25 PM, Eric Hauser wrote:
>>
>>> Jeremy,
>>>
>>> I've used both Flume and Kafka, and I can provide some info for comparison:
>>>
>>> Flume
>>> - The current Flume release 0.9.4 has some pretty nasty bugs in it
>>> (most have been fixed in trunk).
>>> - Flume is more complex to maintain operations-wise (IMO) than Kafka,
>>> since you have to set up masters and collectors (you don't necessarily
>>> need collectors if you aren't writing to HDFS)
>>> - Flume has a well-defined pattern for doing what you want:
>>> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
>>>
>>> Kafka
>>> - If you need multiple Kafka partitions for the logs, you will want to
>>> partition by host so that messages from the same host arrive in order
>>> - You can use the same piped technique as Flume to publish to Kafka,
>>> but you'll have to write a little code to publish and subscribe to the
>>> stream
>>> - Kafka does not provide any of the file rolling, compression, etc.
>>> that Flume provides
>>> - If you ever want to do anything more interesting with those log
>>> files than just send them to one location, publishing them to Kafka
>>> would allow you to add additional consumers later.  Flume has a
>>> concept of fanout sinks, but I don't care for the way it works.
>>>
>>>
>>>
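
A minimal sketch of the partition-by-host idea, assuming the publisher on
each web server knows its own host name (e.g. from `socket.gethostname()`)
and that the topic has a fixed number of partitions. `partition_for` is a
hypothetical helper, not part of any Kafka client API, and it does not
reproduce whatever partitioner a given Kafka client uses:

```python
import zlib


def partition_for(host: str, num_partitions: int) -> int:
    """Map a host name to a partition index in [0, num_partitions).

    Uses CRC32 rather than Python's built-in hash(), because hash() of a
    str is randomized per process, and every publisher must agree on the
    mapping for a host's lines to always land in the same partition.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(host.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # Hypothetical host names: "web-1" maps to the same partition both times.
    for h in ["web-1", "web-2", "web-3", "web-1"]:
        print(h, "->", partition_for(h, 4))
```

Since Kafka only orders messages within a partition, pinning each host to
one partition keeps that host's lines in order while still spreading hosts
across partitions.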
>>> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <jun...@gmail.com> wrote:
>>>> Jeremy,
>>>>
>>>> Yes, Kafka will be a good fit for that.
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>> On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
>>>> <jeremy.hanna1...@gmail.com> wrote:
>>>>
>>>>> We have a number of web servers in EC2, and periodically we just blow them
>>>>> away and create new ones. That makes keeping logs problematic. We're
>>>>> looking for a way to stream the logs from those various sources directly
>>>>> to a central log server: either just a single server, HDFS, or something
>>>>> like that.
>>>>>
>>>>> My question is whether Kafka is a good fit for that, or should I be
>>>>> looking more along the lines of Flume or Scribe?
>>>>>
>>>>> Many thanks.
>>>>>
>>>>> Jeremy
>>>>
>>
>>
>
