Hmm, I think I am still confused. Is the question how to set up Kafka to consume from log files being produced on a particular server, even though those log files may get rotated during consumption? Or are you saying you want to get all the log files onto a central server somewhere, so the question is whether there is an off-the-shelf way of consuming messages and outputting them to log files in a rotating fashion? I think it is the latter. I don't think we have anything for that, though it is not much more than a for loop, so it would be easy to add. It would be good to upgrade the console consumer to output to a file with optional file rolling instead of to the console.
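Roughly something like this - just a sketch of that for loop, with the actual consumer API elided since any message iterator would do (the messages iterator, the file naming, and the 64 MB threshold are all placeholders, not anything that exists today):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Iterator;

    // Sketch: drain messages (however the consumer hands them to you) into
    // a file, rolling over to a new file whenever a size threshold is hit.
    public class RollingFileConsumer {
        private static final long MAX_BYTES = 64L * 1024 * 1024; // roll at 64 MB

        public static void consumeToFiles(Iterator<byte[]> messages, String dir)
                throws IOException {
            int fileIndex = 0;
            long written = 0;
            FileOutputStream out = new FileOutputStream(dir + "/events." + fileIndex);
            while (messages.hasNext()) {
                byte[] msg = messages.next();
                out.write(msg);
                out.write('\n');
                written += msg.length + 1;
                if (written >= MAX_BYTES) {   // threshold reached:
                    out.close();              // close the current file
                    fileIndex++;              // and start the next one
                    out = new FileOutputStream(dir + "/events." + fileIndex);
                    written = 0;
                }
            }
            out.close();
        }
    }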
-Jay

On Thu, Sep 29, 2011 at 12:38 PM, Eric Hauser <ewhau...@gmail.com> wrote:
> Jun,
>
> I was referring to the logic that would be necessary for the consumer
> of the topic to rotate the log files on the centralized log server.
> With Flume you would handle this via configuration:
>
> collectorSink("file://var/logs/flume/webdata/%Y-%m-%d/%H00/", "web-")
>
> You would probably just use log4j or what not in your Kafka consumer
> to handle this.
>
> On Thu, Sep 29, 2011 at 3:20 PM, Jun Rao <jun...@gmail.com> wrote:
> > Eric,
> >
> > Thanks for the analysis. A couple of comments:
> >
> > Kafka recently added the end-to-end compression feature and we will be
> > releasing it soon. Please see
> > https://issues.apache.org/jira/browse/KAFKA-79 for details.
> >
> > About the file rolling support, are you referring to the Kafka log?
> > Kafka logs are rolled based on a preconfigured size.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Sep 29, 2011 at 11:25 AM, Eric Hauser <ewhau...@gmail.com> wrote:
> > > Jeremy,
> > >
> > > I've used both Flume and Kafka, and I can provide some info for comparison:
> > >
> > > Flume
> > > - The current Flume release, 0.9.4, has some pretty nasty bugs in it
> > >   (most have been fixed in trunk).
> > > - Flume is more complex to maintain operations-wise (IMO) than Kafka,
> > >   since you have to set up masters and collectors (you don't necessarily
> > >   need collectors if you aren't writing to HDFS).
> > > - Flume has a well-defined pattern for doing what you want:
> > >   http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
> > >
> > > Kafka
> > > - If you need multiple Kafka partitions for the logs, you will want to
> > >   partition by host so the messages arrive in order for the same host.
> > > - You can use the same piped technique as Flume to publish to Kafka,
> > >   but you'll have to write a little code to publish and subscribe to
> > >   the stream.
> > > - Kafka does not provide any of the file rolling, compression, etc.
> > >   that Flume provides.
> > > - If you ever want to do anything more interesting with those log
> > >   files than just send them to one location, publishing them to Kafka
> > >   would allow you to add additional consumers later. Flume has a
> > >   concept of fanout sinks, but I don't care for the way it works.
> > >
> > > On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > Jeremy,
> > > >
> > > > Yes, Kafka will be a good fit for that.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
> > > > <jeremy.hanna1...@gmail.com> wrote:
> > > >
> > > > > We have a number of web servers in EC2, and periodically we just
> > > > > blow them away and create new ones. That makes keeping logs
> > > > > problematic. We're looking for a way to stream the logs from those
> > > > > various sources directly to a central log server - either just a
> > > > > single server, or HDFS, or something like that.
> > > > >
> > > > > My question is whether Kafka is a good fit for that, or should I be
> > > > > looking more along the lines of Flume or Scribe?
> > > > >
> > > > > Many thanks.
> > > > >
> > > > > Jeremy
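P.S. To make the log4j suggestion above concrete, an hourly rolling setup in the consumer process could look roughly like this (the logger name and paths are made up; the DatePattern mirrors the hourly %H00 buckets in the Flume sink above):

    log4j.logger.weblogs=INFO, ROLL
    log4j.additivity.weblogs=false
    log4j.appender.ROLL=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.ROLL.File=/var/logs/kafka/webdata/web.log
    # '.'yyyy-MM-dd-HH rolls the file once an hour
    log4j.appender.ROLL.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.ROLL.layout=org.apache.log4j.PatternLayout
    log4j.appender.ROLL.layout.ConversionPattern=%m%n

The consumer would then just log each message to that logger and let log4j handle the rotation.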