Eric,

Thanks for the analysis. A couple of comments:

Kafka recently added the end-to-end compression feature and we will be
releasing it soon. Please see
https://issues.apache.org/jira/browse/KAFKA-79 for details.
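
Roughly, end-to-end compression works like this: the producer compresses a whole batch of messages once, the batch travels and is stored compressed, and only the consumer decompresses it. Below is a minimal Python sketch of that idea using gzip. The length-prefixed framing and the function names are illustrative assumptions, not Kafka's actual message format or API; KAFKA-79 describes the real design.

```python
import gzip
import struct
from typing import List


def compress_batch(messages: List[bytes]) -> bytes:
    """Producer side: frame each message with a 4-byte big-endian length, then gzip the whole batch once."""
    framed = b"".join(struct.pack(">I", len(m)) + m for m in messages)
    return gzip.compress(framed)


def decompress_batch(blob: bytes) -> List[bytes]:
    """Consumer side: gunzip the batch and split it back into the original messages."""
    data = gzip.decompress(blob)
    messages = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        messages.append(data[pos:pos + length])
        pos += length
    return messages


if __name__ == "__main__":
    batch = [b'10.0.0.1 - - "GET / HTTP/1.1" 200'] * 100
    blob = compress_batch(batch)  # this compressed blob is what would be shipped and stored
    assert decompress_batch(blob) == batch
    print(len(b"".join(batch)), "->", len(blob), "bytes")
```

Compressing per batch rather than per message is what makes this pay off for web logs: consecutive lines share most of their bytes, so gzip sees far more redundancy.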

About the file rolling support, are you referring to the Kafka log? Kafka logs
are rolled over based on a preconfigured size.
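
Size-based rolling means a log is a sequence of segments: writes go to the active segment, and once the next write would push it past the configured size, a new segment is started. Here is a minimal in-memory Python sketch. The `SizeRolledLog` class, the `max_segment_bytes` parameter, and labeling each segment with the byte offset of its first message are assumptions for illustration, not Kafka's code or configuration names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    base_offset: int  # byte offset of the first message stored in this segment
    data: bytearray = field(default_factory=bytearray)


class SizeRolledLog:
    """Append-only log split into segments; rolls to a new segment when the active one is full."""

    def __init__(self, max_segment_bytes: int) -> None:
        self.max_segment_bytes = max_segment_bytes
        self.segments: List[Segment] = [Segment(base_offset=0)]
        self.next_offset = 0

    def append(self, message: bytes) -> int:
        """Append a message and return the byte offset it was written at."""
        active = self.segments[-1]
        # Roll only if the active segment already holds data and this write would exceed the limit.
        if active.data and len(active.data) + len(message) > self.max_segment_bytes:
            active = Segment(base_offset=self.next_offset)
            self.segments.append(active)
        offset = self.next_offset
        active.data.extend(message)
        self.next_offset += len(message)
        return offset
```

For example, with `max_segment_bytes=10`, appending three 4-byte messages leaves two segments with base offsets 0 and 8. A single message larger than the limit is still written into an empty segment rather than rejected, which keeps the sketch simple.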

Thanks,

Jun

On Thu, Sep 29, 2011 at 11:25 AM, Eric Hauser <ewhau...@gmail.com> wrote:

> Jeremy,
>
> I've used both Flume and Kafka, and I can provide some info for comparison:
>
> Flume
> - The current Flume release 0.9.4 has some pretty nasty bugs in it
> (most have been fixed in trunk).
> - Flume is more complex to maintain operations-wise (IMO) than Kafka
> since you have to set up masters and collectors (you don't necessarily
> need collectors if you aren't writing to HDFS)
> - Flume has a well-defined pattern for doing what you want:
>
> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
>
> Kafka
> - If you need multiple Kafka partitions for the logs, you will want to
> partition by host so that messages from the same host arrive in order
> - You can use the same piped technique as Flume to publish to Kafka,
> but you'll have to write a little code to publish and subscribe to the
> stream
> - Kafka does not provide any of the file rolling, compression, etc.
> that Flume provides
> - If you ever want to do anything more interesting with those log
> files than just send them to one location, publishing them to Kafka
> would allow you to add additional consumers later.  Flume has a
> concept of fanout sinks, but I don't care for the way it works.
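
To make the first two points concrete: a piped-log publisher reads lines as Apache writes them and picks a partition from a stable hash of the host name, so every line from one host lands in the same partition and stays in order. This is a minimal Python sketch; the `send(partition, payload)` callback stands in for whatever producer client you use, and both it and `NUM_PARTITIONS` are hypothetical placeholders rather than a real Kafka API.

```python
import zlib
from typing import Callable, Iterable

NUM_PARTITIONS = 4  # hypothetical; should match the topic's partition count


def partition_for(host: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # crc32 gives the same value in every process, unlike Python's salted str hash().
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def publish_lines(lines: Iterable[str], host: str,
                  send: Callable[[int, bytes], None]) -> int:
    """Send each non-empty log line to the partition owned by `host`; return how many were sent."""
    partition = partition_for(host)
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line:
            send(partition, line.encode("utf-8"))
            count += 1
    return count
```

From Apache, a script built around this would sit behind a piped `CustomLog` directive and call `publish_lines(sys.stdin, socket.gethostname(), send)`, with `send` wrapping your producer client.
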
>
>
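
On the last point: adding a consumer later doesn't touch the producers, because each consumer (or consumer group) tracks its own position in the log. A minimal in-memory Python sketch follows; the `Topic` class and the group names are hypothetical, and real Kafka consumers persist their offsets elsewhere rather than in a dict.

```python
from collections import defaultdict
from typing import Dict, List


class Topic:
    """Append-only message list; each consumer group keeps an independent read offset."""

    def __init__(self) -> None:
        self.messages: List[bytes] = []
        self.offsets: Dict[str, int] = defaultdict(int)  # group name -> index of next message to read

    def publish(self, message: bytes) -> None:
        self.messages.append(message)

    def poll(self, group: str, max_messages: int = 100) -> List[bytes]:
        """Return up to max_messages unread messages for this group and advance its offset."""
        start = self.offsets[group]
        batch = self.messages[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch
```

Here a group seen for the first time starts from the beginning of the retained log, so a consumer added next month can replay what is still on disk; in practice, where a new consumer starts is configurable.
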
>
> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <jun...@gmail.com> wrote:
> > Jeremy,
> >
> > Yes, Kafka will be a good fit for that.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
> > <jeremy.hanna1...@gmail.com> wrote:
> >
> >> We have a number of web servers in EC2 and periodically we just blow them
> >> away and create new ones. That makes keeping logs problematic. We're
> >> looking for a way to stream the logs from those various sources directly to
> >> a central log server - either just a single server or HDFS or something like
> >> that.
> >>
> >> My question is whether Kafka is a good fit for that or whether I should be
> >> looking more along the lines of Flume or Scribe?
> >>
> >> Many thanks.
> >>
> >> Jeremy
> >
>
