Hmm, I think I am still confused. Is the question how to set up Kafka to consume from log files being produced on a particular server, even though those log files may get rotated during consumption? Or are you saying you want to get all the log files onto a central server somewhere, so the question is whether there is an off-the-shelf way of consuming messages and outputting them to log files in a rotating fashion? I think it is the latter. I don't think we have anything for that, though it is not much more than a for loop, so it would be easy to add. It would be good to upgrade the console consumer to output to a file with optional file rolling instead of to the console.
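Roughly something like this - just a sketch of that for loop, with the actual consumer API elided since any message iterator would do (the messages iterator, the file naming, and the 64 MB threshold are all placeholders, not anything that exists today):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Iterator;

    // Sketch: drain messages (however the consumer hands them to you) into
    // a file, rolling over to a new file whenever a size threshold is hit.
    public class RollingFileConsumer {
        private static final long MAX_BYTES = 64L * 1024 * 1024; // roll at 64 MB

        public static void consumeToFiles(Iterator<byte[]> messages, String dir)
                throws IOException {
            int fileIndex = 0;
            long written = 0;
            FileOutputStream out = new FileOutputStream(dir + "/events." + fileIndex);
            while (messages.hasNext()) {
                byte[] msg = messages.next();
                out.write(msg);
                out.write('\n');
                written += msg.length + 1;
                if (written >= MAX_BYTES) {   // threshold reached:
                    out.close();              // close the current file
                    fileIndex++;              // and start the next one
                    out = new FileOutputStream(dir + "/events." + fileIndex);
                    written = 0;
                }
            }
            out.close();
        }
    }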
-Jay

On Thu, Sep 29, 2011 at 12:38 PM, Eric Hauser <ewhau...@gmail.com> wrote:
> Jun,
>
> I was referring to the logic that would be necessary for the consumer
> of the topic to rotate the log files on the centralized log server.
> With Flume you would handle this via configuration:
>
> collectorSink("file://var/logs/flume/webdata/%Y-%m-%d/%H00/", "web-")
>
> You would probably just use log4j or what not in your Kafka consumer
> to handle this.
>
> On Thu, Sep 29, 2011 at 3:20 PM, Jun Rao <jun...@gmail.com> wrote:
> > Eric,
> >
> > Thanks for the analysis. A couple of comments:
> >
> > Kafka recently added the end-to-end compression feature and we will be
> > releasing it soon. Please see
> > https://issues.apache.org/jira/browse/KAFKA-79 for details.
> >
> > About the file rolling support, are you referring to the Kafka log?
> > Kafka logs are rolled based on a preconfigured size.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Sep 29, 2011 at 11:25 AM, Eric Hauser <ewhau...@gmail.com> wrote:
> > > Jeremy,
> > >
> > > I've used both Flume and Kafka, and I can provide some info for comparison:
> > >
> > > Flume
> > > - The current Flume release, 0.9.4, has some pretty nasty bugs in it
> > >   (most have been fixed in trunk).
> > > - Flume is more complex to maintain operations-wise (IMO) than Kafka,
> > >   since you have to set up masters and collectors (you don't necessarily
> > >   need collectors if you aren't writing to HDFS).
> > > - Flume has a well-defined pattern for doing what you want:
> > >   http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
> > >
> > > Kafka
> > > - If you need multiple Kafka partitions for the logs, you will want to
> > >   partition by host so the messages arrive in order for the same host.
> > > - You can use the same piped technique as Flume to publish to Kafka,
> > >   but you'll have to write a little code to publish and subscribe to
> > >   the stream.
> > > - Kafka does not provide any of the file rolling, compression, etc.
> > >   that Flume provides.
> > > - If you ever want to do anything more interesting with those log
> > >   files than just send them to one location, publishing them to Kafka
> > >   would allow you to add additional consumers later. Flume has a
> > >   concept of fanout sinks, but I don't care for the way it works.
> > >
> > > On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > Jeremy,
> > > >
> > > > Yes, Kafka will be a good fit for that.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
> > > > <jeremy.hanna1...@gmail.com> wrote:
> > > >
> > > > > We have a number of web servers in EC2, and periodically we just
> > > > > blow them away and create new ones. That makes keeping logs
> > > > > problematic. We're looking for a way to stream the logs from those
> > > > > various sources directly to a central log server - either just a
> > > > > single server, or HDFS, or something like that.
> > > > >
> > > > > My question is whether Kafka is a good fit for that, or should I be
> > > > > looking more along the lines of Flume or Scribe?
> > > > >
> > > > > Many thanks.
> > > > >
> > > > > Jeremy
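P.S. To make the log4j suggestion above concrete, an hourly rolling setup in the consumer process could look roughly like this (the logger name and paths are made up; the DatePattern mirrors the hourly %H00 buckets in the Flume sink above):

    log4j.logger.weblogs=INFO, ROLL
    log4j.additivity.weblogs=false
    log4j.appender.ROLL=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.ROLL.File=/var/logs/kafka/webdata/web.log
    # '.'yyyy-MM-dd-HH rolls the file once an hour
    log4j.appender.ROLL.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.ROLL.layout=org.apache.log4j.PatternLayout
    log4j.appender.ROLL.layout.ConversionPattern=%m%n

The consumer would then just log each message to that logger and let log4j handle the rotation.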