Kafka itself does not do real-time analysis, but you can easily set up one or more Kafka consumers and feed the logs into a stream processing system.
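As a rough illustration of that consumer-side pattern, here is a minimal Python sketch. It does not call any real Kafka client: `InMemoryLog`, `CheckpointingConsumer`, `run_demo`, and the sample log lines are all hypothetical stand-ins, and the list-backed log is only an assumption about how a partition behaves. The point is the shape of the loop: read from the last recorded position, hand each message to the stream processor, and move the position forward only after the hand-off succeeds.

```python
# Hypothetical stand-in for one Kafka partition: an append-only list of log lines.
# Nothing here calls a real Kafka client; all names are illustrative only.

class InMemoryLog:
    def __init__(self, messages):
        self._messages = list(messages)

    def read_from(self, offset, max_messages):
        """Return (next_offset, message) pairs starting at `offset`."""
        batch = self._messages[offset:offset + max_messages]
        return [(offset + i + 1, msg) for i, msg in enumerate(batch)]


class CheckpointingConsumer:
    def __init__(self, log, handler, batch_size=2):
        self.log = log
        self.handler = handler
        self.batch_size = batch_size
        self.committed = 0  # position marker: everything before it was handled

    def poll_once(self):
        """Process one batch; advance the marker only past handled messages."""
        handled = 0
        for next_offset, message in self.log.read_from(self.committed, self.batch_size):
            self.handler(message)         # may raise; the marker then stays put
            self.committed = next_offset  # message handled, move the marker
            handled += 1
        return handled


def run_demo():
    log = InMemoryLog(["GET /a 200", "GET /b 500", "GET /c 200"])
    seen = []
    fail_once = {"armed": True}

    def handler(line):
        # Simulate a downstream stream processor that fails once on the 500 line.
        if "500" in line and fail_once["armed"]:
            fail_once["armed"] = False
            raise RuntimeError("stream processor unavailable")
        seen.append(line)

    consumer = CheckpointingConsumer(log, handler)
    try:
        consumer.poll_once()
    except RuntimeError:
        pass  # a real consumer would back off and retry here
    marker_after_failure = consumer.committed
    while consumer.poll_once():
        pass  # keep polling until the log is drained
    return marker_after_failure, consumer.committed, seen
```

Calling `run_demo()` lets the handler fail once partway through the first batch. The position stays at the last message that was handled, and the next poll resumes from there, so no log line is skipped.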
Kafka's markers (which record the last good place that was read) can be used for reliability: after a failure, a consumer can resume from the last marker.

On Thu, Nov 3, 2011 at 2:04 PM, Mark <static.void....@gmail.com> wrote:

> We had one problem that would pop up out of nowhere...
>
> https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/66c6aecec9d1869b/a3110d1dfb9e1d0b?lnk=gst&q=static.void#a3110d1dfb9e1d0b
>
> Another serious issue was when agents started to produce massive amounts
> of data. For example, the logs produced by 1 machine were maybe 1 MB/minute,
> but when the agent was unable to communicate with any collectors for
> whatever reason it would fill up with GBs of data sitting in one of Flume's
> subfolders (sent, sending, completed, etc).
>
> Any links on how to create some real time analysis using Kafka?
>
> Thanks again
>
>
> On 11/3/11 12:18 PM, Neha Narkhede wrote:
>
>> Mark,
>>
>>> First and foremost we are currently using rsyslog to aggregate our logs
>>> from our application servers.
>>
>> This is similar to the legacy system we had at LinkedIn, now
>> successfully replaced by Kafka.
>>
>>> Although this strategy has been working for our bulk processing needs it
>>> doesn't help us much with realtime analysis, something we would really like
>>> to introduce.
>>
>> Kafka is designed to efficiently feed both real time and offline data
>> pipelines. Being a pub-sub messaging system, it fits the need for
>> real-time applications well. Its high throughput nature and built-in
>> consumer parallelism features make it a good fit for feeding large
>> systems like Hadoop and data-warehouses. At LinkedIn, we use it for
>> activity tracking as well as real time RPC log analysis.
>>
>> For more information, please visit our webpage -
>> http://incubator.apache.org/kafka/index.html.
>> It has a detailed design writeup, and a quickstart for you to try it out.
>>
>>> We've tried Flume but that didn't work out too well.
>>
>> I'm interested in knowing what roadblocks you hit while trying Flume
>> out, for curiosity's sake?
>>
>> Thanks,
>> Neha
>>
>> On Thu, Nov 3, 2011 at 11:58 AM, Mark <static.void....@gmail.com> wrote:
>>
>>> Neha, thanks for the response.
>>>
>>> I'll try and explain our use case. First and foremost we are currently
>>> using rsyslog to aggregate our logs from our application servers. This is
>>> accomplished using their TCP plugin, which sends logs to a cluster of
>>> logging machines. At the end of the day we then import this into Hadoop.
>>> Although this strategy has been working for our bulk processing needs it
>>> doesn't help us much with realtime analysis, something we would really
>>> like to introduce. We've tried Flume but that didn't work out too well.
>>> So now we are in the process of looking into alternative technologies
>>> that can help us with both our bulk and realtime analysis needs.
>>>
>>> Does it sound like Kafka would be a nice fit for our use case? Are there
>>> any examples or documentation on realtime analysis with Kafka?
>>>
>>> Thanks.
>>>
>>> On 11/3/11 11:37 AM, Neha Narkhede wrote:
>>>
>>>> Mark,
>>>>
>>>> For activity on the mailing list, take a look at these metrics -
>>>> http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/
>>>> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/
>>>>
>>>> For activity of the committers and the development -
>>>>
>>>> https://issues.apache.org/jira/browse/KAFKA#selectedTab=com.atlassian.jira.plugin.system.project%3Aissues-panel
>>>>
>>>> A full-fledged comparison can be quite lengthy. Would you mind
>>>> describing your case? We can discuss the available alternatives and
>>>> how Kafka would fit in.
>>>>
>>>> Kafka has been deployed in production at LinkedIn for over a year and
>>>> a half. I believe there are other smaller startups using it too, and
>>>> more in the pipeline.
>>>>
>>>> Thanks,
>>>> Neha
>>>>
>>>>
>>>> On Thu, Nov 3, 2011 at 11:00 AM, Mark <static.void....@gmail.com> wrote:
>>>>
>>>>> I was wondering what the current state of Kafka is. Is it gaining much
>>>>> traction? How active is the project, committers and mailing lists? Are
>>>>> there other more popular alternatives out there? Any comparison would
>>>>> help.
>>>>>
>>>>> Thanks for any input.
>>>>>

--
*Evan Chan*
Senior Software Engineer | e...@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> | @ooyala <http://www.twitter.com/ooyala>