Kafka itself does not do real-time analysis, but you can easily set up one
or more Kafka consumers and feed the logs into a stream processing system.

Kafka's markers (offsets that record the last good place that was read) can
be used for reliability: a consumer that restarts can resume from its marker.
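
As a rough illustration of the marker idea, here is a minimal Python sketch.
It does not use Kafka's client API: the list `log` stands in for one topic
partition, the dict `checkpoint` stands in for durable marker storage, and
`CheckpointedConsumer` and `count_requests` are hypothetical names made up
for this example.

```python
# Toy in-memory model of a consumer with a read marker; NOT the Kafka API.
from collections import Counter


class CheckpointedConsumer:
    """Reads an append-only log and remembers the last position it processed."""

    def __init__(self, log, checkpoint):
        self.log = log                # list of messages (stand-in for a partition)
        self.checkpoint = checkpoint  # dict holding the marker (stand-in for durable storage)

    def poll(self, handler, max_messages=None):
        """Feed unread messages to handler, advancing the marker after each success."""
        start = self.checkpoint.get("offset", 0)
        end = len(self.log)
        if max_messages is not None:
            end = min(end, start + max_messages)
        for offset in range(start, end):
            handler(self.log[offset])               # may raise; the marker then stays put
            self.checkpoint["offset"] = offset + 1  # mark the last good place that was read
        return end - start


log = ["GET /a", "GET /b", "GET /a", "POST /c"]
checkpoint = {}
counts = Counter()


def count_requests(line):
    # Stand-in for the stream processing step: count requests per HTTP method.
    counts[line.split()[0]] += 1


first = CheckpointedConsumer(log, checkpoint)
first.poll(count_requests, max_messages=2)      # handle two messages, then "crash"

second = CheckpointedConsumer(log, checkpoint)  # a restarted consumer
second.poll(count_requests)                     # resumes at offset 2
```

Because the marker advances only after the handler returns, a failure in the
middle of a message means that message is read again after a restart. That
gives at-least-once processing, so the handler should tolerate an occasional
duplicate.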

On Thu, Nov 3, 2011 at 2:04 PM, Mark <static.void....@gmail.com> wrote:

>
> We had one problem that would pop up out of nowhere...
>
> https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/66c6aecec9d1869b/a3110d1dfb9e1d0b?lnk=gst&q=static.void#a3110d1dfb9e1d0b
>
> Another serious issue was when agents started to produce massive amounts
> of data. For example, the logs produced by one machine were maybe
> 1 MB/minute, but when the agent was unable to communicate with any
> collectors for whatever reason, it would fill up with GBs of data sitting
> in one of Flume's subfolders (sent, sending, completed, etc.).
>
> Any links on how to create some real-time analysis using Kafka?
>
> Thanks again
>
>
> On 11/3/11 12:18 PM, Neha Narkhede wrote:
>
>> Mark,
>>
>>> First and foremost we are currently using rsyslog to aggregate our logs
>>> from our application servers.
>>>
>> This is similar to the legacy system we had at LinkedIn, now
>> successfully replaced by Kafka.
>>
>>> Although this strategy has been working for our bulk processing needs it
>>> doesn't help us much with realtime analysis, something we would really like
>>> to introduce.
>>>
>> Kafka is designed to efficiently feed both real time and offline data
>> pipelines. Being a pub-sub messaging system, it fits the need for
>> real-time applications well. Its high throughput nature and built-in
>> consumer parallelism features make it a good fit for feeding large
>> systems like Hadoop and data-warehouses. At LinkedIn, we use it for
>> activity tracking as well as real time RPC log analysis.
>>
>> For more information, please visit our webpage -
>> http://incubator.apache.org/kafka/index.html
>> It has a detailed design writeup, and quickstart for you to try it out.
>>
>>> We've tried Flume but that didn't work out too well.
>>>
>> I'm interested in knowing what roadblocks you hit while trying Flume
>> out, for curiosity's sake?
>>
>> Thanks,
>> Neha
>>
>> On Thu, Nov 3, 2011 at 11:58 AM, Mark <static.void....@gmail.com> wrote:
>>
>>> Neha thanks for the response.
>>>
>>> I'll try and explain our use case. First and foremost we are currently
>>> using rsyslog to aggregate our logs from our application servers. This is
>>> accomplished using its TCP plugin which sends logs to a cluster of
>>> logging machines. At the end of the day we then import this into Hadoop.
>>> Although this strategy has been working for our bulk processing needs it
>>> doesn't help us much with realtime analysis, something we would really
>>> like to introduce. We've tried Flume but that didn't work out too well.
>>> So now we are in the process of looking into alternative technologies
>>> that can help us with both our bulk and realtime analysis needs.
>>>
>>> Does it sound like Kafka would be a nice fit for our use case? Are there
>>> any examples or documentation on realtime analysis with Kafka?
>>>
>>> Thanks.
>>>
>>> On 11/3/11 11:37 AM, Neha Narkhede wrote:
>>>
>>>> Mark,
>>>>
>>>> For activity on the mailing list, take a look at these metrics -
>>>> http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/
>>>> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/
>>>>
>>>> For activity of the committers and the development -
>>>>
>>>> https://issues.apache.org/jira/browse/KAFKA#selectedTab=com.atlassian.jira.plugin.system.project%3Aissues-panel
>>>>
>>>> A full-fledged comparison can be quite lengthy. Would you mind
>>>> describing your use case? We can discuss the available alternatives and
>>>> how Kafka would fit in.
>>>>
>>>> Kafka has been deployed in production at LinkedIn for over a year and
>>>> a half. I believe there are other smaller startups using it too, and
>>>> more in the pipeline.
>>>>
>>>> Thanks,
>>>> Neha
>>>>
>>>>
>>>> On Thu, Nov 3, 2011 at 11:00 AM, Mark <static.void....@gmail.com> wrote:
>>>>
>>>>> I was wondering what the current state of Kafka is. Is it gaining much
>>>>> traction? How active are the project, committers, and mailing lists? Are
>>>>> there other more popular alternatives out there? Any comparison would help.
>>>>>
>>>>> Thanks for any input.
>>>>>
>>>>>


-- 
*Evan Chan*
Senior Software Engineer |
e...@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>
