Hey Guys,

Very cool. :) When we have something up, I'll try and follow up here to
let you know.

In the meantime, you might have a look at Google's Dapper paper:

  http://research.google.com/pubs/pub36356.html


And the Google-Wide Profiling paper:

  http://research.google.com/pubs/pub36575.html

Both are great resources for folks building things in this area.

Cheers,
Chris

On 10/20/13 7:15 AM, "Philip Reynolds" <[email protected]> wrote:

>Just to chime in, I'd be very interested in the monitoring blog post too.
>We're doing a Kafka implementation for a robust data pipeline. Initially,
>Samza does look interesting for monitoring use-cases.
>
>On Sun, Oct 20, 2013 at 2:53 PM, Garry Turkington <
>[email protected]> wrote:
>
>> Hi Chris,
>>
>> Thanks for all this, makes sense.  Be interested to hear where things go
>> with the locality optimizations. I'm just looking at deploying our first
>> Kafka cluster to change how we do data distribution and that's not going
>> to initially be collocated with the Hadoop cluster.  Samza's tight Kafka
>> integration is one of the things that has drawn me to it so I'm looking
>> forward (!) to seeing what sort of performance/latency I get from the
>> remote/smaller Kafka setup.
>>
>> Looking forward to the blog post on the monitoring jobs written in Samza.
>> We're in the earlier stages of a common service framework so have the
>> luxury of building on the experiences of others who learned this stuff the
>> hard way. :)
>>
>> Regards
>> Garry
>>
>> -----Original Message-----
>> From: Chris Riccomini [mailto:[email protected]]
>> Sent: 18 October 2013 19:01
>> To: [email protected]
>> Subject: Re: Special Bay Area HUG: Tajo and Samza
>>
>> Hey Garry,
>>
>> Thanks!
>>
>> Locality: A few things to note here.
>>
>> 1. We run one broker per host, as you suggest (18 nodes = 18 brokers).
>> 2. Samza does not explicitly try to do any co-location right now. Any
>> locality that we get is purely luck.
>> 3. YARN allows you to make resource requests for a specific host/rack.
>> This is the feature we would like to use to provide better locality.
>>
>> We haven't done any meaningful evaluation of the locality we're getting
>> (or would get) right now, though.
>>
>> Operations: Yes, we have a pretty cool set of Samza jobs that Jakob wrote
>> to do some metrics/monitoring stuff. He can probably talk more about it
>> than I can. We're planning on putting up a blog post in the near future
>> about it.
>>
>> More broadly, we have a pretty well-defined service container at LinkedIn.
>> These services are called via RPC. Every time an RPC request is made, the
>> service logs information about the request: who sent the request, what
>> method was called, how long it took to process, etc. In addition, we
>> also have all WARN/ERROR log events flowing through Kafka (via
>> Kafka's Log4j appender). There is a brief mention of this in:
>>
>>   http://sites.computer.org/debull/A12june/pipeline.pdf
>>
>> As you can imagine, there are a ton of things you can do with this data. :)
>>
>> Cheers,
>> Chris
>>
>> On 10/18/13 4:44 AM, "Garry Turkington" <[email protected]> wrote:
>>
>> >Hi Chris,
>> >
>> >Nice presentation -- 2 questions:
>> >
>> >1. I had wondered about the references to Kafka broker colocation I'd
>> >seen around the place.  So for example in the 18-node sized cluster you
>> >mention you'd have 18 Kafka brokers running there, 1 per host?  Do you
>> >actually get any sort of data locality benefits from this, is there a
>> >way to ensure that the Samza container on host x is processing the
>> >partitions of each topic on the collocated Kafka broker?  Or am I
>> >missing the intent?
>> >
>> >2. Interested in your mention of using something like Samza for
>> >processing of monitoring and metric type data, it's something we've
>> >been talking about internally.  Anything been published on what you are
>> >doing in that space?
>> >
>> >Thanks!
>> >Garry
>> >
>> >-----Original Message-----
>> >From: Chris Riccomini [mailto:[email protected]]
>> >Sent: 17 October 2013 21:54
>> >To: [email protected]
>> >Subject: Re: Special Bay Area HUG: Tajo and Samza
>> >
>> >Hey Guys,
>> >
>> >On a related note, my talk from the YARN meet up at LinkedIn is now
>> >online:
>> >
>> >  https://www.youtube.com/watch?v=7YBmUKjzg7c
>> >
>> >If you're not too familiar with Samza, this is a great place to start.
>> >
>> >Also, feedback welcome on presentation content, style, etc.
>> >
>> >Cheers,
>> >Chris
>> >
>> >On 10/17/13 11:08 AM, "Jakob Homan" <[email protected]> wrote:
>> >
>> >>Hey everybody-
>> >>   Join us at LinkedIn Nov. 5 for a special HUG dedicated to two new
>> >>awesome Incubator projects: Tajo, a low-latency SQL query engine atop
>> >>YARN, and Samza.
>> >>
>> >>http://www.meetup.com/hadoop/events/146077932/
>> >>
>> >>-Jakob
>> >
>> >
>> >-----
>> >No virus found in this message.
>> >Checked by AVG - www.avg.com
>> >Version: 2013.0.3408 / Virus Database: 3222/6751 - Release Date:
>> >10/15/13
>>
>>
