Hey Guys,

Very cool. :) When we have something up, I'll try and follow up here to let you know.

In the meantime, you might have a look at Google's Dapper paper:

http://research.google.com/pubs/pub36356.html

And the Google-Wide Profiling paper:

http://research.google.com/pubs/pub36575.html

Both are great resources for folks building things in this area.

Cheers,
Chris

On 10/20/13 7:15 AM, "Philip Reynolds" <[email protected]> wrote:

>Just to chime in, I'd be very interested in the monitoring blog post too.
>We're doing a Kafka implementation for a robust data pipeline. Initially,
>Samza does look interesting for monitoring use-cases.
>
>On Sun, Oct 20, 2013 at 2:53 PM, Garry Turkington <
>[email protected]> wrote:
>
>> Hi Chris,
>>
>> Thanks for all this, makes sense. Be interested to hear where things go
>> with the locality optimizations. I'm just looking at deploying our first
>> Kafka cluster to change how we do data distribution and that's not going to
>> initially be collocated with the Hadoop cluster. Samza's tight Kafka
>> integration is one of the things that has drawn me to it so I'm looking
>> forward (!) to seeing what sort of performance/latency I get from the
>> remote/smaller Kafka setup.
>>
>> Looking forward to the blog post on the monitoring jobs written in Samza.
>> We're in the earlier stages of a common service framework so have the
>> luxury of building on the experiences of others who learned this stuff the
>> hard way. :)
>>
>> Regards
>> Garry
>>
>> -----Original Message-----
>> From: Chris Riccomini [mailto:[email protected]]
>> Sent: 18 October 2013 19:01
>> To: [email protected]
>> Subject: Re: Special Bay Area HUG: Tajo and Samza
>>
>> Hey Garry,
>>
>> Thanks!
>>
>> Locality: A few things to note here.
>>
>> 1. We run one broker per host, as you suggest (18 nodes = 18 brokers).
>> 2. Samza does not explicitly try to do any co-location right now. Any
>> locality that we get is purely luck.
>> 3. YARN allows you to make resource requests for a specific host/rack.
>> This is the feature we would like to use to provide better locality.
>>
>> We haven't done any meaningful evaluation of the locality we're getting
>> (or would get) right now, though.
>>
>> Operations: Yes, we have a pretty cool set of Samza jobs that Jakob wrote
>> to do some metrics/monitoring stuff. He can probably talk more about it
>> than I can. We're planning on putting up a blog post in the near future
>> about it.
>>
>> More broadly, we have a pretty well defined service container at LinkedIn.
>> These services are called via RPC. Every time an RPC request is made, the
>> service logs out information about the request: who sent the request, what
>> method was called, how long it took to process, etc. In addition, we
>> have all WARN/ERROR log events flowing through Kafka as well (via
>> Kafka's Log4j appender). There is a brief mention of this in:
>>
>> http://sites.computer.org/debull/A12june/pipeline.pdf
>>
>> As you can imagine, there are a ton of things you can do with this data. :)
>>
>> Cheers,
>> Chris
>>
>> On 10/18/13 4:44 AM, "Garry Turkington" <[email protected]>
>> wrote:
>>
>> >Hi Chris,
>> >
>> >Nice presentation -- 2 questions:
>> >
>> >1. I had wondered about the references to Kafka broker colocation I'd
>> >seen around the place. So for example in the 18-node sized cluster you
>> >mention you'd have 18 Kafka brokers running there, 1 per host? Do you
>> >actually get any sort of data locality benefits from this, is there a
>> >way to ensure that the Samza container on host x is processing the
>> >partitions of each topic on the collocated Kafka broker? Or am I missing
>> >the intent?
>> >
>> >2. Interested in your mention of using something like Samza for
>> >processing of monitoring and metric type data, it's something we've
>> >been talking about internally. Anything been published on what you are
>> >doing in that space?
>> >
>> >Thanks!
>> >Garry
>> >
>> >-----Original Message-----
>> >From: Chris Riccomini [mailto:[email protected]]
>> >Sent: 17 October 2013 21:54
>> >To: [email protected]
>> >Subject: Re: Special Bay Area HUG: Tajo and Samza
>> >
>> >Hey Guys,
>> >
>> >On a related note, my talk from the YARN meet up at LinkedIn is now
>> >online:
>> >
>> > https://www.youtube.com/watch?v=7YBmUKjzg7c
>> >
>> >If you're not too familiar with Samza, this is a great place to start.
>> >
>> >Also, feedback welcome on presentation content, style, etc.
>> >
>> >Cheers,
>> >Chris
>> >
>> >On 10/17/13 11:08 AM, "Jakob Homan" <[email protected]> wrote:
>> >
>> >>Hey everybody-
>> >> Join us at LinkedIn Nov. 5 for a special HUG dedicated to two new
>> >>awesome Incubator projects: Tajo, a low-latency SQL query engine atop
>> >>YARN, and Samza.
>> >>
>> >>http://www.meetup.com/hadoop/events/146077932/
>> >>
>> >>-Jakob
>> >
>> >
>> >-----
>> >No virus found in this message.
>> >Checked by AVG - www.avg.com
>> >Version: 2013.0.3408 / Virus Database: 3222/6751 - Release Date:
>> >10/15/13
