Hey Garry,

Thanks!

Locality: A few things to note here.

1. We run one broker per host, as you suggest (18 nodes = 18 brokers).
2. Samza does not explicitly try to do any co-location right now. Any
   locality that we get is purely luck.
3. YARN allows you to make resource requests for a specific host/rack.
   This is the feature we would like to use to provide better locality.

We haven't done any meaningful evaluation of the locality we're getting
(or would get) right now, though.

Operations: Yes, we have a pretty cool set of Samza jobs that Jakob wrote
to do some metrics/monitoring stuff. He can probably talk more about it
than I can. We're planning on putting up a blog post in the near future
about it.

More broadly, we have a pretty well-defined service container at
LinkedIn. These services are called via RPC. Every time an RPC request is
made, the service logs information about the request: who sent the
request, what method was called, how long it took to process, etc. In
addition, we have all WARN/ERROR log events flowing through Kafka (via
Kafka's Log4j appender). There is a brief mention of this in:

  http://sites.computer.org/debull/A12june/pipeline.pdf

As you can imagine, there are a ton of things you can do with this data.
:)

Cheers,
Chris

On 10/18/13 4:44 AM, "Garry Turkington" <[email protected]> wrote:

>Hi Chris,
>
>Nice presentation -- 2 questions:
>
>1. I had wondered about the references to Kafka broker colocation I'd
>seen around the place. So for example in the 18-node sized cluster you
>mention you'd have 18 Kafka brokers running there, 1 per host? Do you
>actually get any sort of data locality benefits from this, is there a way
>to ensure that the Samza container on host x is processing the partitions
>of each topic on the collocated Kafka broker? Or am I missing the intent?
>
>2. Interested in your mention of using something like Samza for
>processing of monitoring and metric type data, it's something we've been
>talking about internally. Anything been published on what you are doing
>in that space?
>
>Thanks!
>Garry
>
>-----Original Message-----
>From: Chris Riccomini [mailto:[email protected]]
>Sent: 17 October 2013 21:54
>To: [email protected]
>Subject: Re: Special Bay Area HUG: Tajo and Samza
>
>Hey Guys,
>
>On a related note, my talk from the YARN meet up at LinkedIn is now
>online:
>
>  https://www.youtube.com/watch?v=7YBmUKjzg7c
>
>If you're not too familiar with Samza, this is a great place to start.
>
>Also, feedback welcome on presentation content, style, etc.
>
>Cheers,
>Chris
>
>On 10/17/13 11:08 AM, "Jakob Homan" <[email protected]> wrote:
>
>>Hey everybody-
>>   Join us at LinkedIn Nov. 5 for a special HUG dedicated to two new
>>awesome Incubator projects, Tajo, a low-latency SQL query engine atop
>>YARN and Samza.
>>
>>http://www.meetup.com/hadoop/events/146077932/
>>
>>-Jakob
