I guess I'll follow up: My interest is in distributed systems and databases, and particularly where those two intersect. I'm a Pig committer by night (and a little by day), and tech lead of Twitter's data analysis infrastructure team by day (and a little by night). My interest in Giraph is mostly motivated by the promise of reusing our existing Hadoop infrastructure to perform a variety of calculations much more efficiently. For now I'm mostly concerned with getting things into a state where I'm comfortable handing Giraph to the data scientists on the team as a tool; that means integration with our existing data sources, trimming the memory footprint, and finding and eliminating a few edge cases that can prevent jobs from starting correctly. Longer-term, I would like to work on shoring up fault-tolerance and improving the RPC subsystem.
-Dmitriy

On Thu, Sep 15, 2011 at 10:26 PM, Jake Mannix <[email protected]> wrote:

> Thanks Avery,
>
> Greetings all. In the other Apache communities with which I'm familiar
> (Mahout and Lucene, in particular), it is customary for new committers to
> give a little background / bio / self-introduction, so I'll carry that over,
> in hopes that it is a fairly universal practice. :)
>
> I'm originally a physics nerd, turned mathematician, turned software
> engineer mostly working on search (I built large parts of
> this <http://www.linkedin.com/search/fpsearch?type=people&keywords=jake+mannix>
> search engine, as well as
> this <http://twitter.com/#!/who_to_follow/search/jake%20mannix> one), and
> as such have spent a lot of time in the Apache Lucene
> <http://lucene.apache.org> community (both of the linked-to search
> engines are built on Lucene, naturally enough). Over the past few years,
> I've been working more on applying my IR and math skills to machine
> learning, and as such have been working on Apache Mahout
> <http://mahout.apache.org>, where I'm a committer and PMC member,
> primarily working on distributed matrix computations and, more
> specifically, decompositions (e.g.
> SVD <http://en.wikipedia.org/wiki/Singular_value_decomposition>)
> and topic modeling (e.g.
> LDA <http://en.wikipedia.org/wiki/Latent_Dirichlet_Allocation>).
>
> As you might imagine, social graphs play a pretty important role in much
> of the work I've been in, so finding efficient ways to do monstrously large
> graph computations is what brought Apache Giraph to my attention. I hope to
> spend some of my time (both free and as part of my workday) helping make
> Giraph speedy and CPU+memory+network efficient, by whatever means I can
> think of, and to write up some fun graph applications to go in the
> "examples" area as well.
>
> In fact, finding ways of doing stuff which is a bit *outside* the normal
> thought of a BSP graph calculation is one of my motivations for using /
> working with / helping Giraph: I'd love to see how hard it is (and how
> efficient the result is!) to compute truncated matrix SVDs in Giraph, or do
> a big topic-model learning of an LDA model, or any of the various other
> sophisticated machine learning algorithms of which I sadly know very little
> (like really anything to do with gradient boosted decision trees, or
> restricted Boltzmann machines, etc).
>
> Well, that was a bit long, but I can be a bit chatty, so there you go.
> Looking forward to working with the rest of the community here, and
> building some great stuff! The codebase is pretty huge and impressive
> already; I'm honored to help out in whatever way I can.
>
> Hi!
>
> -jake
>
> On Thu, Sep 15, 2011 at 9:05 PM, Avery Ching <[email protected]> wrote:
>
>> As an early Apache Incubator project, we need to build and grow the Giraph
>> community of folks interested in large-scale graph processing. Both Jake and
>> Dmitriy have demonstrated exceptional passion and talent in working with
>> Giraph. The Giraph PPMC has had only positives to say about them in the
>> voting process. They have both graciously accepted the offered
>> responsibilities and we are pleased to announce that they are now Giraph
>> committers and PPMC members!
>>
>> Thanks,
>>
>> Avery
>>
>
>

--
Dmitriy V Ryaboy
Twitter Analytics
http://twitter.com/squarecog
