I guess I'll follow up:
My interest is in distributed systems and databases, and particularly where
those two intersect. I'm a Pig committer by night (and a little by day), and
tech lead of Twitter's data analysis infrastructure team by day (and a
little by night). My interest in Giraph is mostly motivated by the promise
of reusing our existing Hadoop infrastructure to perform a variety of
calculations much more efficiently. For now I'm mostly focused on
getting things into a state where I'd be comfortable handing Giraph to
the data scientists on the team; that means integration with our
existing data sources, trimming the memory footprint, and finding and
eliminating a few edge cases that can prevent jobs from starting correctly.
Longer-term, I would like to work on shoring up fault-tolerance and
improving the RPC subsystem.
On Thu, Sep 15, 2011 at 10:26 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> Thanks Avery,
> Greetings all. In the other Apache communities with which I'm familiar
> (Mahout and Lucene, in particular), it is customary for new committers to
> give a little background / bio / self-introduction, so I'll carry that over,
> in hopes that it is a fairly universal practice. :)
> I'm originally a physics nerd, turned mathematician, turned software
> engineer mostly working on search (I built large parts of
> engine, as well as
> this <http://twitter.com/#!/who_to_follow/search/jake%20mannix> one), and
> as such have spent a lot of time in the Apache
> Lucene <http://lucene.apache.org> community (both of the linked-to search
> engines are built on Lucene,
> naturally enough). Over the past few years, I've been trying more and more
> to apply my IR and math skills to machine learning, and as such have been
> working on Apache Mahout <http://mahout.apache.org>, where I'm a committer
> and PMC member, primarily working on distributed matrix computations and,
> more specifically, decompositions (e.g.
> and topic modeling (e.g
> As you might imagine, social graphs play a pretty important role in much
> of the work I've been involved in, so finding efficient ways to do monstrously large
> graph computations is what brought Apache Giraph to my attention. I hope to
> spend some of my time (both free and as part of my workday) helping make
> Giraph speedy and CPU+memory+network efficient, by whatever means I can
> think of, and to write up some fun graph applications to go in the
> "examples" area as well.
> In fact, finding ways of doing stuff which is a bit *outside* the normal
> thought of a BSP graph calculation is one of my motivations for using /
> working with / helping Giraph: I'd love to see how hard it is (and how
> efficient the result is!) to compute truncated matrix SVDs in Giraph, or do
> a big topic-model learning of an LDA model, or any of the various other
> sophisticated machine learning algorithms of which I sadly know very little
> (like really anything to do with gradient boosted decision trees, or
> restricted Boltzmann machines, etc.).
> Well, that was a bit long (I can be a bit chatty), but there you go.
> Looking forward to working with the rest of the community here, and
> building some great stuff! The codebase is pretty huge and impressive
> already, and I'm honored to help out in whatever way I can.
> On Thu, Sep 15, 2011 at 9:05 PM, Avery Ching <ach...@apache.org> wrote:
>> As an early Apache Incubator project, we need to build and grow the Giraph
>> community of folks interested in large-scale graph processing. Both Jake and
>> Dmitriy have demonstrated exceptional passion and talent in working with
>> Giraph. The Giraph PPMC has had only positives to say about them in the
>> voting process. They have both graciously accepted the offered
>> responsibilities and we are pleased to announce that they are now Giraph
>> committers and PPMC members!
Dmitriy V Ryaboy