Flumeeeees:

Flume has evolved over the last few years and has come a long way. I
think, to hit the next bar of reliability, maintainability, and
adoption, some of the core bits need some refactoring / design
retrofit. To this end, I've started a "revolutionary branch[1]." I've
listed some of my rationale as to why I think this is a good thing in
the JIRA, but I'm happy to go into detail here.

My main motivation for this comes from working on Flume and supporting
it in my day job at Cloudera. That said, I do this as an individual,
and with my ASF hat firmly in place. My (short) rationale:
* I think the code base is too complex and that this is a barrier to
greater developer adoption. The internals shouldn't be scary.
* Some of the invariants of Flume have varied and remnants gum up the
works. For instance, there was a time when it was assumed there
wouldn't be multiple logical nodes per physical node; the complexities
of the threading came later.
* A few advertised features do not work as we'd expect / like. I want
to make it simpler to add these features.
* A number of recent bugs have exposed some evolutionary
implementation that could use refactoring.
* Flume does too much. It should do a smaller number of things (that
people really need / use) and do them exceedingly well. It's become
clear that some features are more important to people than others.

The details:

* The branch is at
http://svn.apache.org/viewvc/incubator/flume/branches/flume-728/
* There is already a (significantly smaller) core of Flume and a
skeletal Flume node.
* The wiki page tracking my notes and the "project" is at
https://cwiki.apache.org/confluence/display/FLUME/Flume+NG
* The parent JIRA tracking the project is at
https://issues.apache.org/jira/browse/FLUME-728

The process / intent:
* I intend to move extremely fast on the flume-728 branch and then
request a series of strict reviews and call for a vote to merge to
trunk. I'm happy to take reviews in the interim.
* I'd love folks to get involved and have this become a group effort.
The reason I started was to have a baseline to speak from and show 1.
that I'm serious (via code) and 2. what I think an implementation
could look like.
* I fully understand the community / PPMC may -1 the merge (but that
would make me sad, so why would you do that?). I also immediately
regretted using the "NG" designation; it's presumptuous and I
apologize. Going forward, I'll refer to it as flume-728.

Excited to hear feedback or questions. Thanks.

[1] jmhsieh pointed out an email from Long Ago(tm) that described this
situation well. I'm following that approach, in spirit.
http://incubator.apache.org/learn/rules-for-revolutionaries.html
-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com