Re: [DISCUSS] Feature bloat and contrib module

Bruno Mahé Mon, 16 Dec 2013 23:34:59 -0800

I will probably end up repeating the very same things as in the other discussion.



Summarizing my suggestions:

* Committers are not the sole developers. There is no reason for committers to take all these responsibilities on their shoulders. Also, developer != committer.
* Easy IN, Easy OUT. If no one volunteers to maintain something, then there is no reason to keep it, since the community is not interested in it anyway.
* Easy to get in means more contributions and more contributors. It is also a way to grow the community and have contributors become full committers. It is more than likely they will notice things that can be improved elsewhere and start being more active overall.
* Easy to get out means only the maintained stuff stays. Stuff would most likely get kicked out before a feature release (ex: 1.5 vs 1.6). Bug fix releases have no reason to kick out components, since components are unlikely to break between bug fix releases (ex: 1.5.2 vs 1.5.3).
* Spreading sources and sinks around is going to be quite hard on users. This would mean users would have to be developers themselves, since they would have to:
  - Find the source/sink on some random repository which may or may not be maintained, and pick one repository out of all the ones they have found
  - Build it against their own version of Apache Flume (Apache, CDH, PHD, HDP...)
  - Resolve dependency and build issues between their version of Apache Flume and the source/sink, since the source/sink may or may not have been maintained
  - Qualify the integration between their version of Apache Flume and the source/sink
* Spreading sources and sinks around is going to be quite hard on developers. Why should I target Apache Flume when I can just target my version of Flume (CDH, PHD, HDP)?
* Spreading sources and sinks around is going to be quite hard on integrators such as Apache Bigtop. This would mean working with as many people as there are sources/sinks, each with their own way of working and their own schedules.



For the details, see inline.


On 12/16/2013 09:17 PM, Ashish wrote:

@Israel - IMHO JIRA is not a good fit for these discussions.
The discussion can easily be tracked on flume.markmail.com, and the link has
been provided.


On Tue, Dec 17, 2013 at 7:19 AM, Mike Percy <[email protected]> wrote:

I have created a JIRA Brainstorming task to track this.


Hmm, I think there is a risk of losing this discussion in the flood of JIRA
traffic due to email filters. So I'm going to respond here on this thread.
If you want to reference this thread in the future, you can use this URL:
http://markmail.org/message/7x7tewbxqw4ubjyp

My 2 cents below.

Do contrib components get released with Flume? Can they break
compatibility with older versions (what does this mean if they are not
getting released?) etc. How supported are these?

I think you hit the nail on the head here.

Recently I commented on the Storm sink indicating that I would not help maintain
it and therefore would not help review it. Implicitly, I meant that someone
else should probably take responsibility for maintaining it to some extent.
So I think having at least one person who hopefully has the bandwidth to
understand and maintain a module should be a prerequisite for inclusion
into the main line of Flume (i.e. the bits included in a typical Flume
release).


Mike - You have echoed my thoughts.

We have the same situation in MINA (and in Netty as well). People write a
lot of codecs, but we can't get all of them into the release. A codec is
brought in only if one of the committers volunteers to maintain it.

This was my thought for Flume as well: if one of the committers volunteers
to maintain it, bring it in; otherwise we have to find a way to keep things
easy for users. How, I'm not sure at the moment.

Why does accepting a patch mean that you are signing up for maintaining it forever?


Committers are not the main/only developers of a project.

Maybe I am wrong, but I see them as facilitators for the community, in the sense that they help contributors contribute their patches and move the community forward.

So limiting features to whatever the current set of committers is willing to maintain by themselves is not going to scale and will limit the community. Also, some community members may be interested in contributions in areas where no committer has experience, and which therefore no committer is able/willing to maintain.


So why not just remove features or parts that are not maintained?

Being more aggressive in removing unmaintained parts would enable Apache Flume to be more inclusive with regard to contributions.

If there is no one in the community willing to maintain a given part, then there is no reason to keep it.


Generally, my concern with creating a contrib module is that, in my view,
it is where code goes to die, since why have a contrib module unless we
have no intention of maintaining that code? As an example, see the various
states of stability of code in Pig's piggybank. Is there good stuff in
there? Absolutely. Is everything in there relatively stable / usable, even
on a release? Nope. Those things may or may not work at any time.
Personally I'm not sure adding a contrib is a very good idea.


So let's assume we have contrib. If it gets released with Flume core, we
have no choice but to at least ensure it compiles and its tests pass.
Sooner or later, an upgrade of one of the core jars will force us to fix
the code in contrib. People can contribute patches, but this would still
consume dev cycles.

If contrib is a source-only release, or is not compatible with the current
release, it had better not be part of the release; otherwise it will add a
lot of confusion for users.
ZooKeeper does have a contrib, though I'm not sure how they feel about it.

Still, I am worried about the impact it might have on future contributions.


I tend to agree about the contrib modules: it would end up as a dumpster.
What about tagging modules as stable/beta/experimental instead?

And if they don't even build or pass tests by release time, then remove them.


As another dimension to this discussion, I think there is a limit to the
number of dependencies Flume can reasonably pull in and keep straight
without shading or classloading tricks, which themselves add another layer
of pain/difficulty to debugging.

This does not completely solve that problem, but it is somewhat related: what about moving all the current sources and sinks into plugins? That way the core remains lean, with all its dependencies in lib/, and all the source- and sink-specific libs end up in plugins.d/<plugin>/libext.

This would be more in the context of Apache Bigtop and packages, but it would enable people to pick and choose their dependencies, for instance by doing a "yum install flume-ng-hdfs flume-ng-redis flume-ng-agent". Right now I don't really care about the HDFS sink, but I end up having to download a bunch of HDFS-related packages that are not really needed.
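To make the plugins.d idea concrete, here is a minimal sketch of how a packaging step might install one source/sink into its own plugin directory. It assumes the per-plugin layout described in the Flume User Guide (lib/ for the plugin's own jar, libext/ for its dependency jars, native/ for native libraries). The function names and the "redis-sink" example are hypothetical and not part of Flume or Bigtop; Python is used purely for illustration.

```python
import shutil
from pathlib import Path

# Assumed subdirectories inside each FLUME_HOME/plugins.d/<plugin> directory.
PLUGIN_SUBDIRS = ("lib", "libext", "native")


def install_plugin(flume_home, plugin_name, plugin_jars,
                   dependency_jars=(), native_libs=()):
    """Hypothetical helper: copy one source/sink into plugins.d/<plugin_name>.

    lib/    -> the plugin's own jar(s)
    libext/ -> third-party dependency jars (e.g. a Redis or HDFS client)
    native/ -> native libraries such as .so files
    Returns the plugin directory path.
    """
    plugin_dir = Path(flume_home) / "plugins.d" / plugin_name
    groups = (plugin_jars, dependency_jars, native_libs)
    for subdir, files in zip(PLUGIN_SUBDIRS, groups):
        files = [Path(f) for f in files]
        if not files:
            continue  # only create the subdirectories this plugin needs
        target = plugin_dir / subdir
        target.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, target / f.name)
    return plugin_dir


def installed_plugins(flume_home):
    """Hypothetical helper: list plugin names under plugins.d, sorted."""
    plugins_d = Path(flume_home) / "plugins.d"
    if not plugins_d.is_dir():
        return []
    return sorted(p.name for p in plugins_d.iterdir() if p.is_dir())
```

Design note: keeping each plugin's third-party jars in its own libext/ means the agent's lib/ carries only core dependencies, and dropping a plugin amounts to deleting one directory.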

https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Plugins


I think this page has a lot of value but maybe the problem with it is that
it is not accessible from the home page of http://flume.apache.org/

As a user, having Apache Flume able to speak to multiple sources and sinks is a big plus. Having to shop around for various sources/sinks is more troublesome, since I first have to find which flavor of a given sink is being maintained today, then deal with licenses, incompatibilities, mismatched versions, upgrades, deployment and unfixed bugs, all while wondering if this is even going to work at all.

Knowing a piece of code is in Apache Flume puts my mind at ease, since the license is clear, the CLA is cleared and the code has been reviewed. There may be some expectations regarding its support and quality, but that should be fine as long as it is clearly stated and labeled (ex: tagging components with different labels such as "supported" or "experimental"). This also gives more opportunities for bugs to be fixed, and therefore for code to be better maintained, due to the wider audience of Apache Flume compared to a random small project on GitHub.

Also, as a user, I would have to be fairly technical to use a random source/sink outside of Apache Flume. I would probably have to build it, qualify it against my version of Apache Flume, and package it for deployment. Whereas if it is in Apache Flume, it's either already in the tarball or already in the package of my favorite Apache Flume distribution.

As a developer, I find Apache Flume very flexible, since I can pick and choose most parts. But if I have to write my own source and/or my own sink, I may be tempted to forgo Apache Flume altogether and write the rest myself for my specific use case.

And if I do write a source for my use case, I don't have much incentive to make it public or to keep it working with the current Apache Flume version. I just need to ensure it works for my version of Apache Flume. Everything else is just extra work.

Also, in the context of being an employee, I would rather target my source/sink at one of the vendor-supported versions of Apache Flume, which may be different from the latest Apache Flume. I would have no incentive to go through the effort of testing it against Apache Flume. If my source/sink were in Apache Flume, I would be more interested in contributing to Apache Flume, since I know the changes would trickle down at some point and make my life easier.

As an Apache Bigtop contributor, having all these projects spread around scares me. They will all depend on different versions of Apache Flume, build in different ways, work in different ways and integrate in their own way. Sending patches upstream will also be troublesome, since we would now have to talk to and work with a lot more people than just the Apache Flume folks, each of them having different schedules and ways of working.

TL;DR:
1. In my view we should only accept new, big components in the main project
if they have a good chance of continuing to be maintained.


+1, if one of the Committers volunteers to maintain it

2. The more dependencies Flume has, the harder it is to upgrade any one
component and keep all the dependencies from conflicting (i.e. JAR hell).


We are almost there :)

3. Is there an alternative to contrib that would solve the problem of
giving new plugins a home if they cannot be included in the main project?
Is there a way we could make plugins that live on e.g. GitHub more visible?
4. If we add a contrib module, I agree that we need to be clear on why
contrib exists and what it means to have something live in there.

+1


Mike



Summarizing my suggestions:
1. For a contribution to make it into Flume, one of the committers has to
step in
2. Not really convinced by contrib, but I don't have another solution
either
