I will probably end up repeating the very same things I said in the
other discussion.
Summarizing my suggestions:
* Committers are not the sole developers. There is no reason for
committers to take all these responsibilities on their shoulders. Also
developer != committer.
* Easy IN, Easy OUT. If no one volunteers to maintain something, then
there is no reason to keep it since the community is not interested in
it anyway.
* Easy to get in means more contributions and more contributors. It is
also a way to grow the community and have contributors become full
committers. It is more than likely they will notice things that can be
improved elsewhere and start being more active overall.
* Easy to get out means only the maintained stuff stays. Stuff would
most likely get kicked out before a feature release (ex: 1.5 vs 1.6).
Bug fix releases have no reason to kick out components since they are
unlikely to break in between bug fix releases (ex: 1.5.2 vs 1.5.3).
* Spreading sources and sinks is going to be quite hard on users. This
would mean users would have to be developers themselves since they
would have to:
- Find the source/sink on some random repository which may or may
not be maintained, and pick one repository out of all the ones the
user has found
- Build it against their own version of Apache Flume (Apache, CDH,
PHD, HDP...)
- Resolve dependencies and build issues between their version of
Apache Flume and source/sink since the source/sink may or may not have
been maintained
- Qualify the integration between their version of Apache Flume and
source/sink
* Spreading sources and sinks is going to be quite hard on developers.
Why should I target Apache Flume when I can just target my version of
Flume (CDH, PHD, HDP) ?
* Spreading sources and sinks is going to be quite hard on integrators
such as Apache Bigtop. This would mean working with as many people as
there are sources/sinks, each with their own way of working and
schedules.
For the details, see inline.
On 12/16/2013 09:17 PM, Ashish wrote:
@Israel - IMHO JIRA is not a good place for these discussions.
The discussion can easily be tracked on flume.markmail.com and the link is
provided
On Tue, Dec 17, 2013 at 7:19 AM, Mike Percy wrote:
I have created a JIRA Brainstorming task to track this.
Hmm, I think there is a risk of losing this discussion in the flood of JIRA
traffic due to email filters. So I'm going to respond here on this thread.
If you want to reference this thread in the future, you can use this URL:
http://markmail.org/message/7x7tewbxqw4ubjyp
My 2 cents below.
Do contrib components get released with Flume? Can they break
compatibility with older versions (what does this mean if they are not
getting released?) etc. How supported are these?
I think you hit the nail on the head here.
Recently I commented on the Storm sink indicating I would not help maintain
it and therefore would not help review it. Implicitly, I meant that someone
else should probably take responsibility for maintaining it to some extent.
So I think having at least one person who hopefully has the bandwidth to
understand and maintain a module should be a prerequisite for inclusion
into the main line of Flume (i.e. the bits included in a typical Flume
release).
Mike - You have echoed my thoughts.
We have the same situation in MINA (and in Netty as well). People write a
lot of codecs, but we can't get
all of them into the release. A Codec would be brought in only if one of
the Committers volunteers to maintain it.
This was my thought for Flume as well: if one of the committers volunteers
to maintain it, bring it in; otherwise we have to find
a way to keep things easy for users. How, I am not sure at the moment.
Why does accepting a patch mean that you are signing up for maintaining
it forever?
Committers are not the main/only developers of a project.
Maybe I am wrong, but I see them as facilitators for the community. In
the sense that they help contributors contribute their patches and move
the community forward.
So limiting features to whatever the current set of committers is willing
to maintain by themselves is not going to scale and will limit the
community. Also some community members may be interested in
contributions in which no committer has experience and which therefore no
committer is able/willing to maintain.
So why not just remove features or parts that are not maintained?
Being more aggressive in removing unmaintained parts would enable Apache
Flume to be more inclusive with regards to contributions.
If there is no one in the community willing to maintain a given part,
then there is no reason to keep it.
Generally, my concern with creating a contrib module is that, in my view,
it is where code goes to die, since why have a contrib module unless we
have no intention of maintaining that code? As an example, see the various
states of stability of code in Pig's piggybank. Is there good stuff in
there? Absolutely. Is everything in there relatively stable / usable, even
on a release? Nope. Those things may or may not work at any time.
Personally I'm not sure adding a contrib is a very good idea.
So let's assume we have contrib. If it gets released with Flume core,
we don't have a choice but to at least ensure
it compiles and its tests run. Sooner or later, one of the core jar
upgrades will force us to fix the code in contrib.
People can contribute patches, but this would still consume dev cycles.
If contrib is a source-only release or is not compatible with the current
release, it had better not be part of the release.
This will add a lot of confusion for users.
ZooKeeper does have a contrib; I am not sure how they feel about it.
Still, I am worried about the impact it might have on future contributions.
I tend to agree about the contrib modules. They would end up as a dumpster.
What about tagging modules as stable/beta/experimental instead?
And if they don't even build or pass tests by release time, then remove them.
As another dimension to this discussion, I think there is a limit to the
number of dependencies Flume can reasonably pull in and keep straight
without shading or classloading tricks, which themselves add another layer
of pain/difficulty to debugging.
This does not completely solve that problem but is somewhat related:
what about moving all the current sources and sinks into plugins?
So the core remains lean with all its dependencies in lib/ and all the
source- and sink-specific libs end up in plugins.d/<plugin>/libext.
This would be more in the context of Apache Bigtop and packages, but
that would enable people to pick and choose their dependencies. For
instance doing a "yum install flume-ng-hdfs flume-ng-redis flume-ng-agent".
Right now I don't really care about the hdfs sink, but I end up having
to download a bunch of hdfs related packages that are not really needed.
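To make the plugin idea a bit more concrete, here is a minimal sketch
of where the jars of one source/sink would land. It assumes the
plugins.d layout described in the Flume user guide (the plugin's own
jars under lib/, its dependencies under libext/). The function name,
the plugin name and the jar names are all made up for illustration;
nothing here is an existing Flume tool.

```python
from pathlib import PurePosixPath


def plugin_layout(flume_home, plugin, plugin_jars, dependency_jars):
    """Map each jar name to its target path under plugins.d/<plugin>/.

    Hypothetical helper: it only computes paths, it does not copy files.
    """
    root = PurePosixPath(flume_home) / "plugins.d" / plugin
    layout = {}
    # The plugin's own code goes in lib/ (assumed from the user guide).
    for jar in plugin_jars:
        layout[jar] = str(root / "lib" / jar)
    # Third-party dependencies go in libext/, away from the core lib/.
    for jar in dependency_jars:
        layout[jar] = str(root / "libext" / jar)
    return layout


if __name__ == "__main__":
    # Example with made-up names for a Redis sink packaged as a plugin.
    for jar, path in plugin_layout(
        "/opt/flume", "redis-sink", ["flume-redis-sink.jar"], ["jedis.jar"]
    ).items():
        print(jar, "->", path)
```

With such a layout, a package like flume-ng-redis would only need to
ship its own plugins.d/<plugin> directory and leave the core lib/ alone.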
https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Plugins
I think this page has a lot of value but maybe the problem with it is that
it is not accessible from the home page of http://flume.apache.org/
As a user, having Apache Flume able to talk to multiple sources and sinks
is a big plus. Having to shop around for various sources/sinks is more
troublesome since I have to first find which flavor of a given sink is
being maintained today, deal with licenses, incompatibilities, mismatched
versions, upgrades, deployment and unfixed bugs, all while wondering if
this is even going to work at all.
Knowing a piece of code is in Apache Flume puts my mind at ease since
the license is clear, CLA cleared and it has been reviewed. There may be
some expectations regarding its support and quality, but it should be
fine as long as it is clearly stated and labeled (ex: tagging them with
different labels such as "supported", "experimental"). This also gives
more opportunities for bugs to be fixed and therefore having code better
maintained, due to the wider audience of Apache Flume in comparison to a
random small project on github.
Also as a user, I would have to be fairly technical to use a random
source/sink outside of Apache Flume. I would probably have to build it,
qualify it against my version of Apache Flume, and package it for
deployment. Whereas if it is in Apache Flume, it's either already in the
tarball or already in the package of my favorite Apache Flume distribution.
As a developer, Apache Flume is very flexible since I can pick and
choose most parts. But if I have to write my own source and/or my own
sink, I may be tempted to forego Apache Flume altogether and write the
rest myself for my specific use case.
But if I get to write a source for my use case, I don't have much
incentive to make it public or to maintain it with the current Apache
Flume version. I just need to ensure it works for my version of Apache
Flume. Everything else is just extra work.
Also in the context of being an employee, I would rather target my
source/sink to work with one of the vendor-supported versions of Apache
Flume, which may be different from the latest Apache Flume. I would have
no incentive to go through the effort of testing it against Apache
Flume. If my source/sink was in Apache Flume, I would be more interested
in contributing to Apache Flume since I know the changes would trickle
down at some point and make my life easier.
As an Apache Bigtop contributor, having all these projects spread around
scares me. They will all depend on different versions of Apache
Flume, build in different ways, work in different ways and integrate in
their own way. Sending patches upstream will also be troublesome since
now we would have to talk to and work with a lot more people than just
Apache Flume folks, each of them having different schedules and
ways of working.
TL;DR:
1. In my view we should only accept new, big components in the main project
if they have a good chance of continuing to be maintained.
+1, if one of the Committers volunteers to maintain it
2. The more dependencies Flume has, the harder it is to upgrade any one
component and keep all the dependencies from conflicting (i.e. JAR hell).
We are almost there :)
3. Is there an alternative to contrib that would solve the problem of
giving new plugins a home if they cannot be included in the main project?
Is there a way we could make plugins that live on e.g. GitHub more visible?
4. If we add a contrib module, I agree that we need to be clear on why
contrib exists and what it means to have something live in there.
+1
Mike
Summarizing my suggestions
1. For a contribution to make it into Flume, one of the committers has to
step in
2. I am not really convinced by contrib, but I don't have an alternative
solution either