Re: [DISCUSS] Feature bloat and contrib module

Ashish Wed, 18 Dec 2013 08:51:13 -0800

I am with you on most of the discussion, but we differ a bit on two points:

- Easy In - Easy Out would be a bit difficult to achieve. Easy in is easy,
but on what basis do we move a component out? Because no one is maintaining
it, or because no one is using it? That would be difficult for us to know.
There may be an easy way of doing this; BigTop deals with these cases more
often than I do.


- Committers are facilitators, agreed, but we do need to review the
contributions, or else the whole process would break.

From the discussion, it seems that nobody is inclined towards contrib.
How about giving components a place under the sources/sinks/interceptors
modules which are already present? Most of the time, these would be the
only things coming in. These components would maintain compatibility with
Flume's current version and get released alongside it. It is extra burden
for the core devs for sure, but it would save users a lot of trouble. The
final call will still be with the Flume devs.

@Bruno - Buddy, please do not stop work on the Redis component. That is the
last thing we want. I am not a committer here, but I am definitely willing
to pitch in with reviewing and testing it a bit (and would learn Redis as
well :) )

cheers
ashish



On Tue, Dec 17, 2013 at 1:04 PM, Bruno Mahé <[email protected]> wrote:

> I will probably end up repeating the very same things as in the other
> discussion.
>
>
>
> Summarizing my suggestions:
> * Committers are not the sole developers. There is no reason for committers
> to take all these responsibilities on their shoulders. Also, developer !=
> committer.
> * Easy IN, Easy OUT. If no one volunteers to maintain something, then
> there is no reason to keep it since the community is not interested in it
> anyway.
> * Easy to get in means more contributions and more contributors. It is also
> a way to grow the community and have contributors become full committers.
> It is more than likely they will notice things that can be improved
> elsewhere and start being more active overall.
> * Easy to get out means only the maintained stuff stays. Stuff would most
> likely get kicked out before a feature release (ex: 1.5 vs 1.6). Bug fix
> releases have no reason to kick out components since they are unlikely to
> break in between bug fix releases (ex: 1.5.2 vs 1.5.3).
> * Spreading sources and sinks is going to be quite hard on users. This
> would mean users would have to be developers themselves, since they would
> have to:
>     - Find the source/sink on some random repository which may or may not
> be maintained, and pick one repository out of all the ones they have found
>     - Build it against their own version of Apache Flume (Apache, CDH,
> PHD, HDP...)
>     - Resolve dependency and build issues between their version of
> Apache Flume and the source/sink, since the source/sink may or may not have
> been maintained
>     - Qualify the integration between their version of Apache Flume and
> the source/sink
> * Spreading sources and sinks is going to be quite hard on developers. Why
> should I target Apache Flume when I can just target my version of Flume
> (CDH, PHD, HDP)?
> * Spreading sources and sinks is going to be quite hard on integrators
> such as Apache Bigtop. This would mean working with as many people as
> there are sources/sinks, each with their own way of working and their own
> schedule.
>
>
> For the details, see inline.
>
>
>
> On 12/16/2013 09:17 PM, Ashish wrote:
>
>> @Israel - IMHO JIRA is not a good place for these discussions.
>> The discussion can easily be tracked on flume.markmail.com, and the link
>> is provided.
>>
>>
>> On Tue, Dec 17, 2013 at 7:19 AM, Mike Percy <[email protected]> wrote:
>>
>>>> I have created a JIRA Brainstorming task to track this.
>>>
>>> Hmm, I think there is a risk of losing this discussion in the flood of
>>> JIRA traffic due to email filters. So I'm going to respond here on this
>>> thread. If you want to reference this thread in the future, you can use
>>> this URL:
>>> http://markmail.org/message/7x7tewbxqw4ubjyp
>>>
>>> My 2 cents below.
>>>
>>>> Do contrib components get released with Flume? Can they break
>>>> compatibility with older versions (what does this mean if they are not
>>>> getting released?) etc. How supported are these?
>>>
>>> I think you hit the nail on the head here.
>>>
>>> Recently I commented on the Storm sink indicating I would not help
>>> maintain it, and therefore I would not help review it. Implicitly, I meant
>>> that someone else should probably take responsibility for maintaining it
>>> to some extent. So I think having at least one person who hopefully has
>>> the bandwidth to understand and maintain a module should be a
>>> prerequisite for inclusion into the main line of Flume (i.e. the bits
>>> included in a typical Flume release).
>>>
>>>
>> Mike - You have echoed my thoughts.
>>
>> We have the same situation in MINA (and in Netty as well). People write a
>> lot of codecs, but we can't get all of them into the release. A codec
>> would be brought in only if one of the committers volunteers to maintain
>> it.
>>
>> This was my thought for Flume as well: if one of the committers volunteers
>> to maintain it, bring it in; otherwise we have to find a way to keep things
>> easy for users. How, I am not sure at the moment.
>>
>>
>>
> Why does accepting a patch mean that you are signing up to maintain it
> forever?
>
> Committers are not the main/only developers of a project.
> Maybe I am wrong, but I see them as facilitators for the community, in the
> sense that they help contributors contribute their patches and move the
> community forward.
> So limiting features to whatever the current set of committers is willing
> to maintain by themselves is not going to scale and will limit the
> community. Also, some community members may be interested in
> contributions in areas where no committer has experience, and therefore no
> committer is able/willing to maintain them.
>
> So why not just remove features or parts that are not maintained?
> Being more aggressive in removing unmaintained parts would enable Apache
> Flume to be more inclusive with regards to contributions.
>
> If there is no one in the community willing to maintain a given part, then
> there is no reason to keep it.
>
>
>
>
>>
>>> Generally, my concern with creating a contrib module is that, in my view,
>>> it is where code goes to die, since why have a contrib module unless we
>>> have no intention of maintaining that code? As an example, see the
>>> various states of stability of code in Pig's piggybank. Is there good
>>> stuff in there? Absolutely. Is everything in there relatively stable /
>>> usable, even on a release? Nope. Those things may or may not work at any
>>> time. Personally I'm not sure adding a contrib is a very good idea.
>>>
>>>
>> So let's assume we have contrib. If it gets released with Flume core, we
>> have no choice but to at least ensure it compiles and its tests run.
>> Sooner or later, one of the core jar upgrades will force us to fix the
>> code in contrib. People can contribute patches, but this would still
>> consume dev cycles.
>>
>> If contrib is a source-only release or is not compatible with the current
>> release, it had better not be part of the release, as it would add a lot
>> of confusion for users. ZooKeeper does have contrib; not sure how they
>> feel about it.
>>
>> Still, I am worried about the impact it might have on future
>> contributions.
>>
>>
>>
> I tend to agree about the contrib modules. They would end up as a
> dumpster. What about tagging modules as stable/beta/experimental instead?
>
> And if they don't even build or pass tests by release time, then remove
> them.
>
>
>
>>> As another dimension to this discussion, I think there is a limit to the
>>> number of dependencies Flume can reasonably pull in and keep straight
>>> without shading or classloading tricks, which themselves add another
>>> layer of pain/difficulty to debugging.
>>>
>>>
> This does not completely solve that problem but is somewhat related: what
> about moving all the current sources and sinks into plugins?
> The core would remain lean, with all its dependencies in lib/, and all the
> source- and sink-specific libs would end up in plugins.d/<plugin>/libext.
>
> This would be more in the context of Apache Bigtop and packages, but it
> would enable people to pick and choose their dependencies, for instance by
> doing a "yum install flume-ng-hdfs flume-ng-redis flume-ng-agent".
> Right now I don't really care about the HDFS sink, but I end up having to
> download a bunch of HDFS-related packages that I don't need.
>
>
>
>>>> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Plugins
>>>
>>> I think this page has a lot of value, but maybe the problem with it is
>>> that it is not accessible from the home page of http://flume.apache.org/
>>>
>>>
> As a user, having Apache Flume able to speak to multiple sources and sinks
> is a big plus. Having to shop around for various sources/sinks is more
> troublesome, since I first have to find which flavor of a given sink is
> being maintained today, and then deal with licenses, incompatibilities,
> mismatched versions, upgrades, deployment, unfixed bugs, and wondering
> whether it is even going to work at all.
> Knowing a piece of code is in Apache Flume puts my mind at ease, since the
> license is clear, the CLA is cleared and the code has been reviewed. There
> may be some expectations regarding its support and quality, but it should
> be fine as long as they are clearly stated and labeled (ex: tagging
> components with labels such as "supported" or "experimental"). Due to the
> wider audience of Apache Flume in comparison to a random small project on
> github, bugs also have more opportunities to be fixed, so the code is
> better maintained.
> Also, as a user, I would have to be fairly technical to use a random
> source/sink outside of Apache Flume. I would probably have to build it,
> qualify it against my version of Apache Flume, and package it for
> deployment. Whereas if it is in Apache Flume, it's either already in the
> tarball or already in the package of my favorite Apache Flume distribution.
>
>
> As a developer, I find Apache Flume very flexible since I can pick and
> choose most parts. But if I have to write my own source and/or my own
> sink, I may be tempted to forgo Apache Flume altogether and write the rest
> myself for my specific use case.
> And if I do write a source for my use case, I don't have much incentive to
> make it public or to keep it working with the current Apache Flume
> version. I just need to ensure it works for my version of Apache Flume.
> Everything else is just extra work.
> Also, in the context of being an employee, I would rather target my
> source/sink at one of the vendor-supported versions of Apache Flume, which
> may differ from the latest Apache Flume. I would have no incentive to go
> through the effort of testing it against Apache Flume. If my source/sink
> were in Apache Flume, I would be more interested in contributing to Apache
> Flume, since I know the changes would trickle down at some point and make
> my life easier.
>
>
> As an Apache Bigtop contributor, having all these projects spread around
> scares me. They will all depend on different versions of Apache Flume,
> build in different ways, work in different ways and integrate in their own
> way. Sending patches upstream will also be troublesome, since we would now
> have to talk to and work with a lot more people than just the Apache Flume
> folks, each with different schedules and ways of working.
>
>
>
>
>
>
>
>>> TL;DR:
>>> 1. In my view we should only accept new, big components in the main
>>> project if they have a good chance of continuing to be maintained.
>>>
>>>
>> +1, if one of the Committers volunteers to maintain it
>>
>>
>>> 2. The more dependencies Flume has, the harder it is to upgrade any one
>>> component and keep all the dependencies from conflicting (i.e. JAR hell).
>>>
>>>
>> We are almost there :)
>>
>>
>>> 3. Is there an alternative to contrib that would solve the problem of
>>> giving new plugins a home if they cannot be included in the main project?
>>> Is there a way we could make plugins that live on e.g. GitHub more
>>> visible?
>>> 4. If we add a contrib module, I agree that we need to be clear on why
>>> contrib exists and what it means to have something live in there.
>>>
>>>
>> +1
>>
>>
>>
>>> Mike
>>>
>>>
>>
>> Summarizing my suggestions:
>> 1. For a contribution to make it into Flume, one of the committers has to
>> step in.
>> 2. I am not very convinced about contrib, but I don't have another
>> solution either.


-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
