Re: [DISCUSS] Feature bloat and contrib module

Gabriel Commeau Mon, 23 Dec 2013 09:10:05 -0800

Hi,

IMHO, contrib modules seem better for the following reasons:
1. Keep the core as thin as possible. I like the idea of a pluggable Flume
where the user adds the components needed, and only these. I imagine that
realistically, most users only use a handful of components, and therefore
don't need the whole library of every existing (or supported) sink and
source on their localhost. We could make the process of adding/removing
components easier, so that it becomes trivial for the user to
download/install/activate them.
2. License considerations. I can envision cases where one would want to
integrate Flume with another system that uses a license that's not
compatible with Apache's. So whether a contributor needs or wants to use a
different license, this contribution cannot currently be added to Flume. I'm
not an expert on licenses, but I wonder if it would be possible to include
these contributions using a contrib module.
3. Easiest way in. "Getting in" becomes trivial and open to all. Seems to me
like the best way to grow the project.
4. Community-based out. With a contrib project, we actually don't really
need to move contributions out. The community, if able to vote or report
usage, naturally manages which contributions are used and which aren't.
5. Competition and maintenance. As software engineers, there are always
tradeoffs we need to make. Imagine a component whose performance could be
improved at the cost of, for example, compatibility with some
other systems. Why would this optimization have to conflict with Apache's
main component? Couldn't both live side by side, letting the user choose the
one that better fits his/her specific context and requirements?
So to answer the original discussion questions: I'd argue that contrib
modules would benefit Flume, that they should be released on their own
schedule, supported independently, and be compatible with whatever version
of Flume the authors wish.
I like cathedrals, and I tend to design my applications like that. But in
this case, I believe a little bit of bazaar would be best.
I hope this helps.


Gabriel

From:  Bruno Mahé <[email protected]>
Reply-To:  <[email protected]>
Date:  Saturday, December 21, 2013 4:29 PM
To:  <[email protected]>
Subject:  Re: [DISCUSS] Feature bloat and contrib module

See inline.

On 12/20/2013 04:01 PM, Mike Percy wrote:
>  On Mon, Dec 16, 2013 at 11:34 PM, Bruno Mahé <[email protected]> wrote:
>> 
>>  Summarizing my suggestions:
>>  * Committers are not the sole developers. There is no reason for committers
>>  to take all these responsibilities on their shoulders. Also developer !=
>>  committer.
>>  * Easy IN, Easy OUT. If no one volunteers to maintain something, then
>>  there is no reason to keep it since the community is not interested in it
>>  anyway.
>>  * Easy to get in means more contributions and more contributors. Also a
>>  way to grow the community and have contributors become full committers. It is
>>  more than likely they will notice things that can be improved elsewhere and
>>  start being more active overall.
>>  * Easy to get out means only the maintained stuff stays. Stuff would most
>>  likely get kicked out before a feature release (ex: 1.5 vs 1.6). Bug fix
>>  releases have no reason to kick out components since they are unlikely to
>>  break in between bug fix releases (ex: 1.5.2 vs 1.5.3).
>>  * Spreading sources and sinks is going to be quite hard on users. This
>>  would mean users would have to be developers themselves since they would
>>  have to:
>>       - Find the source/sink on some random repository which may or may not
>>  be maintained. Pick one of the repositories out of all the ones the user has
>>  found
>>       - Build it against their own version of Apache Flume (Apache, CDH,
>>  PHD, HDP...)
>>       - Resolve dependencies and build issues between their version of
>>  Apache Flume and source/sink since the source/sink may or may not have been
>>  maintained
>>       - Qualify the integration between their version of Apache Flume and
>>  source/sink
>>  * Spreading sources and sinks is going to be quite hard on developers. Why
>>  should I target Apache Flume when I can just target my version of Flume
>>  (CDH, PHD, HDP) ?
>>  * Spreading sources and sinks is going to be quite hard on integrators
>>  such as Apache Bigtop. This would mean working with as many people as
>>  there are sources/sinks, each with their own way of working and their own
>>  schedule.
>> 
> 
>  Hey Bruno, great to hear from you on this list!
> 

Thanks!

>  Good points, and in principle, I mostly agree with what you are saying, but
>  I have concerns about some of the proposed approaches. Specifically:
> 
>>  So why not just remove features or parts that are not maintained?
>>  Being more aggressive in removing unmaintained parts would enable Apache
>>  Flume to be more inclusive with regards to contributions.
>> 
> 
>  Removing stuff breaks back-compat and it is hard to know who is using a
>  component. If just one person is using something, is it worth it to keep
>  something? Where do we draw the line? That said, I am not against removing
>  stuff that made it into a release (after marking it @Deprecated for a
>  release) if we have consensus among committers that it needs to go.
> 

First of all, it is very hard to quantify users. And also they tend to
be silent if everything goes well.
Also, I don't see the issue if just one person is using something. If
such a component is maintained and does not create a burden, why remove
it? Keeping it would be easy and make everyone happy.

The way I would see components being removed would rather be based on
their maintenance cost. For instance, is it blocking a release? Or have
all its tests been failing for the past few weeks while no one cares?

Regarding compatibility between versions, I would say:
* You don't have to remove the component as soon as you create the
ticket to remove it. This way you give enough opportunity for some
people to step up and take over before kicking it out.

* Manage expectations by adding labels to the components. When a
component is introduced, label it as "experimental", then a few releases
later "beta" and then a few releases later "stable". Note that some
other criteria can be added to the labeling but this gives the
opportunity to announce that experimental components may not survive the
next release. This way you can ensure that stable components remain
backward compatible while giving you the option to remove the
unmaintained/unstable ones.
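
A minimal sketch of how such labels could be expressed in code, purely as an
illustration. Nothing here exists in Flume: the Stability annotation, the
Level enum, the example sink classes and the helper methods are all
hypothetical names. Two assumptions are baked in: an unlabeled component is
treated as experimental, and only experimental components are removable
without notice (the policy for "beta" is left open above, so the sketch
treats it conservatively).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical illustration only: none of these types exist in Flume.
final class StabilityLabels {
    private StabilityLabels() {}

    /** Maturity levels named in the discussion above. */
    enum Level { EXPERIMENTAL, BETA, STABLE }

    /** Hypothetical label attached to a source/sink class. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Stability {
        Level value();
    }

    // Made-up components, used only to exercise the label.
    @Stability(Level.EXPERIMENTAL)
    static final class ExampleRedisSink {}

    @Stability(Level.STABLE)
    static final class ExampleHdfsSink {}

    static final class UnlabeledSink {}

    /** Reads the label; ASSUMPTION: no label means experimental. */
    static Level levelOf(Class<?> component) {
        Stability label = component.getAnnotation(Stability.class);
        return label == null ? Level.EXPERIMENTAL : label.value();
    }

    /** ASSUMPTION: only experimental components may vanish in the next release. */
    static boolean mayBeRemovedNextRelease(Class<?> component) {
        return levelOf(component) == Level.EXPERIMENTAL;
    }

    public static void main(String[] args) {
        Class<?>[] components = {
            ExampleRedisSink.class, ExampleHdfsSink.class, UnlabeledSink.class
        };
        for (Class<?> c : components) {
            System.out.println(c.getSimpleName() + ": " + levelOf(c)
                + ", removable next release: " + mayBeRemovedNextRelease(c));
        }
    }
}
```

A release checklist could then list every component for which
mayBeRemovedNextRelease returns true and ask for a maintainer before the
next feature release.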





>>>>  As another dimension to this discussion, I think there is a limit to the
>>>>  number of dependencies Flume can reasonably pull in and keep straight
>>>>  without shading or classloading tricks, which themselves add another layer
>>>>  of pain/difficulty to debugging.
>>>> 
>>> 
>>  This does not completely solve that problem but is somewhat related: what
>>  about moving all the current sources and sinks as plugins?
>>  So the core remains lean with all its dependencies in lib/ and all the
>>  sources and sinks specific libs end up in plugins.d/<plugin>/libext.
>> 
>>  This would be more in the context of Apache Bigtop and packages, but that
>>  would enable people to pick and choose their dependencies. For instance
>>  doing a "yum install flume-ng-hdfs flume-ng-redis flume-ng-agent".
>>  Right now I don't really care about the hdfs sink, but I end up having to
>>  download a bunch of hdfs related packages that are not really needed.
> 
> 
>  Well... that actually doesn't solve the dependency problem at all. It
>  pushes the requirement of knowledge of what works with what to the
>  end-user. And this type of thing (JAR incompatibility) is nearly impossible
>  to detect automatically, so we are back to end-users sifting through poms,
>  Java API docs, and release notes - which is what they would have to do with
>  a Github project anyway. But now it's for *everything* related to Flume. So
>  we just made the Flume plugin compatibility situation much worse than it
>  already was.
> 


Yeah, this is just a related note from a user-experience perspective.
This would keep the Apache Flume installation lean and to the point. It's
more about tailoring the installation to the need than about solving
dependency issues.


>  Right now, every plugin that ships with Flume can be run in the same JVM
>  process as every other plugin, with the exception (much to my regret) of
>  Solr and ElasticSearch. I am loath to add anything else to that "landmine
>  list". In my view, we need to come up with a technical solution to that
>  problem before we decide to open the floodgates to any and all plugins /
>  dependencies, regardless of the plugin acceptance / maintainability
>  discussion (the two are orthogonal concerns). Which is why I brought up the
>  possibility of classloading, or OSGi, or something that attempts to solve
>  this problem. It's not rocket science (all servlet containers do this), but
>  it's added implementation / debugging complexity for sure and someone has
>  to do the work to implement it (if we agree that is the right solution to
>  the problem here).
> 

All GNU/Linux distributions and even Apache Bigtop face that very same
issue. And the right answer would be to fix the issue upstream or to use
some of the tricks you cite above. And it's also pretty abstract without
concrete cases. So can you point to tickets or describe in more detail the
issue between Apache Solr and Elasticsearch? Maybe we can use that to derive a
solution.
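
To make the classloading trick a bit less abstract, here is a rough sketch of
one class loader per plugin directory. It is not a description of how Flume
itself loads plugins; the PluginLoaders class and its methods are hypothetical
names, and the lib/ and libext/ subdirectories are assumed from the
plugins.d/<plugin>/libext layout mentioned earlier in the thread. Note that
URLClassLoader delegates to its parent first, so this only separates plugins
from each other for libraries that the shared parent does not already provide.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Flume's actual plugin loading code.
final class PluginLoaders {
    private PluginLoaders() {}

    /** Collects the jar URLs found directly under pluginDir/lib and pluginDir/libext. */
    static List<URL> jarUrls(Path pluginDir) throws IOException {
        List<URL> urls = new ArrayList<>();
        for (String sub : new String[] {"lib", "libext"}) {
            Path dir = pluginDir.resolve(sub);
            if (!Files.isDirectory(dir)) {
                continue; // a plugin may ship without one of the two directories
            }
            try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : jars) {
                    urls.add(jar.toUri().toURL());
                }
            }
        }
        return urls;
    }

    /**
     * Builds a dedicated loader for one plugin. Lookups go to the parent first
     * (standard URLClassLoader behaviour), then to the plugin's own jars.
     */
    static URLClassLoader loaderFor(Path pluginDir, ClassLoader parent) throws IOException {
        return new URLClassLoader(jarUrls(pluginDir).toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws IOException {
        // The default path is a placeholder; pass a real plugin directory as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : "plugins.d/example");
        System.out.println("Jars visible to this plugin's loader: " + jarUrls(dir));
    }
}
```

With one loader per plugin, two plugins could each carry their own version of
a conflicting library in libext/, at the cost of the extra debugging
complexity Mike describes.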

Dependency conflicts should still be pretty rare though. So I would
not throw out the baby with the bathwater. Conflicting cases should
remain the minority and I don't really see the reason to pass on 95% of
contributions and all their attached benefits because there may be some
integration issues with the remaining 5%.



>  TL;DR: I don't think the conflicting-dependencies issue has a "project
>  policy" or packaging solution.
> 
>  Mike
>
