Re: [DISCUSS] Feature bloat and contrib module

Ashish Thu, 16 Jan 2014 00:45:34 -0800

Folks,

Can we close this thread, one way or the other? It's been a while.


thanks
ashish


On Mon, Dec 23, 2013 at 10:34 PM, Gabriel Commeau <[email protected]> wrote:

> Hi,
>
> IMHO, contrib modules seem better for the following reasons:
> 1. Keep the core as thin as possible. I like the idea of a pluggable Flume
> where the user adds the components needed, and only these. I imagine that
> realistically, most users only use a handful of components, and therefore
> don't need the whole library of every existing (or supported) sink and
> source on their localhost. We could make the process of adding/removing
> components easier, so that it becomes trivial for the user to
> download/install/activate them.
> 2. License considerations. I can envision cases where one would want to
> integrate Flume with another system that uses a license that's not
> compatible with Apache's. So if a contributor needs or wants to use a
> different license, that contribution cannot currently be added to Flume.
> I'm not an expert on licenses, but I wonder if it would be possible to
> include these contributions using a contrib module.
> 3. Easiest way in. "Getting in" becomes trivial and open to all. Seems to
> me like the best way to grow the project.
> 4. Community-based out. With a contrib project, we actually don't really
> need to move contributions out. The community, if able to vote or report
> usage, naturally manages which contributions are used and which aren't.
> 5. Competition and maintenance. As software engineers, there are always
> tradeoffs we need to make. Imagine a component that could have its
> performance improved at the cost of, for example, compatibility with some
> other systems. Why would this optimization have to conflict with Apache's
> main component? Couldn't both live side-by-side, and let the user choose
> the one that better fits his/her specific context and requirements?
> So to answer the original discussion questions: I'd argue that contrib
> modules would benefit Flume, that they should be released on their own
> schedule, supported independently, and be compatible with whatever version
> of Flume the authors wish.
> I like cathedrals, and I tend to design my applications like that. But in
> this case, I believe a little bit of bazaar would be best.
> I hope this helps.
>
> Gabriel
>
> From:  Bruno Mahé <[email protected]>
> Reply-To:  <[email protected]>
> Date:  Saturday, December 21, 2013 4:29 PM
> To:  <[email protected]>
> Subject:  Re: [DISCUSS] Feature bloat and contrib module
>
> See inline.
>
> On 12/20/2013 04:01 PM, Mike Percy wrote:
> >  On Mon, Dec 16, 2013 at 11:34 PM, Bruno Mahé <[email protected]> wrote:
> >>
> >>  Summarizing my suggestions:
> >>  * Committers are not the sole developers. There is no reason for
> >>  committers to take all these responsibilities on their shoulders. Also,
> >>  developer != committer.
> >>  * Easy IN, Easy OUT. If no one volunteers to maintain something, then
> >>  there is no reason to keep it since the community is not interested in
> >>  it anyway.
> >>  * Easy to get in means more contributions and more contributors. It is
> >>  also a way to grow the community and have contributors become full
> >>  committers. It is more than likely they will notice things that can be
> >>  improved elsewhere and start being more active overall.
> >>  * Easy to get out means only the maintained stuff stays. Stuff would
> >>  most likely get kicked out before a feature release (ex: 1.5 vs 1.6).
> >>  Bug fix releases have no reason to kick out components since they are
> >>  unlikely to break in between bug fix releases (ex: 1.5.2 vs 1.5.3).
> >>  * Spreading sources and sinks is going to be quite hard on users. This
> >>  would mean users would have to be developers themselves since they
> >>  would have to:
> >>       - Find the source/sink on some random repository which may or may
> >>  not be maintained, and pick one repository out of all the ones the
> >>  user has found
> >>       - Build it against their own version of Apache Flume (Apache, CDH,
> >>  PHD, HDP...)
> >>       - Resolve dependencies and build issues between their version of
> >>  Apache Flume and the source/sink, since the source/sink may or may not
> >>  have been maintained
> >>       - Qualify the integration between their version of Apache Flume
> >>  and the source/sink
> >>  * Spreading sources and sinks is going to be quite hard on developers.
> >>  Why should I target Apache Flume when I can just target my version of
> >>  Flume (CDH, PHD, HDP)?
> >>  * Spreading sources and sinks is going to be quite hard on integrators
> >>  such as Apache Bigtop. This would mean working with as many people as
> >>  there are sources/sinks, each with their own way of working and
> >>  schedule.
> >>
> >
> >  Hey Bruno, great to hear from you on this list!
> >
>
> Thanks!
>
> >  Good points, and in principle, I mostly agree with what you are saying,
> >  but I have concerns about some of the proposed approaches. Specifically:
> >
> >>  So why not just remove features or parts that are not maintained?
> >>  Being more aggressive in removing unmaintained parts would enable
> >>  Apache Flume to be more inclusive with regards to contributions.
> >>
> >
> >  Removing stuff breaks back-compat and it is hard to know who is using a
> >  component. If just one person is using something, is it worth keeping?
> >  Where do we draw the line? That said, I am not against removing stuff
> >  that made it into a release (after marking it @Deprecated for a release)
> >  if we have consensus among committers that it needs to go.
> >
>
> First of all, it is very hard to quantify users, and they tend to be
> silent if everything goes well.
> Also, I don't see the issue if just one person is using something. If
> such a component is maintained and does not create a burden, why remove
> it? Keeping it would be easy and make everyone happy.
>
> The way I see it, components would be removed based on their maintenance
> cost. For instance, is it blocking a release? Or have all its tests been
> failing for the past few weeks with no one caring?
>
> Regarding compatibility between versions, I would say:
> * You don't have to remove the component as soon as you create the
> ticket to remove it. This way you give enough opportunity for some
> people to step up and take over before kicking it out.
>
> * Manage expectations by adding labels to the components. When a
> component is introduced, label it as "experimental", then a few releases
> later "beta" and then a few releases later "stable". Note that some
> other criteria can be added to the labeling but this gives the
> opportunity to announce that experimental components may not survive the
> next release. This way you can ensure that stable components remain
> backward compatible while giving you the option to remove the
> unmaintained/unstable ones.
>
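The labeling scheme above can be made concrete with a small sketch. This is only one possible reading of the proposal, not anything Flume implements: the `Stability` labels, the `Component` record, and the rule that only unmaintained, non-stable components become removal candidates at a feature release are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Stability(Enum):
    # Hypothetical labels taken from the wording of the proposal.
    EXPERIMENTAL = "experimental"
    BETA = "beta"
    STABLE = "stable"


@dataclass
class Component:
    name: str             # hypothetical component name
    stability: Stability
    maintained: bool      # has someone volunteered to maintain it?


def is_feature_release(old: str, new: str) -> bool:
    """True when the major or minor number changes (e.g. 1.5 -> 1.6)."""
    return old.split(".")[:2] != new.split(".")[:2]


def removal_candidates(components, old_version, new_version):
    """Names of components that may be dropped when moving to new_version."""
    # Bug-fix releases (e.g. 1.5.2 -> 1.5.3) never remove anything.
    if not is_feature_release(old_version, new_version):
        return []
    # Assumption: stable components are kept for backward compatibility,
    # so only unmaintained experimental/beta components are candidates.
    return [c.name for c in components
            if not c.maintained and c.stability is not Stability.STABLE]
```

Under these assumptions, an unmaintained experimental sink would be listed when going from 1.5 to 1.6, but never between 1.5.2 and 1.5.3.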
>
>
>
>
> >>>>  As another dimension to this discussion, I think there is a limit to
> >>>>  the number of dependencies Flume can reasonably pull in and keep
> >>>>  straight without shading or classloading tricks, which themselves add
> >>>>  another layer of pain/difficulty to debugging.
> >>>>
> >>>
> >>  This does not completely solve that problem but is somewhat related:
> >>  what about moving all the current sources and sinks to plugins?
> >>  So the core remains lean with all its dependencies in lib/ and all the
> >>  source- and sink-specific libs end up in plugins.d/<plugin>/libext.
> >>
> >>  This would be more in the context of Apache Bigtop and packages, but
> >>  that would enable people to pick and choose their dependencies. For
> >>  instance, doing a "yum install flume-ng-hdfs flume-ng-redis flume-ng-agent".
> >>  Right now I don't really care about the hdfs sink, but I end up having
> >>  to download a bunch of hdfs-related packages that are not really needed.
> >
> >
> >  Well... that actually doesn't solve the dependency problem at all. It
> >  pushes the requirement of knowledge of what works with what to the
> >  end-user. And this type of thing (JAR incompatibility) is nearly
> >  impossible to detect automatically, so we are back to end-users sifting
> >  through poms, Java API docs, and release notes - which is what they
> >  would have to do with a GitHub project anyway. But now it's for
> >  *everything* related to Flume. So we just made the Flume plugin
> >  compatibility situation much worse than it already was.
> >
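The simplest slice of this problem, the same library shipped in two different versions by two plugins, can at least be flagged mechanically. A rough sketch follows, assuming the plugins.d/<plugin>/libext layout mentioned earlier in the thread and jars named <artifact>-<version>.jar; the function name and the naming convention are assumptions. It does not detect binary or API incompatibilities, which is the hard part described above.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches names like "guava-11.0.2.jar" -> ("guava", "11.0.2").
# Assumption: jars follow the common <artifact>-<version>.jar convention.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_clashes(plugins_dir):
    """Map artifact -> {version: [plugins]} for artifacts seen in >1 version."""
    seen = defaultdict(lambda: defaultdict(list))
    for jar in sorted(Path(plugins_dir).glob("*/libext/*.jar")):
        match = JAR_NAME.match(jar.name)
        if match is None:
            continue  # unversioned jar name: nothing to compare
        plugin = jar.parent.parent.name  # plugins.d/<plugin>/libext/<jar>
        seen[match["artifact"]][match["version"]].append(plugin)
    # Keep only artifacts that appear with more than one distinct version.
    return {artifact: dict(versions)
            for artifact, versions in seen.items() if len(versions) > 1}
```

Calling `find_version_clashes("plugins.d")` returns an empty dict when every shared artifact appears in a single version. A non-empty result only says that two plugins disagree on a version, not whether they would actually break in the same JVM.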
>
>
> Yeah, this is just a related note from a user-experience point of view.
> This would keep the Apache Flume installation lean and to the point. It's
> more about tailoring the installation to the need than about solving
> dependency issues.
>
>
> >  Right now, every plugin that ships with Flume can be run in the same JVM
> >  process as every other plugin, with the exception (much to my regret) of
> >  Solr and ElasticSearch. I am loath to add anything else to that
> >  "landmine list". In my view, we need to come up with a technical solution
> >  to that problem before we decide to open the floodgates to any and all
> >  plugins / dependencies, regardless of the plugin acceptance /
> >  maintainability discussion (the two are orthogonal concerns). Which is
> >  why I brought up the possibility of classloading, or OSGi, or something
> >  that attempts to solve this problem. It's not rocket science (all servlet
> >  containers do this), but it's added implementation / debugging complexity
> >  for sure and someone has to do the work to implement it (if we agree that
> >  is the right solution to the problem here).
> >
>
> All GNU/Linux distributions and even Apache Bigtop face that very same
> issue. And the right answer would be to fix the issue upstream or to use
> some of the tricks you cite above. It's also pretty abstract without
> concrete cases. So can you point to tickets or describe in more detail the
> issue between Apache Solr and Elasticsearch? Maybe we can use that to
> derive a solution.
>
> Dependency conflicts should still be pretty rare though. So I would
> not throw out the baby with the bath water. Conflicting cases should
> remain the minority and I don't really see the reason to pass on 95% of
> contributions and all their attached benefits because there may be some
> integration issues with the remaining 5%.
>
>
>
> >  TL;DR: I don't think the conflicting-dependencies issue has a "project
> >  policy" or packaging solution.
> >
> >  Mike
> >
>
>
>
>
>


-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
