See inline.

On 12/20/2013 04:01 PM, Mike Percy wrote:
On Mon, Dec 16, 2013 at 11:34 PM, Bruno Mahé <[email protected]> wrote:

Summarizing my suggestions:
* Committers are not the sole developers. There is no reason for committers
to take all these responsibilities on their shoulders. Also, developer !=
committer.
* Easy IN, Easy OUT. If no one volunteers to maintain something, then
there is no reason to keep it since the community is not interested in it
anyway.
* Easy to get in means more contributions and more contributors. It is also
a way to grow the community and have contributors become full committers.
They will more than likely notice things that can be improved elsewhere and
start being more active overall.
* Easy to get out means only the maintained stuff stays. Stuff would most
likely get kicked out before a feature release (ex: 1.5 vs 1.6). Bug fix
releases have no reason to kick out components since they are unlikely to
break in between bug fix releases (ex: 1.5.2 vs 1.5.3).
* Spreading sources and sinks is going to be quite hard on users. This
would mean users would have to be developers themselves, since they would
have to:
     - Find the source/sink in some random repository which may or may not
be maintained, and pick one repository out of all the ones they have
found
     - Build it against their own version of Apache Flume (Apache, CDH,
PHD, HDP...)
     - Resolve dependencies and build issues between their version of
Apache Flume and source/sink since the source/sink may or may not have been
maintained
     - Qualify the integration between their version of Apache Flume and
source/sink
* Spreading sources and sinks is going to be quite hard on developers. Why
should I target Apache Flume when I can just target my version of Flume
(CDH, PHD, HDP)?
* Spreading sources and sinks is going to be quite hard on integrators
such as Apache Bigtop. This would mean working with as many people as
there are sources/sinks, each with their own way of working and their own
schedule.


Hey Bruno, great to hear from you on this list!


Thanks!

Good points, and in principle, I mostly agree with what you are saying, but
I have concerns about some of the proposed approaches. Specifically:

So why not just remove features or parts that are not maintained?
Being more aggressive in removing unmaintained parts would enable Apache
Flume to be more inclusive with regards to contributions.


Removing stuff breaks back-compat and it is hard to know who is using a
component. If just one person is using something, is it worth it to keep
something? Where do we draw the line? That said, I am not against removing
stuff that made it into a release (after marking it @Deprecated for a
release) if we have consensus among committers that it needs to go.


First of all, it is very hard to quantify users, and they tend to stay silent when everything goes well. Also, I don't see the issue if just one person is using something. If such a component is maintained and does not create a burden, why remove it? Keeping it would be easy and would make everyone happy.

The way I see it, components should rather be removed based on their maintenance cost. For instance, is it blocking a release? Or have all its tests been failing for the past few weeks with no one caring?

Regarding compatibility between versions, I would say:
* You don't have to remove the component as soon as you create the ticket to remove it. This way you give enough opportunity for some people to step up and take over before kicking it out.

* Manage expectations by adding labels to the components. When a component is introduced, label it as "experimental", then a few releases later "beta", and then a few releases later "stable". Other criteria can be added to the labeling, but this gives the opportunity to announce that experimental components may not survive the next release. This way you can ensure that stable components remain backward compatible while keeping the option to remove the unmaintained/unstable ones.
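To make the labeling idea concrete, here is a minimal Java sketch. It assumes a hypothetical `Stability.Label` annotation (not an existing Flume API) and treats unlabeled components as "stable", so that everything already shipped keeps its compatibility promise. Hadoop's `InterfaceStability` annotations follow a similar idea.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical stability labels for Flume components (sketch, not a real Flume API). */
public final class Stability {
    private Stability() {}

    /** Lifecycle stages proposed above. */
    public enum Level { EXPERIMENTAL, BETA, STABLE }

    /** Retained at runtime so a release check can read it via reflection. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Label {
        Level value();
    }

    /** Assumption: unlabeled components count as STABLE to protect existing users. */
    public static Level of(Class<?> component) {
        Label label = component.getAnnotation(Label.class);
        return label == null ? Level.STABLE : label.value();
    }

    /** Only non-stable components may be dropped in the next feature release (e.g. 1.5 -> 1.6). */
    public static boolean removableInNextFeatureRelease(Class<?> component) {
        return of(component) != Level.STABLE;
    }

    // Hypothetical example components used only to illustrate the labels.
    @Label(Level.EXPERIMENTAL)
    public static final class DemoRedisSink {}

    @Label(Level.BETA)
    public static final class DemoSolrSink {}

    public static final class DemoHdfsSink {}
}
```

A release manager could run such a check over the components proposed for removal, so that only experimental or beta ones get dropped while stable ones keep their back-compat guarantee.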





As another dimension to this discussion, I think there is a limit to the
number of dependencies Flume can reasonably pull in and keep straight
without shading or classloading tricks, which themselves add another
layer of pain/difficulty to debugging.


This does not completely solve that problem but is somewhat related: what
about moving all the current sources and sinks into plugins?
So the core remains lean with all its dependencies in lib/ and all the
sources and sinks specific libs end up in plugins.d/<plugin>/libext.

This would be more in the context of Apache Bigtop and packages, but that
would enable people to pick and choose their dependencies. For instance
doing a "yum install flume-ng-hdfs flume-ng-redis flume-ng-agent".
Right now I don't really care about the hdfs sink, but I end up having to
download a bunch of hdfs related packages that are not really needed.


Well... that actually doesn't solve the dependency problem at all. It
pushes the requirement of knowledge of what works with what to the
end-user. And this type of thing (JAR incompatibility) is nearly impossible
to detect automatically, so we are back to end-users sifting through poms,
Java API docs, and release notes - which is what they would have to do with
a Github project anyway. But now it's for *everything* related to Flume. So
we just made the Flume plugin compatibility situation much worse than it
already was.



Yeah, this is just a related note about user experience.
This would keep the Apache Flume installation lean and to the point. It's more about tailoring the installation to the user's needs than about solving dependency issues.


Right now, every plugin that ships with Flume can be run in the same JVM
process as every other plugin, with the exception (much to my regret) of
Solr and ElasticSearch. I am loath to add anything else to that "landmine
list". In my view, we need to come up with a technical solution to that
problem before we decide to open the floodgates to any and all plugins /
dependencies, regardless of the plugin acceptance / maintainability
discussion (the two are orthogonal concerns). Which is why I brought up the
possibility of classloading, or OSGi, or something that attempts to solve
this problem. It's not rocket science (all servlet containers do this), but
it's added implementation / debugging complexity for sure and someone has
to do the work to implement it (if we agree that is the right solution to
the problem here).
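For reference, the classloading approach could look roughly like the following Java sketch: one `URLClassLoader` per `plugins.d/<plugin>` directory, built from the jars in its `lib/` and `libext/` subdirectories, so that two plugins never see each other's dependencies. `PluginLoaders` is a hypothetical helper, not existing Flume code. Note that `URLClassLoader` delegates to its parent first, so a jar shipped in the core `lib/` still wins over a plugin's copy; servlet containers typically use child-first loading for web apps, which this sketch does not attempt.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: one isolated class loader per plugins.d/<plugin> directory. */
public final class PluginLoaders {
    private PluginLoaders() {}

    public static Map<String, URLClassLoader> load(Path pluginsDir, ClassLoader parent)
            throws IOException {
        Map<String, URLClassLoader> loaders = new TreeMap<>();
        try (DirectoryStream<Path> plugins = Files.newDirectoryStream(pluginsDir, Files::isDirectory)) {
            for (Path plugin : plugins) {
                List<URL> urls = new ArrayList<>();
                // Collect the plugin's own jars; other plugins' jars are never added here.
                for (String sub : new String[] {"lib", "libext"}) {
                    Path dir = plugin.resolve(sub);
                    if (!Files.isDirectory(dir)) {
                        continue;
                    }
                    try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
                        for (Path jar : jars) {
                            urls.add(jar.toUri().toURL());
                        }
                    }
                }
                // Sibling loaders share only the parent (core) loader, not each other.
                loaders.put(plugin.getFileName().toString(),
                        new URLClassLoader(urls.toArray(new URL[0]), parent));
            }
        }
        return loaders;
    }
}
```

The returned loaders should be closed when the agent shuts down, and every class a plugin instantiates has to be loaded through its own loader for the isolation to hold, which is exactly the added debugging complexity mentioned above.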


All GNU/Linux distributions, and even Apache Bigtop, face that very same issue. The right answer would be to fix the issue upstream or to use some of the tricks you cite above. It's also pretty abstract without concrete cases. So can you point to tickets, or describe the issue between Apache Solr and Elasticsearch in more detail? Maybe we can use that to derive a solution.

Dependency conflicts should still be pretty rare, though. So I would not throw the baby out with the bathwater. Conflicting cases should remain the minority, and I don't really see the reason to pass on 95% of contributions and all their attached benefits because there may be some integration issues with the remaining 5%.



TL;DR: I don't think the conflicting-dependencies issue has a "project
policy" or packaging solution.

Mike

