Folks, can we close this thread, one way or the other? It's been a while.
thanks
ashish

On Mon, Dec 23, 2013 at 10:34 PM, Gabriel Commeau <[email protected]> wrote:

> Hi,
>
> IMHO, contrib modules seem better for the following reasons:
> 1. Keep the core as thin as possible. I like the idea of a pluggable
>    Flume where the user adds the components needed, and only these. I
>    imagine that realistically, most users only use a handful of
>    components, and therefore don't need the whole library of every
>    existing (or supported) sink and source on their localhost. We could
>    make the process of adding/removing components easier, so that it
>    becomes trivial for the user to download/install/activate them.
> 2. License considerations. I can envision cases where one would want to
>    integrate Flume with another system that uses a license that's not
>    compatible with Apache's. So whether a contributor needs or wants to
>    use a different license, this contribution cannot currently be added
>    to Flume. I'm not an expert on licenses, but I wonder if it would be
>    possible to include these contributions using a contrib module.
> 3. Easiest way in. "Getting in" becomes trivial and open to all. Seems
>    to me like the best way to grow the project.
> 4. Community-based out. With a contrib project, we actually don't really
>    need to move contributions out. The community, if able to vote or
>    report usage, naturally manages which contributions are used and
>    which aren't.
> 5. Competition and maintenance. As software engineers, there are always
>    tradeoffs we need to make. Imagine a component that could have its
>    performance increased at the cost of, for example, compatibility with
>    some other systems. Why would this optimization have to conflict with
>    Apache's main component? Couldn't both live side-by-side, and let the
>    user choose the one that better fits his/her specific context and
>    requirements?
>
> So to answer the original discussion questions: I'd argue that contrib
> modules would benefit Flume, that they should be released on their own
> schedule, supported independently, and be compatible with whatever
> version of Flume the authors wish.
> I like cathedrals, and I tend to design my applications like that. But
> in this case, I believe a little bit of bazaar would be best.
> I hope this helps.
>
> Gabriel
>
> From: Bruno Mahé <[email protected]>
> Reply-To: <[email protected]>
> Date: Saturday, December 21, 2013 4:29 PM
> To: <[email protected]>
> Subject: Re: [DISCUSS] Feature bloat and contrib module
>
> See inline.
>
> On 12/20/2013 04:01 PM, Mike Percy wrote:
> > On Mon, Dec 16, 2013 at 11:34 PM, Bruno Mahé <[email protected]> wrote:
> >>
> >> Summarizing my suggestions:
> >> * Committers are not the sole developers. There is no reason for
> >>   committers to take all these responsibilities on their shoulders.
> >>   Also developer != committer.
> >> * Easy IN, Easy OUT. If no one volunteers to maintain something, then
> >>   there is no reason to keep it since the community is not interested
> >>   in it anyway.
> >> * Easy to get in means more contributions and more contributors. Also
> >>   a way to grow community and have contributors becoming full
> >>   committers. It is more than likely they will notice things that can
> >>   be improved elsewhere and start being more active overall.
> >> * Easy to get out means only the maintained stuff stays. Stuff would
> >>   most likely get kicked out before a feature release (ex: 1.5 vs
> >>   1.6). Bug fix releases have no reason to kick out components since
> >>   they are unlikely to break in between bug fix releases (ex: 1.5.2
> >>   vs 1.5.3).
> >> * Spreading sources and sinks is going to be quite hard on users.
> >>   This would mean users would have to be developers themselves since
> >>   they would have to:
> >>   - Find the source/sink on some random repository which may or may
> >>     not be maintained.
> >>     Pick one of the repositories out of all the ones the user has
> >>     found
> >>   - Build it against their own version of Apache Flume (Apache, CDH,
> >>     PHD, HDP...)
> >>   - Resolve dependencies and build issues between their version of
> >>     Apache Flume and source/sink since the source/sink may or may not
> >>     have been maintained
> >>   - Qualify the integration between their version of Apache Flume and
> >>     source/sink
> >> * Spreading sources and sinks is going to be quite hard on
> >>   developers. Why should I target Apache Flume when I can just target
> >>   my version of Flume (CDH, PHD, HDP)?
> >> * Spreading sources and sinks is going to be quite hard on
> >>   integrators such as Apache Bigtop. This would mean working with as
> >>   many people as there are sources/sinks, each with their own way of
> >>   working and schedules.
> >>
> >
> > Hey Bruno, great to hear from you on this list!
> >
>
> Thanks!
>
> > Good points, and in principle, I mostly agree with what you are
> > saying, but I have concerns about some of the proposed approaches.
> > Specifically:
> >
> >> So why not just remove features or parts that are not maintained?
> >> Being more aggressive in removing unmaintained parts would enable
> >> Apache Flume to be more inclusive with regards to contributions.
> >>
> >
> > Removing stuff breaks back-compat and it is hard to know who is using
> > a component. If just one person is using something, is it worth it to
> > keep something? Where do we draw the line? That said, I am not against
> > removing stuff that made it into a release (after marking it
> > @Deprecated for a release) if we have consensus among committers that
> > it needs to go.
> >
>
> First of all, it is very hard to quantify users. And also they tend to
> be silent if everything goes well.
> Also I don't see the issue if just one person is using something. If
> such a component is maintained and does not create a burden, why remove
> it?
> Keeping it would be easy and make everyone happy.
>
> The way I would see components being removed would rather be based on
> their maintenance cost. For instance, is it blocking a release? Or have
> all its tests failed for the past few weeks and no one cares?
>
> Regarding compatibility between versions, I would say:
> * You don't have to remove the component as soon as you create the
>   ticket to remove it. This way you give enough opportunity for some
>   people to step up and take over before kicking it out.
>
> * Manage expectations by adding labels to the components. When a
>   component is introduced, label it as "experimental", then a few
>   releases later "beta" and then a few releases later "stable". Note
>   that some other criteria can be added to the labeling but this gives
>   the opportunity to announce that experimental components may not
>   survive the next release. This way you can ensure that stable
>   components remain backward compatible while giving you the option to
>   remove the unmaintained/unstable ones.
>
> >
> >>>> As another dimension to this discussion, I think there is a limit
> >>>> to the number of dependencies Flume can reasonably pull in and keep
> >>>> straight without shading or classloading tricks, which themselves
> >>>> add another layer of pain/difficulty to debugging.
> >>>>
> >>>
> >> This does not completely solve that problem but is somewhat related:
> >> what about moving all the current sources and sinks as plugins?
> >> So the core remains lean with all its dependencies in lib/ and all
> >> the sources and sinks specific libs end up in
> >> plugins.d/<plugin>/libext.
> >>
> >> This would be more in the context of Apache Bigtop and packages, but
> >> that would enable people to pick and choose their dependencies. For
> >> instance doing a
> >> "yum install flume-ng-hdfs flume-ng-redis flume-ng-agent".
> >> Right now I don't really care about the hdfs sink, but I end up
> >> having to download a bunch of hdfs related packages that are not
> >> really needed.
> >
> >
> > Well... that actually doesn't solve the dependency problem at all. It
> > pushes the requirement of knowledge of what works with what to the
> > end-user. And this type of thing (JAR incompatibility) is nearly
> > impossible to detect automatically, so we are back to end-users
> > sifting through poms, Java API docs, and release notes - which is what
> > they would have to do with a GitHub project anyway. But now it's for
> > *everything* related to Flume. So we just made the Flume plugin
> > compatibility situation much worse than it already was.
> >
>
> Yeah this is just on a related note from a user experience perspective.
> This would keep Apache Flume installation lean and to the point. It's
> more about tailoring the installation to the need than about solving
> dependency issues.
>
> > Right now, every plugin that ships with Flume can be run in the same
> > JVM process as every other plugin, with the exception (much to my
> > regret) of Solr and ElasticSearch. I am loath to add anything else to
> > that "landmine list". In my view, we need to come up with a technical
> > solution to that problem before we decide to open the floodgates to
> > any and all plugins / dependencies, regardless of the plugin
> > acceptance / maintainability discussion (the two are orthogonal
> > concerns). Which is why I brought up the possibility of classloading,
> > or OSGi, or something that attempts to solve this problem. It's not
> > rocket science (all servlet containers do this), but it's added
> > implementation / debugging complexity for sure and someone has to do
> > the work to implement it (if we agree that is the right solution to
> > the problem here).
> >
>
> All GNU/Linux distributions and even Apache Bigtop face that very same
> issue.
> And the right answer would be to fix the issue upstream or to use some
> of the tricks you cite above. And it's also pretty abstract without
> concrete cases. So can you point to tickets or describe in more detail
> the issue between Apache Solr and Elasticsearch? Maybe we can use that
> to derive a solution.
>
> Dependency conflicts should still be pretty rare though. So I would
> not throw out the baby with the bath water. Conflicting cases should
> remain the minority and I don't really see the reason to pass on 95% of
> contributions and all their attached benefits because there may be some
> integration issues with the remaining 5%.
>
> > TL;DR: I don't think the conflicting-dependencies issue has a "project
> > policy" or packaging solution.
> >
> > Mike
> >
>

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
