OS distros tend to build all the common dependencies that are prone to conflict themselves, and in our case I don't think even Bigtop is planning on building Guava, Protobuf, or other commonly conflicting libraries. So I am still of the opinion that classloader isolation is needed if we plan to keep expanding scope and including more off-the-shelf integrations in the core Flume project.
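To make the classloader-isolation idea a bit more concrete, here is a minimal sketch of a child-first classloader that could be created once per plugin. None of this is existing Flume code: the class name `PluginClassLoader`, the `sharedPrefixes` parameter, and the choice of which packages to delegate to the parent are all assumptions for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch, not part of Flume: a child-first ("parent-last")
// classloader. Each plugin would get its own instance, so its copy of
// Guava or Protobuf never collides with another plugin's copy.
final class PluginClassLoader extends URLClassLoader {
    // Package prefixes that must always come from the parent so that
    // shared API types are the same Class objects on both sides.
    private final String[] sharedPrefixes;

    PluginClassLoader(URL[] pluginJars, ClassLoader parent, String... sharedPrefixes) {
        super(pluginJars, parent);
        this.sharedPrefixes = sharedPrefixes.clone();
    }

    // True if the class should be loaded parent-first.
    boolean isShared(String className) {
        if (className.startsWith("java.")) {
            return true; // core JDK classes can never be redefined by a plugin
        }
        for (String prefix : sharedPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isShared(name)) {
                    // Default parent-first delegation for shared API classes.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Look in the plugin's own jars first.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not bundled with the plugin: fall back to the parent.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

In such a setup, `org.apache.flume.` would presumably be passed as a shared prefix so that the source and sink interfaces resolve from the core, while everything else resolves from the plugin's own jars first. The cost is the extra debugging complexity discussed further down the thread.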
Another option is to use maven-shade on individual components. That might work in many cases, but some projects (notably Flume) hard-code class names and don't work with shading. Granted, that is usually fixable, provided someone puts in the cycles to work around the issues that come up when shading the upstream components.

Mike

On Thu, Jan 16, 2014 at 12:44 AM, Ashish <[email protected]> wrote:

> Folks,
>
> Can we close this thread, one way or the other? It's been a while.
>
> thanks
> ashish
>
>
> On Mon, Dec 23, 2013 at 10:34 PM, Gabriel Commeau <[email protected]> wrote:
>
> > Hi,
> >
> > IMHO, contrib modules seem better for the following reasons:
> > 1. Keep the core as thin as possible. I like the idea of a pluggable
> > Flume where the user adds the components needed, and only these. I
> > imagine that realistically, most users only use a handful of
> > components, and therefore don't need the whole library of every
> > existing (or supported) sink and source on their localhost. We could
> > make the process of adding/removing components easier, so that it
> > becomes trivial for the user to download/install/activate them.
> > 2. License considerations. I can envision cases where one would want
> > to integrate Flume with another system that uses a license that's not
> > compatible with Apache's. So whether a contributor needs or wants to
> > use a different license, this contribution cannot currently be added
> > to Flume. I'm not an expert on licenses, but I wonder if it would be
> > possible to include these contributions using a contrib module.
> > 3. Easiest way in. "Getting in" becomes trivial and open to all. Seems
> > to me like the best way to grow the project.
> > 4. Community-based out. With a contrib project, we actually don't
> > really need to move contributions out. The community, if able to vote
> > or report usage, naturally manages which contributions are used and
> > which aren't.
> > 5. Competition and maintenance.
> > As software engineers, there are always tradeoffs we need to make.
> > Imagine a component that could have its performance increased at the
> > cost of, for example, compatibility with some other systems. Why would
> > this optimization have to conflict with Apache's main component?
> > Couldn't both live side-by-side, and let the user choose the one that
> > better fits his/her specific context and requirements?
> >
> > So to answer the original discussion questions: I'd argue that contrib
> > modules would benefit Flume, that they should be released on their own
> > schedule, supported independently, and be compatible with whatever
> > version of Flume the authors wish.
> >
> > I like cathedrals, and I tend to design my applications like that. But
> > in this case, I believe a little bit of bazaar would be best.
> >
> > I hope this helps.
> >
> > Gabriel
> >
> > From: Bruno Mahé <[email protected]>
> > Reply-To: <[email protected]>
> > Date: Saturday, December 21, 2013 4:29 PM
> > To: <[email protected]>
> > Subject: Re: [DISCUSS] Feature bloat and contrib module
> >
> > See inline.
> >
> > On 12/20/2013 04:01 PM, Mike Percy wrote:
> > > On Mon, Dec 16, 2013 at 11:34 PM, Bruno Mahé <[email protected]> wrote:
> > >>
> > >> Summarizing my suggestions:
> > >> * Committers are not the sole developers. There is no reason for
> > >>   committers to take all these responsibilities on their shoulders.
> > >>   Also developer != committer.
> > >> * Easy IN, Easy OUT. If no one volunteers to maintain something,
> > >>   then there is no reason to keep it since the community is not
> > >>   interested in it anyway.
> > >> * Easy to get in means more contributions and more contributors.
> > >>   Also a way to grow community and have contributors becoming full
> > >>   committers. It is more than likely they will notice things that
> > >>   can be improved elsewhere and start being more active overall.
> > >> * Easy to get out means only the maintained stuff stays.
> > >>   Stuff would most likely get kicked out before a feature release
> > >>   (e.g. 1.5 vs 1.6). Bug fix releases have no reason to kick out
> > >>   components since they are unlikely to break in between bug fix
> > >>   releases (e.g. 1.5.2 vs 1.5.3).
> > >> * Spreading sources and sinks is going to be quite hard on users.
> > >>   This would mean users would have to be developers themselves
> > >>   since they would have to:
> > >>     - Find the source/sink on some random repository which may or
> > >>       may not be maintained. Pick one of the repositories out of
> > >>       all the ones the user has found
> > >>     - Build it against their own version of Apache Flume (Apache,
> > >>       CDH, PHD, HDP...)
> > >>     - Resolve dependencies and build issues between their version
> > >>       of Apache Flume and source/sink since the source/sink may or
> > >>       may not have been maintained
> > >>     - Qualify the integration between their version of Apache
> > >>       Flume and source/sink
> > >> * Spreading sources and sinks is going to be quite hard on
> > >>   developers. Why should I target Apache Flume when I can just
> > >>   target my version of Flume (CDH, PHD, HDP)?
> > >> * Spreading sources and sinks is going to be quite hard on
> > >>   integrators such as Apache Bigtop. This would mean working with
> > >>   as many people as there are sources/sinks, each with their own
> > >>   way of working and schedules.
> > >>
> > >
> > > Hey Bruno, great to hear from you on this list!
> > >
> >
> > Thanks!
> >
> > > Good points, and in principle, I mostly agree with what you are
> > > saying, but I have concerns about some of the proposed approaches.
> > > Specifically:
> > >
> > >> So why not just remove features or parts that are not maintained?
> > >> Being more aggressive in removing unmaintained parts would enable
> > >> Apache Flume to be more inclusive with regards to contributions.
> > >>
> > >
> > > Removing stuff breaks back-compat and it is hard to know who is using
> > > a component. If just one person is using something, is it worth it to
> > > keep something? Where do we draw the line? That said, I am not
> > > against removing stuff that made it into a release (after marking it
> > > @Deprecated for a release) if we have consensus among committers that
> > > it needs to go.
> > >
> >
> > First of all, it is very hard to quantify users. And also they tend to
> > be silent if everything goes well.
> > Also I don't see the issue if just one person is using something. If
> > such a component is maintained and does not create a burden, why
> > remove it? Keeping it would be easy and make everyone happy.
> >
> > The way I would see components being removed would rather be based on
> > their maintenance cost. For instance, is it blocking a release? Or
> > have all its tests failed for the past few weeks and no one cares?
> >
> > Regarding compatibility between versions, I would say:
> > * You don't have to remove the component as soon as you create the
> >   ticket to remove it. This way you give enough opportunity for some
> >   people to step up and take over before kicking it out.
> >
> > * Manage expectations by adding labels to the components. When a
> >   component is introduced, label it as "experimental", then a few
> >   releases later "beta" and then a few releases later "stable". Note
> >   that some other criteria can be added to the labeling but this gives
> >   the opportunity to announce that experimental components may not
> >   survive the next release. This way you can ensure that stable
> >   components remain backward compatible while giving you the option to
> >   remove the unmaintained/unstable ones.
> >
> >
> > >
> > >>>> As another dimension to this discussion, I think there is a limit
> > >>>> to the number of dependencies Flume can reasonably pull in and
> > >>>> keep straight without shading or classloading tricks, which
> > >>>> themselves add another layer of pain/difficulty to debugging.
> > >>>>
> > >>>
> > >> This does not completely solve that problem but is somewhat related:
> > >> what about moving all the current sources and sinks as plugins?
> > >> So the core remains lean with all its dependencies in lib/ and all
> > >> the sources and sinks specific libs end up in
> > >> plugins.d/<plugin>/libext.
> > >>
> > >> This would be more in the context of Apache Bigtop and packages, but
> > >> that would enable people to pick and choose their dependencies. For
> > >> instance doing a "yum install flume-ng-hdfs flume-ng-redis
> > >> flume-ng-agent".
> > >> Right now I don't really care about the hdfs sink, but I end up
> > >> having to download a bunch of hdfs related packages that are not
> > >> really needed.
> > >
> > >
> > > Well... that actually doesn't solve the dependency problem at all. It
> > > pushes the requirement of knowledge of what works with what to the
> > > end-user. And this type of thing (JAR incompatibility) is nearly
> > > impossible to detect automatically, so we are back to end-users
> > > sifting through poms, Java API docs, and release notes - which is
> > > what they would have to do with a GitHub project anyway. But now it's
> > > for *everything* related to Flume. So we just made the Flume plugin
> > > compatibility situation much worse than it already was.
> > >
> > >
> >
> > Yeah this is just on a related note from a user experience.
> > This would keep Apache Flume installation lean and to the point. It's
> > more about tailoring the installation to the need than to solve
> > dependencies issues.
> >
> >
> > > Right now, every plugin that ships with Flume can be run in the same
> > > JVM process as every other plugin, with the exception (much to my
> > > regret) of Solr and ElasticSearch. I am loath to add anything else to
> > > that "landmine list". In my view, we need to come up with a technical
> > > solution to that problem before we decide to open the floodgates to
> > > any and all plugins / dependencies, regardless of the plugin
> > > acceptance / maintainability discussion (the two are orthogonal
> > > concerns). Which is why I brought up the possibility of classloading,
> > > or OSGi, or something that attempts to solve this problem. It's not
> > > rocket science (all servlet containers do this), but it's added
> > > implementation / debugging complexity for sure and someone has to do
> > > the work to implement it (if we agree that is the right solution to
> > > the problem here).
> > >
> >
> > All GNU/Linux distributions and even Apache Bigtop face that very same
> > issue. And the right answer would be to fix the issue upstream or to
> > use some of the tricks you cite above. And it's also pretty abstract
> > without concrete cases. So can you point to tickets or describe more
> > the issue between Apache Solr and Elasticsearch? Maybe we can use that
> > to derive a solution.
> >
> > Dependency conflicts should still be pretty rare though. So I would
> > not throw out the baby with the bath water. Conflicting cases should
> > remain the minority and I don't really see the reason to pass on 95%
> > of contributions and all their attached benefits because there may be
> > some integration issues with the remaining 5%.
> >
> > >
> > > TL;DR: I don't think the conflicting-dependencies issue has a
> > > "project policy" or packaging solution.
> > >
> > > Mike
> > >
> >
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
