To build on Bobby's statement, it does pain me as a user to have to search outside of the project modules to find a compatible build that works with the latest version of storm as well as the latest module version. However, in instances such as hbase, cassandra, kafka, etc., I think these commonly used contrib projects should be pulled into storm if they meet stringent criteria of:
1) Several volunteer developers familiar with the code to update it as new versions arise
2) Fully implemented bolt/spout

"If the build and test time starts to take too long, to me that means we need to start wondering if we have too many contrib modules." -- +1

I would be willing to volunteer with the cassandra backing map module (especially with the latest CQL3 release).

On Wed, Feb 26, 2014 at 12:35 PM, Bobby Evans <[email protected]> wrote:

> I can see a lot of value in having a distribution of storm that comes with
> batteries included, everything is tested together and you know it works.
> But I don't see much long term developer benefit in building them all
> together. If there is strong coupling between storm and these external
> projects so that they break when storm changes then we need to understand
> the coupling and decide if we want to reduce that coupling by stabilizing
> APIs, improving version numbering and release process, etc.; or if the
> functionality is something that should be offered as a base service in
> storm.
>
> I can see politically the value of giving these other projects a home in
> Apache, and making them sub-projects is the simplest route to that. I'd
> love to have storm on yarn inside Apache. I just don't want to go
> overboard with it. There was a time when HBase was a "contrib" module
> under Hadoop along with a lot of other things, and the Apache board came
> and told Hadoop to break it up.
>
> Bringing storm-kafka into storm does not sound like it will solve much
> from a developer's perspective, because there is at least as much coupling
> with kafka as there is with storm. I can see how it is a huge amount of
> overhead and pain to set up a new project just for a few hundred lines of
> code, as such I am in favor of pulling in closely related projects,
> especially those that are spouts and state implementations.
> I just want to be sure that we do it carefully, with a good reason, and
> with enough people who are familiar with the code to support it long term.
>
> If it starts to look like we are pulling in too many projects perhaps we
> should look at something more like the bigtop project
> https://bigtop.apache.org/ which produces a tested distribution of Hadoop
> with many different sub-projects included in it.
>
> I am also a bit concerned about these sub-projects becoming second class
> citizens, where we break something, but because the build is off by default
> we don't know it. I would prefer that they are built and tested by
> default. If the build and test time starts to take too long, to me that
> means we need to start wondering if we have too many contrib modules.
>
> --Bobby
>
> From: Brian Enochson <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, February 25, 2014 at 9:50 PM
> To: "[email protected]" <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: [DISCUSS] Pulling "Contrib" Modules into Apache
>
> hi,
> I am in agreement with Taylor and believe I understand his intent. An
> incredible tool/framework/application like Storm is only enhanced and gains
> value from the number of well maintained and vetted modules that can be
> used for integration and adding further functionality.
> I am relatively new to the Storm community but have spent quite some
> time reviewing contributing modules out there, reviewing various duplicates
> and running into some version incompatibilities. I understand the need to
> keep Storm itself pure, but do think there needs to be some structure and
> governance added to the contributing modules. Look at the benefit a tool
> like npm brings to the node community.
> I like the idea of sponsorship, vetting and a community vote. I, as I'm
> sure many would be, am willing to offer support and time to working through
> how to set this up and helping with the implementation if it is decided to
> pursue some solution.
> I hope these views are taken in the spirit they are made, to make this
> incredible system even better along with the surrounding eco-system.
>
> Thanks,
> Brian
>
>
> On Tue, Feb 25, 2014 at 9:36 PM, P. Taylor Goetz <[email protected]> wrote:
> Just to be clear (and play a little Devil's advocate :) ), I'm not
> suggesting that whatever a "contrib" project/module/subproject might
> become, be a clearinghouse for anything Storm-related.
>
> I see it as something that is well-vetted by the Storm community, subject
> to PPMC review, vote, etc. Entry would require community review, PPMC
> review, and in some cases ASF IP clearance/legal review. Anything added
> would require some level of commitment from the PPMC/committers to provide
> some level of support.
>
> In other words, nothing "willy-nilly".
>
> One option could be that any module added require (X > 0) number of
> committers to volunteer as "sponsor"s for the module, and commit to
> maintaining it.
>
> That being said, I don't see storm-kafka being any different from anything
> else that provides integration points for Storm.
>
> -Taylor
>
>
> On Feb 25, 2014, at 7:53 PM, Nathan Marz <[email protected]> wrote:
>
> I'm only +1 for pulling in storm-kafka and updating it. Other projects put
> these contrib modules in a "contrib" folder and keep them managed as
> completely separate codebases. As it's not actually a "module" necessary
> for Storm, there's an argument there for doing it that way rather than via
> the multi-module route.
>
>
> On Tue, Feb 25, 2014 at 4:39 PM, Milinda Pathirage <[email protected]> wrote:
> Hi Taylor,
>
> I'm +1 for pulling these external libraries into the Apache codebase. This
> will certainly benefit the Storm community. I would also like to contribute
> to this process.
>
> Thanks
> Milinda
>
> On Tue, Feb 25, 2014 at 5:28 PM, P. Taylor Goetz <[email protected]> wrote:
> > A while back I opened STORM-206 [1] to capture ideas for pulling in
> > "contrib" modules to the Apache codebase.
> >
> > In the past, we had the storm-contrib github project [2] which subsequently
> > got broken up into individual projects hosted on the stormprocessor github
> > group [3] and elsewhere.
> >
> > The problem with this approach is that in certain cases it led to code rot
> > (modules not being updated in step with Storm's API), fragmentation
> > (multiple similar modules with the same name), and confusion.
> >
> > A good example of this is the storm-kafka module [4], since it is a widely
> > used component. Because storm-contrib wasn't being tagged in github, a lot
> > of users had trouble reconciling with which versions of storm it was
> > compatible. Some users built off specific commit hashes, some forked, and a
> > few even pushed custom builds to repositories such as clojars. With kafka
> > 0.8 now available, there are two main storm-kafka projects, the original
> > (compatible with kafka 0.7) and an updated fork [5] (compatible with kafka
> > 0.8).
> >
> > My intention is not to find fault in any way, but rather to point out the
> > resulting pain, and work toward a better solution.
> >
> > I think it would be beneficial to the Storm user community to have certain
> > commonly used modules like storm-kafka brought into the Apache Storm
> > project. Another benefit worth considering is the licensing/legal oversight
> > that the ASF provides, which is important to many users.
> >
> > If this is something we want to do, then the big question becomes what sort
> > of governance process needs to be established to ensure that such things are
> > properly maintained.
> >
> > Some random thoughts, questions, etc. that jump to mind include:
> >
> > What to call these things: "contrib modules", "connectors", "integration
> > modules", etc.?
> > Build integration: I imagine they would be a multi-module submodule of the
> > main maven build. Probably turned off by default and enabled by a maven
> > profile.
> > Governance: Have one or more committer volunteers responsible for
> > maintenance, merging patches, etc.? Proposal process for pulling new
> > modules?
> >
> >
> > I look forward to hearing others' opinions.
> >
> > - Taylor
> >
> >
> > [1] https://issues.apache.org/jira/browse/STORM-206
> > [2] https://github.com/nathanmarz/storm-contrib
> > [3] https://github.com/stormprocessor
> > [4] https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka
> > [5] https://github.com/wurstmeister/storm-kafka-0.8-plus
>
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org/
>
>
> --
> Twitter: @nathanmarz
> http://nathanmarz.com
