Some thoughts to re-ignite this thread: The raccumulo project has some of its code written in R, but does not borrow any code from the R codebase and as such is not a derivative work.
Unless anybody can think of a way in which R's own licensing could become a concern, potential license conflicts might be a dead issue?

Some background: R is a domain-specific language for statistics, used mostly in statistics research.

http://www.r-bloggers.com/r-usage-skyrocketing-rexer-poll/

"*R <http://www.revolutionanalytics.com/what-is-open-source-r/> is the most popular data mining tool*, used at least occasionally by 70% of those polled. This popularity holds amongst all of the subgroups in the survey as well: R remains the most-used tool amongst corporate data miners (70%), consulting data miners (73%), academic data miners (75%) and nonprofit/NGO/government data miners (67%). And while the average data miner reports using five software tools, R is also the most popular primary tool in the survey, at 24% overall."

The raccumulo code base was written for a defense customer, but has since had investment from several DARPA programs and DHS because of the importance of both Accumulo and R. They practically go together like peanut butter and jelly (I just made that up). Projects analogous to raccumulo exist for HBase (rhbase).

The primary developer, Phil Grim, has signed an ICLA that I'm going to send off tomorrow pending approval from our company's contracts department. The same goes for the company-level CCLA, which is complete and pending final review. Phil, Aaron, and I are listed as representatives on it.

As for the observations about lack of committership: Phil has been willing to share his code for a while and wants to keep contributing.
https://issues.apache.org/jira/browse/SQOOP-767
https://issues.apache.org/jira/browse/ACCUMULO-141

Discussion about this topic here:
http://www.mail-archive.com/[email protected]/msg10665.html

The other developer, Aaron, is listed as a previous contributor to Accumulo:
http://accumulo.apache.org/people.html

More about what's going on at the company:
https://twitter.com/DataTactics

More about DARPA XData (one of the programs of interest):
http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx

The customer project includes a charter to contribute to open source: "XDATA plans to release open-source software toolkits to enable collaboration among the applied mathematics, computer science and data visualization communities."

As a company we'd be happy to just keep hosting the code on our GitHub page, but I think we'd rather see it included closer to the Accumulo project, as mentioned previously. Given the momentum of R and the interest of DARPA and others, I think the benefits outweigh the risks. There's an extremely small chance of an orphaned project, and even then, as a 200+ person company, there's somebody you can blame if it does become a problem. We have a Twitter account and GitHub page people can go to with help requests or fixes.

We are interested in hearing more about how best to continue. I'll send a note when the CCLA and ICLAs are fully executed.

On Tue, Oct 29, 2013 at 5:52 AM, Steve Loughran <[email protected]> wrote:

> On 29 October 2013 00:02, Christopher <[email protected]> wrote:
>
> > +1 for its own repo... but due to licensing concerns of the R
> > dependency, and lack of committership of the original developers, I'm
> > not sure it makes much sense for Accumulo to adopt it as a sub-project
> > by importing it, which would mean taking on the responsibility of
> > maintaining it.
> >
>
> That's the eternal problem with contributed code. Close to the project: you
> can keep an eye on it, but then people expect it to work and blame you if
> it can't.
> But at the same time, those contributions build up your project's
> functionality.
>
> One rule that I've found works is: never accept code that you can't test
> yourself.
>
> If it needs some non-standard filesystem, lots of pre-installed binaries or
> human intervention, it's not something that you can hook up to a CI build,
> or test in a release process - so it will be broken almost from the outset.
>
> If you can test it yourself, even if you have to pay a few cents of S3
> or openstack cluster time, then it is something you could consider
> releasing as "tested". Otherwise, it'll just become a maintenance and
> support nightmare in years to come.
>
> In Hadoop core some of the contribs/ - the schedulers - were pulled in, but
> other contrib stuff is now out - the general policy being "no orphaned works
> in the core codebase".
