It's not really easier; they've been working on getting this released for 2.5 years or more. What I think will make it easier is having more of a precedent. In the government, it's always easier to say no than yes. Showing that it can be done, and done successfully, will push them to develop a consistent process. Hopefully in the future it will take less than 2.5 years to go public :)
-Joey

On Fri, Sep 2, 2011 at 3:37 PM, Ted Yu wrote:

> Thanks for the update Joey.
> May someone close to NSA disclose what may have changed recently that allows
> contributing to Open Source easier?
>
> On Fri, Sep 2, 2011 at 12:30 PM, Joey Echeverria wrote:
>
>> To add to what Todd said, I actually worked with those guys for the
>> last 3 years and have used Accumulo in production. It's true that it
>> would have been better if they had been able to contribute to HBase
>> rather than go on their own, but it's not easy to contribute to open
>> source, either officially or unofficially when you work at NSA. I
>> think there is precedent for competing and/or "duplicate" Apache
>> projects, Avro/Thrift and HBase/Cassandra come to mind. I'm mostly
>> interested in this project setting a precedent for other work at NSA
>> to be developed as open source.
>>
>> -Joey
>>
>> On Fri, Sep 2, 2011 at 3:09 PM, Todd Lipcon wrote:
>> > Hey folks,
>> >
>> > <wearing my Todd hat and not my Cloudera hat!>
>> >
>> > I've been in touch with this team for the last 18 months or so.
>> > They're good people, smart, and have a healthy respect for HBase and
>> > our team. Though they haven't contributed code or participated on the
>> > lists, I can vouch that they do follow our development and generally
>> > do understand HBase as well as what makes their system different. In
>> > the context of the incubator proposal, they're trying to explain why
>> > their system is different than HBase, and not trying to knock our
>> > project. They do borrow our ideas, and in the future we'll be able to
>> > borrow some of theirs. Iterator trees, for example, are distinct from
>> > coprocessors and have some really nice capabilities which I'm looking
>> > forward to adapting into HBase.
>> >
>> > There are a couple things to keep in mind about the story here:
>> > - they first evaluated HBase 3 years ago. HBase at that point was not
>> > usable for their application - I think several of us here remember the
>> > state of HBase at the time and might have made the same decision. So,
>> > they started their own project with an internal team of 5-6 people.
>> > - contributing to open source from within the NSA is not easy, for
>> > obvious reasons. They've jumped through many hoops to open source
>> > this, and we should be thankful for that. Now that they're out in open
>> > source land, I think we'll see them collaborating with us much more
>> > openly.
>> >
>> > I for one look forward to working with these folks, and maybe merging
>> > the projects some time down the road as the feature lists converge.
>> >
>> > -Todd
>> >
>> > On Fri, Sep 2, 2011 at 11:40 AM, Gary Helmling wrote:
>> >> Some comments on the proposal and differentiation vs HBase:
>> >>
>> >> Access Labels:
>> >>
>> >> The proposal claims that this is "unlikely to be adopted [in HBase]".
>> >> This is completely untrue. This has been discussed many times in the
>> >> past in relation to our security implementation. It's just been
>> >> deferred at the moment due to a need to focus on the initial
>> >> implementation. But it's certainly viewed as a potentially important
>> >> feature for a future iteration. Contributions always welcome!
>> >>
>> >> see HBASE-3435: Provide per-column-qualifier and per-key-value
>> >> security for HBASE-3025
>> >>
>> >>
>> >> Iterators:
>> >>
>> >> What do these provide that RegionObservers don't? I'm speculating
>> >> since the proposal provides little in the way of details, but if
>> >> these are "unlikely to be adopted" it's only because coprocessors
>> >> already offer more extensive functionality.
>> >>
>> >>
>> >> "Flexibility" aka online schema changes and locality groups
>> >>
>> >> Locality groups seem to be the only meaningful differentiation in
>> >> this entire comparison.
>> >>
>> >>
>> >> Testing
>> >>
>> >> Performance under "some configurations and conditions" and
>> >> unsubstantiated "greater data integrity" is not meaningful
>> >> differentiation.
>> >>
>> >>
>> >> Apache Brand
>> >>
>> >> Claims a relationship with HBase. Is there overlapping code or is
>> >> this just the duplication of functionality? There's no community
>> >> relationship that I'm aware of. I haven't seen any of the proposed
>> >> committers on the HBase user and dev lists to this point, so that
>> >> doesn't set much of a precedent for community interaction.
>> >>
>> >>
>> >> Overall I see no meaningful differentiation vs HBase as an existing
>> >> project, no past attempts to interact with the most relevant Apache
>> >> community, and only an, until now, private "community" of government
>> >> users. I think it's great that they want to open source this. I
>> >> don't want to discourage that -- go for it! But I don't see what the
>> >> benefit is of ASF incubating this. I only see the potential for
>> >> community fragmentation and market confusion over such closely
>> >> similar projects.
>> >>
>> >>
>> >> Gary
>> >>
>> >>
>> >> On Fri, Sep 2, 2011 at 11:06 AM, Stack wrote:
>> >>
>> >>> See here for the incubator proposal:
>> >>> http://wiki.apache.org/incubator/AccumuloProposal
>> >>>
>> >>> Reactions probably better belong over on the incubator mailing list
>> >>> but I thought a discussion here first might be useful developing a
>> >>> stance.
>> >>>
>> >>> Initial reaction, not having seen the code, is that it seems to be
>> >>> close to HBase; so close, they call HBase out explicitly in their
>> >>> proposal.
>> >>>
>> >>> The cell based 'access labels' seem like a matter of adding
>> >>> an extra field to KV and their Iterators seem like a specialization on
>> >>> Coprocessors. The ability to add column families on the fly seems too
>> >>> minor a difference to call out especially if online schema edits are
>> >>> now (soon) supported. They talk of locality group like functionality
>> >>> too -- that could be a significant difference. We would have to see
>> >>> the code but at first blush, differences look small.
>> >>>
>> >>> Yet another BT implementation further divides this contended space.
>> >>> If there were to be an effort integrating HBase into Accumulo or vice
>> >>> versa, it's likely to distract significantly from project forward
>> >>> motion (If the Accumulo fellows were interested in integrating the
>> >>> two projects, I'd have thought they'd have tried to talk to us
>> >>> before this so that's probably not their intent).
>> >>>
>> >>> On the other hand, if their once-secret project is out in the open,
>> >>> we can steal the Apache-licensed good bits and....
>> >>>
>> >>> What do folks think?
>> >>>
>> >>> St.Ack
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>> >
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
