Seems similar. See the proposal; a few sections call out the differences (search for "hbase").
On Fri, Sep 2, 2011 at 9:45 AM, Mahadev Konar <[email protected]> wrote:
> Nice!
> Is this related to HBase? Or similar to it?
>
> mahadev
>
> On Fri, Sep 2, 2011 at 9:27 AM, Patrick Hunt <[email protected]> wrote:
>> FYI, another project using ZK -- woot!!! (note that they have their
>> own WAL - perhaps a good application for BookKeeper?)
>>
>> ---------- Forwarded message ----------
>> From: Billie J Rinaldi <[email protected]>
>> Date: Fri, Sep 2, 2011 at 8:45 AM
>> Subject: [PROPOSAL] Accumulo for the Apache Incubator
>> To: [email protected]
>>
>>
>> Greetings,
>>
>> I would like to propose Accumulo to be an Apache Incubator project.
>> Accumulo is a distributed key/value store that provides expressive
>> cell-level access labels and a server-side programming mechanism that
>> can modify key/value pairs at various points in the data management
>> process. It is based on Google's BigTable design and runs over Apache
>> Hadoop and Zookeeper.
>>
>> Here is a link to the proposal in the Incubator wiki:
>> http://wiki.apache.org/incubator/AccumuloProposal
>>
>> I've also pasted the initial contents below.
>>
>> Thanks,
>> Billie Rinaldi
>>
>>
>> = Accumulo Proposal =
>>
>> == Abstract ==
>> Accumulo is a distributed key/value store that provides expressive,
>> cell-level access labels.
>>
>> == Proposal ==
>> Accumulo is a sorted, distributed key/value store based on Google's
>> BigTable design. It is built on top of Apache Hadoop, Zookeeper, and
>> Thrift. It features a few novel improvements on the BigTable design
>> in the form of cell-level access labels and a server-side programming
>> mechanism that can modify key/value pairs at various points in the
>> data management process.
>>
>> == Background ==
>> Google published the design of BigTable in 2006. Several other open
>> source projects have implemented aspects of this design including
>> HBase, CloudStore, and Cassandra. Accumulo began its development in
>> 2008.
>>
>> == Rationale ==
>> There is a need for a flexible, high performance distributed key/value
>> store that provides expressive, fine-grained access labels. The
>> communities we expect to be most interested in such a project are
>> government, health care, and other industries where privacy is a
>> concern. We have made much progress in developing this project over
>> the past 3 years and believe both the project and the interested
>> communities would benefit from this work being openly available and
>> having open development.
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>> We intend to strongly encourage the community to help with and
>> contribute to the code. We will actively seek potential committers
>> and help them become familiar with the codebase.
>>
>> === Community ===
>> A strong government community has developed around Accumulo and
>> training classes have been ongoing for about a year. Hundreds of
>> developers use Accumulo.
>>
>> === Core Developers ===
>> The developers are mainly employed by the National Security Agency,
>> but we anticipate interest developing among other companies.
>>
>> === Alignment ===
>> Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds
>> with Maven. Due to the strong relationship with these Apache
>> projects, the incubator is a good match for Accumulo.
>>
>> == Known Risks ==
>> === Orphaned Products ===
>> There is only a small risk of being orphaned. The community is
>> committed to improving the codebase of the project due to its
>> fulfilling needs not addressed by any other software.
>>
>> === Inexperience with Open Source ===
>> The codebase has been treated internally as an open source project
>> since its beginning, and the initial Apache committers have been
>> involved with the code for multiple years. While our experience with
>> public open source is limited, we do not anticipate difficulty in
>> operating under Apache's development process.
>>
>> === Homogeneous Developers ===
>> The committers have multiple employers and it is expected that
>> committers from different companies will be recruited.
>>
>> === Reliance on Salaried Developers ===
>> The initial committers are all paid by their employers to work on
>> Accumulo and we expect such employment to continue. Some of the
>> initial committers would continue as volunteers even if no longer
>> employed to do so.
>>
>> === Relationships with Other Apache Products ===
>> Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang,
>> -net, -io, -jci, -collections, -configuration, -logging, and -codec.
>>
>> === Relationship to HBase ===
>> Accumulo and HBase are both based on the design of Google's BigTable,
>> so there is a danger that potential users will have difficulty
>> distinguishing the two or that they will not see an incentive in
>> adopting Accumulo. There are a few key areas in which Accumulo
>> differs from HBase. Some of the desired features of Accumulo could be
>> incorporated into HBase, however the most important of these may be
>> unlikely to be adopted (see cell-level access labels and iterators
>> below). It is a possibility that the codebases will ultimately
>> converge, but the number of differences at the current time warrants a
>> separate project for Accumulo.
>>
>> ==== Access Labels ====
>> Accumulo has an additional portion of its key that sorts after the
>> column qualifier and before the timestamp. It is called column
>> visibility and enables expressive cell-level access control.
>> Authorizations are passed with each query to control what data is
>> returned to the user. The column visibilities are boolean AND and OR
>> combinations of arbitrary strings (such as "(A&B)|C") and
>> authorizations are sets of strings (such as {C,D}).
>>
>> ==== Iterators ====
>> Accumulo has a novel server-side programming mechanism that can modify
>> the data written to disk or returned to the user.
>> This mechanism can be configured for any of the scopes where data is
>> read from or written to disk. It can be used to perform joins on data
>> within a single tablet.
>>
>> ==== Flexibility ====
>> HBase requires the user to specify the set of column families to be
>> used up front. Accumulo places no restrictions on the column
>> families. Also, each column family in HBase is stored separately on
>> disk. Accumulo allows column families to be grouped together on disk,
>> as does BigTable. This enables users to configure how their data is
>> stored, potentially providing improvements in compression and lookup
>> speeds. It gives Accumulo a row/column hybrid nature, while HBase is
>> currently column-oriented.
>>
>> ==== Testing ====
>> Accumulo has testing frameworks that have resulted in its achieving a
>> high level of correctness and performance. We have observed that
>> under some configurations and conditions Accumulo will outperform
>> HBase and provide greater data integrity.
>>
>> ==== Logging ====
>> HBase uses a write-ahead log on the Hadoop Distributed File System.
>> Accumulo has its own logging service that does not depend on
>> communication with the HDFS NameNode.
>>
>> ==== Storage ====
>> Accumulo has a relative key file format that improves compression.
>>
>> ==== Areas in which HBase features improvements over Accumulo ====
>> in memory tables, upserts, coprocessors, connections to other projects
>> such as Cascading and Pig
>>
>> === Expectations ===
>> There is a risk that Accumulo will be criticized for not providing
>> adequate security. The access labels in Accumulo do not in themselves
>> provide a complete security solution, but are a mechanism for labeling
>> each piece of data with the authorizations that are necessary to see
>> it.
>>
>> === Apache Brand ===
>> Our interest in releasing this code as an Apache incubator project is
>> due to its strong relationship with other Apache projects, i.e.
>> Hadoop, Zookeeper, and HBase.
>>
>> == Documentation ==
>> There is not currently documentation about Accumulo on the web, but a
>> fair amount of documentation and training materials exists and will be
>> provided on the Accumulo wiki at apache.org. Also, a paper discussing
>> YCSB results for Accumulo will be presented at the 2011 Symposium on
>> Cloud Computing.
>>
>> == Initial Source ==
>> Accumulo has been in development since spring 2008. There are
>> hundreds of developers using it and tens of developers have
>> contributed to it. The core codebase consists of 200,000 lines of
>> code (mainly Java) and 100s of pages of documentation. There are also
>> a few projects built on top of Accumulo that may be added to its
>> contrib in the future. These include support for Hive, Matlab, YCSB,
>> and graph processing.
>>
>> == Source and Intellectual Property Submission Plan ==
>> Accumulo core code, examples, documentation, and training materials will
>> be submitted by the National Security Agency.
>>
>> We will also be soliciting contributions of further plugins from MIT
>> Lincoln Labs, Carnegie Mellon University, and others.
>>
>> Accumulo has been developed by a mix of government employees and
>> private companies under government contract. Material developed by
>> government employees is in the public domain and no U.S. copyright
>> exists in works of the federal government. For the contractor
>> developed material in the initial submission, the U.S. Government has
>> sufficient authority per the ICLA from the copyright owner to
>> contribute the Accumulo code to the incubator.
>>
>> There has been some discussion regarding accepting contributions from
>> US Government sources on
>> [https://issues.apache.org/jira/browse/LEGAL-93 LEGAL-93]. We propose
>> that the NSA will sign an ICLA/CCLA if that document could be slightly
>> modified to explicitly address copyright in works of government
>> employees.
>> Specifically, we propose that the definition of “You” be
>> modified to include “the copyright owner, the owner of a Contribution
>> not subject to copyright, or legal entity authorized by the copyright
>> owner that is making this Agreement.” In addition, section 2, the
>> copyright license grant be modified after “You hereby grant” that
>> either states “to the extent authorized by law” or “to the extent
>> copyright exists in the Contribution.” These changes will permit US
>> Government employee developed work to be included.
>>
>> One proposed solution is to form a Collaborative Research and
>> Development Agreement (CRADA) between the Apache Software Foundation
>> and the US Government, but this will not solve the underlying problem
>> that U.S. law does not grant copyright to works of government
>> employees. At this time a CRADA is not necessary but should it be
>> determined that a CRADA is necessary, we would like to work through
>> that process during the incubation phase of Accumulo rather than
>> before acceptance as this may take time to enter into an agreement.
>>
>> == External Dependencies ==
>> jetty (Apache and EPL), jline (BSD), jfreechart (LGPL), jcommon
>> (LGPL), slf4j (MIT), junit (CPL)
>>
>> == Cryptography ==
>> none
>>
>> == Required Resources ==
>>  * Mailing Lists
>>   * accumulo-private
>>   * accumulo-dev
>>   * accumulo-commits
>>   * accumulo-user
>>
>>  * Subversion Directory
>>   * https://svn.apache.org/repos/asf/incubator/accumulo
>>
>>  * Issue Tracking
>>   * JIRA Accumulo (ACCUMULO)
>>
>>  * Continuous Integration
>>   * Jenkins builds on https://builds.apache.org/
>>
>>  * Web
>>   * http://incubator.apache.org/accumulo/
>>   * wiki at http://wiki.apache.org or http://cwiki.apache.org
>>
>> == Initial Committers ==
>>  * Aaron Cordova (aaron at cordovas dot org)
>>  * Adam Fuchs (adam.p.fuchs at ugov dot gov)
>>  * Eric Newton (ecn at swcomplete dot com)
>>  * Billie Rinaldi (billie.j.rinaldi at ugov dot gov)
>>  * Keith Turner (keith.turner at ptech-llc dot com)
>>  * John Vines (john.w.vines at ugov dot gov)
>>  * Chris Waring (christopher.a.waring at ugov dot gov)
>>
>> == Affiliations ==
>>  * Aaron Cordova, The Interllective
>>  * Adam Fuchs, National Security Agency
>>  * Eric Newton, SW Complete Incorporated
>>  * Billie Rinaldi, National Security Agency
>>  * Keith Turner, Peterson Technology LLC
>>  * John Vines, National Security Agency
>>  * Chris Waring, National Security Agency
>>
>> == Sponsors ==
>>  * Champion: Doug Cutting
>>  * Nominated Mentors: Benson Margulies, ?, ?
>>  * Sponsoring Entity: Apache Incubator
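
For anyone skimming the Access Labels section of the quoted proposal: it describes column visibilities as boolean AND/OR combinations of strings, such as "(A&B)|C", checked against a set of authorizations, such as {C,D}. The sketch below is only an illustration of that idea in Python, not Accumulo's code or API. The function name `visible`, the token rules, the choice to let `&` bind tighter than `|` when parentheses are missing, and the treatment of an empty label are all assumptions made for this example.

```python
import re


def visible(expression, authorizations):
    """Return True if the authorization set satisfies the visibility expression.

    Hypothetical helper for illustration only. Assumptions: terms are
    alphanumeric/underscore strings, '&' binds tighter than '|', and an
    empty expression is visible to everyone.
    """
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    if not tokens:
        return True  # assumption: an unlabeled cell is visible to all
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "|":
            take()
            right = parse_and()  # parse first, then combine
            result = result or right
        return result

    def parse_and():
        result = parse_term()
        while peek() == "&":
            take()
            right = parse_term()
            result = result and right
        return result

    def parse_term():
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        if tok in ("&", "|", ")"):
            raise ValueError("unexpected token: " + tok)
        # A plain term is satisfied when the user holds that authorization.
        return tok in authorizations

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


# The examples from the proposal: label "(A&B)|C", authorizations {C, D}.
print(visible("(A&B)|C", {"C", "D"}))  # True: C alone satisfies the label
print(visible("(A&B)|C", {"A", "D"}))  # False: A without B, and no C
```

With the proposal's own example, a user holding {C,D} would see a cell labeled "(A&B)|C", while a user holding only {A,D} would not.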
