Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/hadoop/Hbase

The comment on the change is:
Restructuring Wiki

------------------------------------------------------------------------------
  #pragma section-numbers off

  attachment:hbase_logo_med.gif
- = Bigtable-like structured storage for Hadoop HDFS =
+ = HBase: Bigtable-like structured storage for Hadoop HDFS =
-
- [[Anchor(links)]]
-  * HBase source control: https://svn.apache.org/repos/asf/hadoop/hbase/trunk
-  * [#news News]
-  * [#background Background]
-  * [wiki:Hbase/HbaseArchitecture Hbase Architecture]
-  * [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/javadoc/org/apache/hadoop/hbase/package-summary.html#package_description Getting Started] description hosted inside the HBase javadoc package description or see how to checkout, build and run hbase in about [wiki:Hbase/10Minutes 10 Minutes].
-  * [wiki:Hbase/FAQ FAQ]
-  * [wiki:Hbase/UsingBloomFilters Using Bloom Filters]
-  * [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/javadoc/org/apache/hadoop/hbase/package-summary.html HBase API Docs] built as part of Hadoop nightlies
-  * Hbase and Performance
-    * [wiki:Hbase/PerformanceEvaluation Tools for evaluating HBase performance and scalability]
-    * There are setup instructions and a JMeter Test Plan in [https://issues.apache.org/jira/browse/HADOOP-2625 HADOOP-2625]
-    * [:Hbase/HbaseRTDS]: Discuss the evaluation of Hbase
-  * HBase discussion happens up on the HBase mailing lists:
-    * User information: [[MailTo(hbase-user AT SPAMFREE hadoop DOT apache DOT org)]]
-    * Developer information: [[MailTo(hbase-dev AT SPAMFREE hadoop DOT apache DOT org)]]
-    * See also the hadoop mailing lists [http://hadoop.apache.org/core/mailing_lists.html]
-  * Hbase IRC channel is #hbase at irc.freenode.net.
-  * [:Hbase/HbaseRest] HBase REST-gateway spec.
-  * [:Hbase/ThriftApi] HBase Thrift gateway discussion and spec.
-  * [:HBase/HBasePresentations] HBase presentations
-  * [:Hbase/MapReduce] Using HBase !MapReducing
-  * [:Hbase/IssuePriorityGuidelines] How to rate the priority of your issues in JIRA.
-  * [:Hbase/HbaseShell:Hbase Shell], a Query Language Shell for Hadoop + Hbase
-  * [:Hbase/Jython] Accessing HBase from Jython
-  * [:Hbase/PoweredBy: PoweredBy], a list of sites and applications powered by Hbase
-  * Planning:
-    * [:Hbase/Plan-0.17: Plan for Hbase 0.17]
-
- [[Anchor(news)]]
- == NEWS: ==
-  * HBase moves to new SVN and JIRA -- ''2008/02/04''
-  * First [http://www.eventbrite.com/event/85834734 Hbase meetup]. Hosted by rapleaf -- ''2007/12/18''
-  * Paul Saab uploads 1.3B (small) two-family rows into a 24 node hbase cluster -- ''2007/12/15''
-  * Extensive refactoring of locking and addition of first version of a REST interface -- ''2007/11/25''
-  * First working release of hbase is available as part of the hadoop-0.15.0 release. See [http://svn.apache.org/viewvc/lucene/hadoop/branches/branch-0.15/src/contrib/hbase/CHANGES.txt?view=markup CHANGES.txt] for release content. [http://aa0-000-12.u.powerset.com:60010/hql.jsp?q=select+anchor%3Aanchor_text+from+enwiki%3B Download].
-  * Cluster behavior has been much improved. The master, rather than the splitting region server host, now rules where the daughter splits are deployed. A simple formula has been added to spread region load evenly. Splits have been made near-instantaneous and compaction has been reworked so neither block updates for extended periods of time. -- ''Added 2007/08/16''
-  * Support for row and filter columns.
-  * A simple [wiki:Hbase/HbaseShell shell] for manipulating HBase tables contributed by Edward Yoon. -- ''Added 2007/07/10'''
-  * Map/Reduce connector for HBase - contributed by Vuk Ercegovac -- ''Added 2007/06/30''
-  * Scripts to start and stop a hbase cluster have been added. See ${HBASE_HOME}/bin. List cluster participants in ${HBASE_HOME}/conf/regionservers file). -- ''Added 2007/06/21''
-  * A script to run distributed clients executing the Performance Evaluation tests described in the Google Bigtable paper has been added and tested to completion running against a small cluster of 4 region servers. See [wiki:Hbase/PerformanceEvaluation Tools for evaluating HBase performance and scalability] -- ''Added 2007/06/21''
-  * It is now possible to add or delete column families after a table exists. Before either of these operations the table being updated must be taken off-line (disabled) -- ''Added 2007/05/30''
-  * Data compression is available on a per-column family basis. -- ''Added 2007/05/30'' The options are:
-    * no compression
-    * record level compression
-    * block level compression
-  * HBase now has its own component in the [https://issues.apache.org/jira/browse/HADOOP Hadoop Jira]. Bug reports, contributions, etc. should be tagged with the component '''contrib/hbase'''.
-  * HBase is being updated frequently. The latest code can always be found in the [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/ trunk of the Hadoop svn tree].
-
- See the [https://issues.apache.org/jira/browse/HBASE HBase section of JIRA] for current set of outstanding issues and recent fixes.
-
-
- [[Anchor(background)]]
- == Background ==

  Google's [http://labs.google.com/papers/bigtable.html Bigtable], a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment.

  Just as Bigtable leverages the distributed data storage provided by the [http://labs.google.com/papers/gfs.html Google File System],
- Hbase will provide Bigtable-like capabilities on top of Hadoop.
+ HBase provides Bigtable-like capabilities on top of Hadoop.

- Data is organized into tables, rows and columns. An Iterator-like interface is available
+ Data is organized into tables, rows and columns. An Iterator-like interface
- for scanning through a row range (and of course there is an ability to
+ is available for scanning through a row range (and of course there is the
- retrieve a column value for a specific key).
+ ability to retrieve a column value for a specific key).

  Any particular column may have multiple values for the same row key. A secondary key can be provided to select a particular value or an Iterator can be set up to scan through the key-value pairs for that column
+ given a specific row key.
- given a specific row key. See [wiki:Hbase/HbaseArchitecture Hbase Architecture]
- to learn more about Hbase.

- [[Anchor(rationale)]]
- === Rationale ===
+ == General Information ==
+  * [wiki:Hbase/HbaseArchitecture HBase Architecture]
+  * [wiki:Hbase/FAQ FAQ]
+  * Support:
+    * HBase IRC channel #hbase at irc.freenode.net.
+    * HBase mailing lists:
+      * User information: [[MailTo(hbase-user AT SPAMFREE hadoop DOT apache DOT org)]]
+      * Developer information: [[MailTo(hbase-dev AT SPAMFREE hadoop DOT apache DOT org)]]
+      * See also the hadoop mailing lists [http://hadoop.apache.org/core/mailing_lists.html]
+  * HBase [:HBase/News: news] and [:HBase/HBasePresentations: presentations]
+  * [:Hbase/PoweredBy: PoweredBy], a list of sites and applications powered by HBase

- Both Google's GFS and Hadoop's HDFS provide a mechanism to
- reliably store large amounts of data. However, there is not really a
- mechanism for organizing the data and accessing only the parts that
- are of interest to a particular application.
+ == User Documentation ==
+  * [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/javadoc/org/apache/hadoop/hbase/package-summary.html#package_description Getting Started]
+  * [wiki:Hbase/10Minutes How to checkout, build and run hbase in about 10 Minutes].
+  * [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/javadoc/org/apache/hadoop/hbase/package-summary.html HBase API Docs]
+  * [:Hbase/HbaseShell:HBase Shell], a Query Language Shell for Hadoop + HBase
+  * [:Hbase/Jython: Jython interface to HBase]
+  * [:Hbase/HbaseRest: REST gateway specification for HBase]
+  * [:Hbase/ThriftApi: Thrift gateway specification for HBase]
+  * [:Hbase/MapReduce: Using HBase with Hadoop !MapReduce]
+  * [wiki:Hbase/UsingBloomFilters Using Bloom Filters]
+  * HBase and Performance
+    * [wiki:Hbase/PerformanceEvaluation: Tools for evaluating HBase performance and scalability]
+    * There are setup instructions and a JMeter Test Plan in [https://issues.apache.org/jira/browse/HADOOP-2625 HADOOP-2625]
+    * [:Hbase/HbaseRTDS: A performance evaluation of HBase]

- Bigtable (and Hbase) provide a means for organizing and efficiently
- accessing these large data sets.
+ == Developer Documentation ==
+  * Roadmaps
+    * [:Hbase/Plan-0.17: Roadmap for HBase 0.2]
+  * [:Hbase/HowToContribute How to contribute]
+  * [:Hbase/HowToCommit How to commit]
+  * [:Hbase/IssuePriorityGuidelines How to rate the priority of your issues in JIRA]

- [[Anchor(goals)]]
  === Goals ===
  Design (and subsequently implement) a structured storage system as similar to Google's Bigtable as possible for the Hadoop environment.

- [[Anchor(nongoals)]]
  ==== Non-Goals ====
   * Gratuitous changes that are essentially "re-inventing the wheel" or are the result of "not invented here".
+  * For the near term features outside those outlined by the [http://labs.google.com/papers/bigtable.html Bigtable paper]
   * Premature optimization. Once there is a working version, the system will be profiled for hot spots.
-
- [[Anchor(contributors)]]
- == Initial Contributors ==
-
-  * Mike Cafarella (who wrote the initial code base)
-  * JimKellerman [[MailTo(jim AT SPAMFREE powerset DOT com)]]
-  * Michael Stack [[MailTo(stack AT SPAMFREE powerset DOT com)]]
-
- [[Anchor(comments)]]
- == Comments ==
-
- Please add comments related to the project goals and process below.
- Architectural comments should be posted on same page as the portion of
- the architecture to which the comment is directed. Thank you.
-
