Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  comments, but please make them stand out by bolding or underlining
  them. Thanks!
  
- '''NEWS:'''
+ '''NEWS:''' (updated 2007/05/30)
   1. HBase is being updated frequently. The latest code can always be found in 
the [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/ trunk 
of the Hadoop svn tree]. 
   1. HBase now has its own component in the 
[https://issues.apache.org/jira/browse/HADOOP Hadoop Jira]. Bug reports, 
contributions, etc. should be tagged with the component '''contrib/hbase'''.
+  1. It is now possible to add or delete column families after a table has been 
created. Before either operation, the table being updated must be taken 
off-line (disabled).
+  1. Data compression is available on a per-column-family basis. The options 
are:
+   * no compression
+   * record level compression
+   * block level compression
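The tradeoff between the record-level and block-level options can be illustrated generically. The sketch below is plain Python using `zlib`, not HBase code, and the sample records are made up: record-level compression lets a single cell be decompressed on its own at the cost of per-record overhead, while block-level compression usually achieves a better ratio by compressing many records in one compression context.

```python
import zlib

# Hypothetical sample values, standing in for cells of one column family.
records = [b"row-%03d/value-%03d" % (i, i) for i in range(100)]

# Record-level: each value is compressed independently, so any one cell can
# be read without touching its neighbors, but each record pays the zlib
# header/checksum overhead.
record_level = [zlib.compress(r) for r in records]
record_bytes = sum(len(c) for c in record_level)

# Block-level: many records are compressed as a single unit. The ratio is
# better because redundancy across records is exploited, but reading one
# cell means decompressing the whole block.
block = b"".join(records)
block_level = zlib.compress(block)
block_bytes = len(block_level)

print(record_bytes, block_bytes)
```

For small, repetitive values like these, the block-level total comes out far smaller than the sum of the individually compressed records, which is why block compression is usually the better ratio and record compression the better random-read choice.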
  
  = Table of Contents =
  
@@ -164, +169 @@

  [[Anchor(client)]]
  = HClient Client API =
  
+ See the Javadoc for 
[http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HClient.html
 HClient].
- {{{
- public class HClient implements HConstants {
-   /** Creates a new HClient */
-   public HClient(Configuration conf);
- 
-   /** Creates a new table */
-   public synchronized void createTable(HTableDescriptor desc) throws 
IOException;
- 
-   /** Deletes a table */
-   public synchronized void deleteTable(Text tableName) throws IOException;
- 
-   /** Shut down an HBase instance */
-   public synchronized void shutdown() throws IOException;
- 
-   /** Open a table for subsequent access */
-   public synchronized void openTable(Text tableName) throws IOException;
- 
-   /** Close down the client */
-   public synchronized void close() throws IOException;
- 
-   /**
-    * List all the userspace tables.  In other words, scan the META table.
-    *
-    * If we wanted this to be really fast, we could implement a special
-    * catalog table that just contains table names and their descriptors.
-    * Right now, it only exists as part of the META table's region info.
-    */
-   public synchronized HTableDescriptor[] listTables() throws IOException;
-   
-   /** Get a single value for the specified row and column */
-   public byte[] get(Text row, Text column) throws IOException;
-  
-   /** Get the specified number of versions of the specified row and column */
-   public byte[][] get(Text row, Text column, int numVersions) throws 
IOException;
-   
-   /** 
-    * Get the specified number of versions of the specified row and column with
-    * the specified timestamp.
-    */
-   public byte[][] get(Text row, Text column, long timestamp, int numVersions) 
throws IOException;
- 
-   /** Get all the data for the specified row */
-   public LabelledData[] getRow(Text row) throws IOException;
- 
-   /** 
-    * Get a scanner on the current table starting at the specified row.
-    * Return the specified columns.
-    */
-   public synchronized HScannerInterface obtainScanner(Text[] columns, Text 
startRow) throws IOException;
- 
-   /** Start an atomic row insertion or update */
-   public long startUpdate(Text row) throws IOException;
-   
-   /** Change a value for the specified column */
-   public void put(long lockid, Text column, byte val[]) throws IOException;
-   
-   /** Delete the value for a column */
-   public void delete(long lockid, Text column) throws IOException;
-   
-   /** Abort a row mutation */
-   public void abort(long lockid) throws IOException;
-   
-   /** Finalize a row mutation */
-   public void commit(long lockid) throws IOException;
- }
- }}}
  
  [[Anchor(scanner)]]
  == Scanner API ==
  
- To obtain a scanner, open the table, and use obtainScanner.
+ To obtain a scanner, 
[http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HClient.html#openTable(org.apache.hadoop.io.Text)
 open the table], and use 
[http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HClient.html#obtainScanner(org.apache.hadoop.io.Text%5B%5D,%20org.apache.hadoop.io.Text)
 obtainScanner].
  
+ Then use the 
[http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HScannerInterface.html
 scanner API].
- {{{
- public interface HScannerInterface {
-   public boolean next(HStoreKey key, TreeMap<Text, byte[]> results) throws 
IOException;
-   public void close() throws IOException;
- }
- }}}
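The scanner protocol itself is simple: call next() repeatedly, receiving a row key and a column-to-value map on each call, until it reports exhaustion, then close(). A toy in-memory stand-in in Python (illustrative only; this is not the Java HClient/HScannerInterface API, and the class and table names are made up):

```python
# Toy scanner mimicking the next()/close() protocol described above.
class ToyScanner:
    def __init__(self, rows):
        # rows: dict of row key -> {column: value}; rows are scanned in
        # sorted row-key order, as an HBase scanner would return them.
        self._items = iter(sorted(rows.items()))
        self._closed = False

    def next(self, results):
        """Fill `results` with the next row's columns. Returns a
        (row_key, True) pair, or (None, False) when the scan is done."""
        assert not self._closed, "scanner already closed"
        try:
            row_key, columns = next(self._items)
        except StopIteration:
            return None, False
        results.clear()
        results.update(columns)
        return row_key, True

    def close(self):
        self._closed = True

# Usage: loop until next() reports exhaustion, then close the scanner.
table = {
    "row2": {"info:a": b"2a"},
    "row1": {"info:a": b"1a", "info:b": b"1b"},
}
scanner = ToyScanner(table)
seen = []
results = {}
while True:
    key, more = scanner.next(results)
    if not more:
        break
    seen.append((key, dict(results)))
scanner.close()
```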
  
  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =
@@ -423, +358 @@

  Consequently each row in the META and ROOT tables has three members of
  the "info:" column family:
  
+  1. '''info:regioninfo''' contains a serialized 
[http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HRegionInfo.html
 HRegionInfo object]
+  1. '''info:server''' contains a serialized string which is the output from 
[http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HServerAddress.html#toString()
 HServerAddress.toString()]. This string can be supplied to one of the 
[http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HServerAddress.html#HServerAddress(java.lang.String)
 HServerAddress constructors].
-  1. '''info:regioninfo''' contains a serialized H!RegionInfo object which 
contains:
-   * regionid
-   * start key
-   * end key
-   * the table descriptor (a serialized H!TableDescriptor)
-   * the region name
-  1. '''info:server''' contains a serialized string which is the server name, 
a ":" and the port number for the H!RegionServer serving the region
   1. '''info:startcode''' a serialized long integer generated by the 
H!RegionServer when it starts. The H!RegionServer sends this start code to the 
master so the master can determine if the server information in the META and 
ROOT regions is stale.
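The startcode check above can be sketched as follows. This is a Python illustration of the idea, not HBase code; the entry values and helper names are hypothetical. The master compares the startcode stored in META against the one the live region server reported when it registered; a mismatch means the META entry predates a server restart and is stale.

```python
# Hypothetical META entry for one region: the serialized server address
# (a "host:port" string, as HServerAddress.toString() would produce) plus
# the startcode recorded when that server registered with the master.
meta_entry = {
    "info:server": "regionserver1.example.com:60020",
    "info:startcode": 1180500000,
}

def parse_server(addr):
    """Split a "host:port" string into its parts (roughly what an
    HServerAddress constructor does with the META string)."""
    host, port = addr.rsplit(":", 1)
    return host, int(port)

def is_stale(meta_entry, live_startcode):
    """The master's staleness test: the META info is stale if the
    startcode it holds differs from the one the running server reported."""
    return meta_entry["info:startcode"] != live_startcode

host, port = parse_server(meta_entry["info:server"])
# If the server restarted, it generated a new startcode, so the old META
# entry no longer matches and must be refreshed by the master.
```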
  
  Thus, a client does not need to contact the HMaster after it learns
@@ -460, +390 @@

  [[Anchor(status)]]
  = Current Status =
  
- As of this writing, there is just shy of 9000 lines of code in 
+ As of this writing (2007/05/30), there are approximately 11,500 lines of code 
in the
  "src/contrib/hbase/src/java/org/apache/hadoop/hbase/" directory on the Hadoop 
SVN trunk.
  
- There are also about 2500 lines of test cases.
+ There are also about 2800 lines of test cases.
  
  All of the single-machine operations (safe-committing, merging,
  splitting, versioning, flushing, compacting, log-recovery) are
@@ -473, +403 @@

 HClient) are in the process of being debugged. Work is also in progress to 
create scripts that will launch the HMaster and H!RegionServer on a Hadoop 
cluster.
  
  Other related features and TODOs:
+  1. Single-machine log reconstruction works great, but distributed log 
recovery is not yet implemented. 
-  1. Single-machine log reconstruction works great, but distributed log 
recovery is not yet implemented. This is relatively easy, involving just a sort 
of the log entries, placing the shards into the right DFS directories
-  1. Data compression is not yet implemented, but there is an obvious place to 
do so in the HStore.
  1. We need easy interfaces to !MapReduce jobs, so they can scan tables. We 
have been contacted by Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm 
DOT com)]] of IBM Almaden Research, who expressed an interest in working on an 
HBase interface to Hadoop map/reduce.
  1. Vuk Ercegovac also pointed out that keeping HBase HRegion edit logs in 
HDFS is currently flawed.  HBase writes edits to logs and to a memcache.  The 
'atomic' write to the log is meant to serve as insurance against abnormal 
!RegionServer exit: on startup, the log is replayed to reconstruct an HRegion's 
last consistent state. But files in HDFS do not 'exist' until they are cleanly 
closed -- something that will not happen if the !RegionServer exits without 
running its 'close'.
  1. The HMemcache lookup structure is relatively inefficient.
