There are some good diagrams in the most recent presentation posted on the Wiki:
http://wiki.apache.org/lucene-hadoop/HBase/HBasePresentations

However, I'll provide a brief summary here. HDFS files are indeed write once. HBase uses a Hadoop MapFile (org.apache.hadoop.io.MapFile) for storage and a SequenceFile for its redo log. (The latter is a current weakness in HBase, as files don't persist until they are closed; see HADOOP-1700.)

There are basically two operations: reads and writes. When a write is received by the server, it first writes the change to the redo log. The change is then stored in memory. Periodically, the memory cache is flushed to disk, creating a new MapFile. Files are created on a per-column basis, so any particular MapFile contains entries only for a particular column.

When the number of MapFiles for a column exceeds a configurable threshold, a background thread is started that merges the existing MapFiles into one. This operation is called compaction. Writes may continue while the compaction is in progress and may cause new MapFiles to be created if the cache is flushed to disk; any MapFiles created after the compaction starts will not be part of the current compaction. Reads may also continue during a compaction because all the files that currently exist are immutable. At the end of the compaction, the new file created by merging the old ones is put in their place by temporarily locking the column, moving the new file into place, and deleting the old files. This takes very little time, so read and write operations on the column are stopped only briefly.

Reads are a bit more complicated than writes. A read operation first checks the cache and may satisfy the request directly from it. If not, the operation checks the newest MapFile for the data, then the next newest, ..., down to the oldest, stopping when the requested data has been found. Because a random read (or even a sequential read that is not a scan) can end up checking multiple files for data, it is considerably slower than either a write or a sequential scan (think of a scan as working with a cursor in a traditional database). A rough code sketch of the write, flush, and read paths follows the quoted message below.

There are other complicating factors, such as how a table obtains more storage as it grows, but the above provides the basic idea. Hope this helps.

---
Jim Kellerman, Senior Engineer; Powerset

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of James D Sadler
> Sent: Saturday, December 29, 2007 9:17 PM
> To: hadoop-user@lucene.apache.org
> Subject: HBase implementation question
>
> Hi All,
>
> I'm interested in the architecture of HBase, in particular
> how it is implemented on top of Hadoop DFS. I understand
> that HDFS files are write once: after they are initially
> created they are for all intents and purposes immutable. This
> being the case, how does HBase implement its table storage on
> top of such a file system? Do updates to an HBase table
> cause new versions of the file backing the table to be
> created (obviously not!)? Or have I completely misunderstood
> how HDFS works (most likely)?
>
> Best Regards,
>
> James.
>
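To make the write and read paths above a bit more concrete, here is a minimal sketch, not the actual HBase source. The class name ColumnStoreSketch and the methods put(), flushCache(), and readValue() are invented for illustration; MapFile, SequenceFile, and Text are the Hadoop classes mentioned above, used with their 0.15-era constructors. Compaction, column locking, and region splitting are left out.

import java.io.IOException;
import java.util.LinkedList;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical sketch of one column's store, NOT the real HBase code.
public class ColumnStoreSketch {
  private final Configuration conf;
  private final FileSystem fs;
  private final String columnDir;                 // directory holding this column's MapFiles
  private final SequenceFile.Writer redoLog;      // redo log, written before the cache
  // in-memory cache of recent writes, kept sorted by row key
  private final TreeMap<Text, Text> memcache = new TreeMap<Text, Text>();
  // immutable on-disk files, newest flush first
  private final LinkedList<MapFile.Reader> mapFiles = new LinkedList<MapFile.Reader>();
  private int flushCount = 0;

  public ColumnStoreSketch(Configuration conf, FileSystem fs,
      String columnDir, Path redoLogPath) throws IOException {
    this.conf = conf;
    this.fs = fs;
    this.columnDir = columnDir;
    this.redoLog = SequenceFile.createWriter(fs, conf, redoLogPath,
        Text.class, Text.class);
  }

  // Write path: log the change first, then apply it to the in-memory cache.
  public synchronized void put(Text row, Text value) throws IOException {
    redoLog.append(row, value);       // 1. record the change in the redo log
    memcache.put(row, value);         // 2. store the change in memory
  }

  // Periodic flush: dump the cache into a brand new, immutable MapFile.
  public synchronized void flushCache() throws IOException {
    String dir = columnDir + "/flush-" + (flushCount++);
    MapFile.Writer writer =
        new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
    for (Map.Entry<Text, Text> e : memcache.entrySet()) {
      writer.append(e.getKey(), e.getValue());    // keys are already in sorted order
    }
    writer.close();
    mapFiles.addFirst(new MapFile.Reader(fs, dir, conf));  // newest file goes first
    memcache.clear();
  }

  // Read path: check the cache, then the MapFiles from newest to oldest.
  public synchronized Text readValue(Text row) throws IOException {
    Text cached = memcache.get(row);
    if (cached != null) {
      return cached;                  // satisfied directly from the cache
    }
    Text value = new Text();
    for (MapFile.Reader reader : mapFiles) {
      if (reader.get(row, value) != null) {
        return value;                 // stop at the newest file that has the row
      }
    }
    return null;                      // no file contains this row
  }
}

The points the sketch tries to capture are the ones from the summary above: the redo log is written before the cache is updated, every flushed MapFile is immutable once written, and a read consults the cache and then the files from newest to oldest, which is why random reads can be much slower than writes or scans.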