There are some good diagrams in the most recent presentation posted on the Wiki:
http://wiki.apache.org/lucene-hadoop/HBase/HBasePresentations

However, I'll provide a brief summary here. HDFS files are indeed write once. HBase uses a Hadoop MapFile (org.apache.hadoop.io.MapFile) for storage and a SequenceFile for its redo log. (The latter is a current weakness in HBase, as files don't persist until they are closed; see HADOOP-1700.)

There are basically two operations: reads and writes. When a write is received by the server, it first writes the change to the redo log. The change is then stored in memory. Periodically, the memory cache is flushed to disk, creating a new MapFile. Files are created on a per-column basis, so any particular MapFile contains entries only for a particular column.

When the number of MapFiles for a column exceeds a configurable threshold, a background thread is started that merges the existing MapFiles into one. This operation is called compaction. Writes may continue while the compaction is in progress and may cause new MapFiles to be created if the cache is flushed to disk; any MapFiles created after the compaction starts will not be part of the current compaction. Reads may also continue during a compaction because all the files that currently exist are immutable. At the end of the compaction, the new file created by merging the old ones is put in their place by temporarily locking the column, moving the new file into place, and deleting the old files. This takes very little time, so read and write operations on the column are stopped only briefly.

Reads are a bit more complicated than writes. A read operation first checks the cache and may satisfy the request directly from it. If not, the operation checks the newest MapFile for the data, then the next newest, ..., down to the oldest, stopping when the requested data has been found. Because a random read (or even a sequential read that is not a scan) can end up checking multiple files for data, it is considerably slower than either a write or a sequential scan (think of a scan as working with a cursor in a traditional database). A rough code sketch of the write, flush, and read paths follows the quoted message below.

There are other complicating factors, such as how a table obtains more storage as it grows, but the above provides the basic idea. Hope this helps.

---
Jim Kellerman, Senior Engineer; Powerset

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of James D Sadler
> Sent: Saturday, December 29, 2007 9:17 PM
> To: hadoop-user@lucene.apache.org
> Subject: HBase implementation question
>
> Hi All,
>
> I'm interested in the architecture of HBase, in particular
> how it is implemented on top of Hadoop DFS. I understand
> that HDFS files are write once: after they are initially
> created they are for all intents and purposes immutable. This
> being the case, how does HBase implement its table storage on
> top of such a file system? Do updates to an HBase table
> cause new versions of the file backing the table to be
> created (obviously not!)? Or have I completely misunderstood
> how HDFS works (most likely)?
>
> Best Regards,
>
> James.
>
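To make the write and read paths above a bit more concrete, here is a minimal sketch, not the actual HBase source. The class name ColumnStoreSketch and the methods put(), flushCache(), and readValue() are invented for illustration; MapFile, SequenceFile, and Text are the Hadoop classes mentioned above, used with their 0.15-era constructors. Compaction, column locking, and region splitting are left out.

import java.io.IOException;
import java.util.LinkedList;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical sketch of one column's store, NOT the real HBase code.
public class ColumnStoreSketch {
  private final Configuration conf;
  private final FileSystem fs;
  private final String columnDir;                 // directory holding this column's MapFiles
  private final SequenceFile.Writer redoLog;      // redo log, written before the cache
  // in-memory cache of recent writes, kept sorted by row key
  private final TreeMap<Text, Text> memcache = new TreeMap<Text, Text>();
  // immutable on-disk files, newest flush first
  private final LinkedList<MapFile.Reader> mapFiles = new LinkedList<MapFile.Reader>();
  private int flushCount = 0;

  public ColumnStoreSketch(Configuration conf, FileSystem fs,
      String columnDir, Path redoLogPath) throws IOException {
    this.conf = conf;
    this.fs = fs;
    this.columnDir = columnDir;
    this.redoLog = SequenceFile.createWriter(fs, conf, redoLogPath,
        Text.class, Text.class);
  }

  // Write path: log the change first, then apply it to the in-memory cache.
  public synchronized void put(Text row, Text value) throws IOException {
    redoLog.append(row, value);       // 1. record the change in the redo log
    memcache.put(row, value);         // 2. store the change in memory
  }

  // Periodic flush: dump the cache into a brand new, immutable MapFile.
  public synchronized void flushCache() throws IOException {
    String dir = columnDir + "/flush-" + (flushCount++);
    MapFile.Writer writer =
        new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
    for (Map.Entry<Text, Text> e : memcache.entrySet()) {
      writer.append(e.getKey(), e.getValue());    // keys are already in sorted order
    }
    writer.close();
    mapFiles.addFirst(new MapFile.Reader(fs, dir, conf));  // newest file goes first
    memcache.clear();
  }

  // Read path: check the cache, then the MapFiles from newest to oldest.
  public synchronized Text readValue(Text row) throws IOException {
    Text cached = memcache.get(row);
    if (cached != null) {
      return cached;                  // satisfied directly from the cache
    }
    Text value = new Text();
    for (MapFile.Reader reader : mapFiles) {
      if (reader.get(row, value) != null) {
        return value;                 // stop at the newest file that has the row
      }
    }
    return null;                      // no file contains this row
  }
}

The points the sketch tries to capture are the ones from the summary above: the redo log is written before the cache is updated, every flushed MapFile is immutable once written, and a read consults the cache and then the files from newest to oldest, which is why random reads can be much slower than writes or scans.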