[Lucene-hadoop Wiki] Trivial Update of "Hbase/HbaseArchitecture" by stack

Apache Wiki Thu, 16 Aug 2007 08:31:25 -0700

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
Removed HClient references.  Removed from the TODO list implemented items.

------------------------------------------------------------------------------
  entries. 
  
  [[Anchor(client)]]
- = HClient Client API =
+ = Client API =
  
- See the Javadoc for [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HClient.html HClient]
+ See the Javadoc for [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HTable.html HTable] and [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HBaseAdmin.html HBaseAdmin]
+ 
+ 
  
  [[Anchor(scanner)]]
  == Scanner API ==
@@ -381, +382 @@

  [[Anchor(status)]]
  = Current Status =
  
- As of this writing (2007/06/30), there are approximately 16,500 lines of code in
+ As of this writing (2007/08/16), there are approximately 27,000 lines of code in
  "src/contrib/hbase/src/java/org/apache/hadoop/hbase/" directory on the Hadoop SVN trunk.
  
- There are also about 4000 lines of test cases.
+ There are also about 7200 lines of test cases.
  
  All of the single-machine operations (safe-committing, merging,
  splitting, versioning, flushing, compacting, log-recovery) are
  complete, have been tested, and seem to work great.
  
+ The multi-machine stuff (the HMaster and the H!RegionServer) are actively being enhanced and debugged.
- The multi-machine stuff (the HMaster, the H!RegionServer, and the
- HClient) are actively being enhanced and debugged.
  
  Other related features and TODOs:
   1. Vuk Ercegovac [[MailTo(vercego AT SPAMFREE us DOT ibm DOT com)]] of IBM Almaden Research pointed out that keeping HBase HRegion edit logs in HDFS is currently flawed.  HBase writes edits to logs and to a memcache.  The 'atomic' write to the log is meant to serve as insurance against abnormal !RegionServer exit: on startup, the log is rerun to reconstruct an HRegion's last wholesome state. But files in HDFS do not 'exist' until they are cleanly closed -- something that will not happen if !RegionServer exits without running its 'close'.
   1. The HMemcache lookup structure is relatively inefficient
-  1. File compaction is relatively slow; we should have a more conservative algorithm for deciding when to apply compaction.  Same for region splits.
-  1. For the getFull() operation, use of Bloom filters would speed things up (See [https://issues.apache.org/jira/browse/HADOOP-1415 HADOOP-1415])
   1. Implement some kind of block caching in HRegion. While the DFS isn't hitting the disk to fetch blocks, HRegion is making IPC calls to DFS (via !MapFile)
-  1. Investigate possible performance problem or memory management issue related to random reads. As more and more random reads are done, performance slows down and the memory footprint increases (I see OOMEs running randomRead test -- stack).
+  1. Investigate possible performance problem or memory management issue related to random reads. As more and more random reads are done, performance slows down and the memory footprint increases
   1. Profile.  Bulk of time seems to be spent RPC'ing.  Improve RPC or amend how hbase uses RPC.
  
  [[Anchor(comments)]]
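
The first TODO item in the diff describes a log-then-memcache write path with log replay on startup. The sketch below illustrates that general idea on a local filesystem. It is not HBase code: the class name `TinyStore`, the JSON line format, and the `put`/`get` methods are all hypothetical, chosen only for illustration, and Python is used instead of Java to keep the example short.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical sketch of log-then-memcache writes with replay on startup."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memcache = {}  # (row, column) -> latest value
        self._replay()      # rebuild state from edits that reached the log
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Re-apply every logged edit in order; stop at a torn final record.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                try:
                    row, column, value = json.loads(line)
                except ValueError:
                    break  # incomplete record left by an abnormal exit
                self.memcache[(row, column)] = value

    def put(self, row, column, value):
        # 1. Append the edit to the log and force it to disk.
        self._log.write(json.dumps([row, column, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        # 2. Only then apply it to the in-memory structure.
        self.memcache[(row, column)] = value

    def get(self, row, column):
        return self.memcache.get((row, column))


# Usage: write two edits, then "crash" (never close the log) and recover.
path = os.path.join(tempfile.mkdtemp(), "edits.log")
store = TinyStore(path)
store.put("row1", "info:a", "1")
store.put("row1", "info:a", "2")

recovered = TinyStore(path)  # a fresh instance replays the log
print(recovered.get("row1", "info:a"))
```

Recovery works here because a local file is readable after `fsync` even if it is never closed. Per the item above, a file in HDFS did not 'exist' until it was cleanly closed, so the same replay step would find nothing to read after an abnormal exit.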
