Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
update current status

------------------------------------------------------------------------------
  complete, have been tested, and seem to work great.
  
  The multi-machine stuff (the H!BaseMaster, the H!RegionServer, and the
+ HClient) are now complete but have not been tested.  However, the code
+ is now very clean and in a state where other people can understand it
+ and contribute.
- HClient) have not been fully tested. The reason is that the HClient is
- still incomplete, so the rest of the distributed code cannot be
- fully-tested. I think it's good, but can't be sure until the HClient
- is done. However, the code is now very clean and in a state where
- other people can understand it and contribute.
  
  Other related features and TODOs:
- 
+  1. Scanners can now be started at a specific row and do not have to scan a whole table.
+  1. The client-server code is now complete but needs to be debugged and tests need to be written for it.
+  1. There is a JUnit test for the base classes that covers most of the non-distributed functionality: writing, reading, flushing, log-rolling, and scanning. If the environment variable DEBUGGING=TRUE is set when running the test, it runs a more extensive test that includes writing and reading 10^6^ rows, compaction, splitting, and merging. The extensive test is not enabled by default because it takes over ten minutes to run.
+  1. Utility classes are needed to start and stop an HBase cluster.
   1. Single-machine log reconstruction works great, but distributed log recovery is not yet implemented. This is relatively easy: it involves sorting the log entries and placing the resulting shards into the right DFS directories.
   1. Data compression is not yet implemented, but there is an obvious place to do so in the HStore.
   1. We need easy interfaces to !MapReduce jobs, so they can scan tables
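The distributed log-recovery step described above (sort the log entries, then place the shards into the right DFS directories) can be sketched roughly as follows. This is only an illustrative sketch: the class and field names (`LogEntry`, `region`, `seqNum`) are assumptions for the example, not the actual HBase types, and writing each shard into its region's DFS directory is left out.

```java
import java.util.*;

// Illustrative sketch only: sort combined write-ahead-log entries by
// (region, sequence number), then bucket them into per-region shards.
// Each bucket would be written into that region's DFS directory.
public class LogRecoverySketch {

    // Hypothetical log entry: owning region, sequence number (preserves
    // write order), and the row the write applied to.
    static final class LogEntry {
        final String region;
        final long seqNum;
        final String row;
        LogEntry(String region, long seqNum, String row) {
            this.region = region;
            this.seqNum = seqNum;
            this.row = row;
        }
    }

    // Sort by (region, seqNum) and group entries into one shard per region.
    static Map<String, List<LogEntry>> shardByRegion(List<LogEntry> entries) {
        List<LogEntry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing((LogEntry e) -> e.region)
                              .thenComparingLong(e -> e.seqNum));
        Map<String, List<LogEntry>> shards = new LinkedHashMap<>();
        for (LogEntry e : sorted) {
            shards.computeIfAbsent(e.region, k -> new ArrayList<>()).add(e);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<LogEntry> log = Arrays.asList(
            new LogEntry("regionB", 2, "row9"),
            new LogEntry("regionA", 1, "row1"),
            new LogEntry("regionB", 1, "row5"));
        Map<String, List<LogEntry>> shards = shardByRegion(log);
        // regionA gets one entry; regionB gets two, in sequence order.
        System.out.println(shards.get("regionA").size() + " "
                           + shards.get("regionB").size()); // prints "1 2"
    }
}
```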
@@ -385, +386 @@

   1. File compaction is relatively slow; we should have a more conservative algorithm for deciding when to apply compaction.
   1. For the getFull() operation, using Bloom filters would speed things up.
   1. We need stress-test and performance-measurement tools for the whole system.
-  1. There is some HRegion-specific testing code in the form of a JUnit test for HRegion. A new version of this test has to be written so that it works against an HRegion while it is hosted by an H!RegionServer and connected to an H!BaseMaster.
   1. Implement some kind of block caching in HRegion. Even when the DFS is not hitting the disk to fetch blocks, HRegion is still making IPC calls to DFS (via !MapFile).
   1. Investigate a possible performance problem or memory-management issue related to random reads. As more and more random reads are done, performance slows down and the memory footprint increases.
  
