[Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman

Apache Wiki Wed, 30 May 2007 09:19:29 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.


The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  them. Thanks!
  
  '''NEWS:'''
-  1. An update to the original HBase code has been committed to the Hadoop 
source tree, from a patch attached to 
[http://issues.apache.org/jira/browse/HADOOP-1282 Hadoop Jira Issue 1282]. You 
can find the current HBase code in the 
[http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/ Hadoop SVN 
tree]
+  1. HBase is being updated frequently. The latest code can always be found in 
the [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/ trunk 
of the Hadoop svn tree]. 
-  1. HBase now has its own component in the Hadoop Jira. Bug reports, 
contributions, etc. should be tagged with the component '''contrib/hbase'''.
+  1. HBase now has its own component in the 
[https://issues.apache.org/jira/browse/HADOOP Hadoop Jira]. Bug reports, 
contributions, etc. should be tagged with the component '''contrib/hbase'''.
  
  = Table of Contents =
  
@@ -495, +495 @@

  
  by [wiki:udanax Udanax] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]
  
- I think Hbase should be compact (space-efficient), fast and should be able to 
manage high-demand load. It should be able to handle sparse tables efficiently.
+ I think Hbase should be compact (space-efficient), fast and should be able to 
manage high-demand load. It should be able to handle sparse tables efficiently. 
So, for wide and sparse data, Hbase must store data by columns like C-Store 
does.
- So, for wide and sparse data, Hbase must store data by columns like C-Store 
does.
  
-  ''I agree. But let's not get ahead of ourselves here. I only posted the 
conceptual view last night. There is no part of the document that discusses how 
the data is physically organized. I was going to work on that today. 
Patience.'' -- JimKellerman
+  ''I agree. See the sections on the [#conceptual conceptual data model] and 
the [#physical physical data model]. -- JimKellerman 2007/05/30''
  
  A column-oriented system handles NULLs more easily with significantly smaller 
performance overhead,
  and supports both Horizontal and Vertical Parallel Processing.
  
-  ''Bigtable (and Hbase) do not even have to store nulls. If there is no value 
for a particular key, then an empty or null value will be returned'' -- 
JimKellerman
+  ''Bigtable (and Hbase) do not store nulls. If there is no value for a 
particular key, then an empty or null value will be returned -- JimKellerman 
2007/05/30''
  
  Let's consider the following case:
  You may be familiar with RDF (Resource Description Framework) Storage from W3C, 
which is
@@ -513, +512 @@

   * Columns are in the form of (family: optional qualifier). This is a RDF 
Properties 
   * Columns have type information  
  
-   ''In both Bigtable, and Hbase, there is no notion of type. Keys and values 
in Bigtable are arbitrary strings. For Hbase, we are considering that values be 
an arbitrary byte array.''
+   ''In both Bigtable and Hbase, there is no notion of type. Keys and values 
in Bigtable are arbitrary strings. In Hbase, values are arbitrary byte 
arrays. -- JimKellerman 2007/05/30''
- 
-   ''Why? Bigtable is written in C++ and std::string can contain an arbitrary 
byte sequence. Hbase will be written in Java and in Java Strings have an 
encoding associated with them. Unless you store the original encoding of a 
value, you have no way to decode it back into the same encoding.'' -- 
JimKellerman
  
   * Because of the design of the system, columns are easy to create (and are 
created implicitly) 
  
-   ''In Bigtable, columns are easy to create but they require administration 
priviliges (Access Control Lists control who can manipulate the schema. Hbase 
will follow this metaphor.'' -- JimKellerman
+   ''In Bigtable, column families are easy to create but they require 
administration privileges (Access Control Lists control who can manipulate the 
schema). New column family members can be created at any time. Hbase follows 
this metaphor. -- JimKellerman 2007/05/30''
  
   * Column families can be split into locality groups (Ontologies!)
