Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  Then the vertical (column) data set for any one RDF property can be read 
quickly from the table, because it is column-stored.
  Please let me know if you don't agree with me.
  
- ----
- === My thoughts ===
- 
- by [wiki:udanax Udanax] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]
- 
- First, I would like to express my respect for your commitment to the Hbase 
Project. Here is my opinion.
- [[BR]]Based on the paper, the picture below expresses the concept of 
BigTable, where 'T' is the table, 'A' is a column family, and its attribute 
values are as follows.
- 
- 
[http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_full.jpg]
- 
- BigTable is a storage layer for sparse matrix data.
- [[BR]]Its goal is not data selection, even though that is a very useful 
feature, but matrix computation and aggregation.
- 
- Referring to the example code in Google's paper, it would look like this:
- {{{
- Scanner scanner(T);
- ScanStream *stream;
- stream = scanner.FetchColumnFamily("A");
- stream->SetReturnVersions("t2");
- scanner.Lookup("2");
-  
- for (; !stream->Done(); stream->Next()) {
-         printf("%s %s %s\n",
-                 scanner.RowName(),
-                 stream->ColumnName(),
-                 stream->Value());
- }
- }}}
- 
- This example code prints the first and second row vectors of the 4*4 sparse 
matrix.
- [[BR]]It processes vector calculations in parallel using row-wise partitioning.
- [[BR]]Therefore, to do distributed computing effectively, the data 
structure needs to be defined so that it fully supports the preprocessing 
needed to obtain abstract matrix information.
- 
- Then, I think the architecture needs to look like this:
- 
-  * Data Storage Concepts 
-  * Data Distribution 
-  * Segment Format 
-  * Data Management Tools 
-  * Parallel Matrix Computation, Parallel Aggregation Engine 
-  * Parallel Analysis Interface 
-  * Example 
-  * Benefits, Benchmark Report / Discussion 
- 
- These are the major components I think the architecture needs to have.
- [[BR]]So, I would like to discuss the architecture of Hbase with you in 
detail.
- 
