Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  And then, vertical(column) data set by one of RDF properties can be read fast from Table, because it is column-stored.
  Please let me know if you don't agree with me.
- ----
- === My thoughts ===
-
- by [wiki:udanax Udanax] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]
-
- First, I would like to pay my respects to your commitment to the Hbase project. This is my opinion.
- [[BR]]Based on the paper, the picture below expresses the concept of BigTable, where 'T' is the table and 'A' is a column family with the attribute values shown.
-
- [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_full.jpg]
-
- BigTable is the storage layer for sparse matrix data.
- [[BR]]Its goal is not data selection, even though that is a very useful feature, but matrix computation and aggregation.
-
- Referring to the example code in Google's paper, it would look like this:
- {{{
- Scanner scanner(T);
- ScanStream *stream;
- stream = scanner.FetchColumnFamily("A");
- stream->SetReturnVersions("t2");
- scanner.Lookup("2");
-
- for (; !stream->Done(); stream->Next()) {
-   printf("%s %s %lld %s\n",
-          scanner.RowName(),
-          stream->ColumnName(),
-          stream->MicroTimestamp(),
-          stream->Value());
- }
- }}}
-
- This example code prints the first and second row vectors of the 4*4 sparse matrix.
- [[BR]]It processes vector calculations in parallel using a row-wise partition.
- [[BR]]Therefore, in order to do distributed computing effectively, the data structure needs to be defined so that it fully supports the preprocessing needed to obtain abstract matrix information.
-
- So I think the architecture needs to look like this:
-
-  * Conceptual Data Storage
-  * Data Distribution
-  * Segment Format
-  * Data Management Tools
-  * Parallel Matrix Computation, Parallel Aggregation Engine
-  * Parallel Analysis Interface
-  * Example
-  * Benefits, Benchmark Report / Discussion
-
- These are the major components I think the architecture needs to have.
- [[BR]]So, I would like to discuss the architecture of Hbase with you in detail.
-