[Lucene-hadoop Wiki] Update of "Hbase/ShellPlans" by stack

Apache Wiki Thu, 12 Jul 2007 12:13:13 -0700



The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

The comment on the change is:
Page two of hbase shell split (after chatting with Edward Yoon)

New page:
* Work in progress

[[TableOfContents(4)]]

----
= Introduction =
A basic version of an [wiki:Hbase/HbaseShell HBase Shell] was added to HBase in 
July, 2007.  This page discusses future HBase Shell features and directions.

= Hbase Shell Goals =
 * Simplified import/export/migration of data between different data sources (Hadoop, HBase)
 * Simplified processing of a logical data model
 * Simplified algebraic operations
 * Simplified parallel numerical analysis, by abstracting/numericalizing point, line, or plane data across multiple maps in HBase

== HBase Shell Background ==

I expect Hadoop + HBase to handle sparsity and data explosion very well in the near future. Moreover, I believe the design of the multi-dimensional structure and the 3D space model of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships.

I therefore thought it would need a more user-friendly interface that lets users query the data interactively.

=== Rationale ===

It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMSs. So I decided to develop a shell that performs linear algebra computations and processes large-scale data using Hadoop's parallel processing and HBase storage.

''Then you may ask, "What is the difference from MapReduce using MapFiles?"''

I don't expect it to deliver high performance just yet, but it should make data management and development much easier.
First, let's take a look at HBase's data model.

HBase provides a unified data model that represents data in three dimensions: Row, Column, and Timestamp. Rows and columns can also be extended indefinitely.

If we slice the data model at a single timestamp (version), the result can be viewed as a 2D table.
If the row and column indexes are strings, that table is one huge map; if they are integers, it is one huge 2D array.
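The slicing idea above can be illustrated with a minimal Python sketch. It uses a toy in-memory stand-in for an HBase table rather than HBase's API, and the dictionary layout and the `slice_at` helper are hypothetical. Each cell is addressed by (row, column, timestamp). Fixing a timestamp collapses the 3D structure into a 2D map from (row, column) to value. The sketch assumes that "slicing at a timestamp" means keeping, for each cell, the newest version not later than that timestamp.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for an HBase table:
# (row key, "family:qualifier" column, timestamp) -> value
Cell = Tuple[str, str, int]


def slice_at(table: Dict[Cell, str], ts: int) -> Dict[Tuple[str, str], str]:
    """Collapse the 3D (row, column, timestamp) model into a 2D table.

    For each (row, column) pair, keep the newest version whose
    timestamp is <= ts (an assumption about the intended semantics).
    """
    best: Dict[Tuple[str, str], Tuple[int, str]] = {}
    for (row, col, t), value in table.items():
        if t <= ts and ((row, col) not in best or t > best[(row, col)][0]):
            best[(row, col)] = (t, value)
    return {key: value for key, (_, value) in best.items()}


table = {
    ("movie1", "info:year", 100): "1979",
    ("movie1", "info:year", 200): "1980",
    ("movie1", "info:length", 100): "120",
    ("movie2", "info:year", 300): "1999",
}

print(slice_at(table, 150))
# {('movie1', 'info:year'): '1979', ('movie1', 'info:length'): '120'}
```

With string keys the result is the "huge map" view. If row and column keys were integers, the same dictionary would act as a sparse 2D array.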

So each table may hold several such 3D data stores (column families).
A locality group (a set of column families) is a relationship that can occur between multiple references whenever one reference brings in much of the data used by the other references.

  ''-- I hope physical files on the network are grouped together by locality group.[[BR]]by [:udanax:udanax].''

== People Involved ==

 * [:udanax:Edward Yoon] [[MailTo(udanax AT SPAMFREE nhncorp DOT com)]] (NHN corp.)
 * [:boyo:Sewon Kim] [[MailTo(ebow31 AT SPAMFREE gmail.com)]] (Empas, Inc.)
 * [:mskim:Minsu Kim] [[MailTo(minsu.kim AT SPAMFREE gmail.com)]] (Daum, Inc.)

----
= Suggested Future Hbase Shell Operators =
'''Note''' that data are located by row, column, and timestamp.

== Commands ==
||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
||Substitute ||'''Substitute''' assigns an expression to a single-letter variable [A-Z].[[BR]][[BR]]~-''X = Matrix(table_name, columnfamily_name);''-~||
||Store ||The '''STORE''' command stores results in the specified table.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection('length' > 100);[[BR]]STORE B TO X run_style;''-~ ||
||Set ||The '''SET''' command changes the value of a cell.[[BR]][[BR]]~-''SET table_name[[BR]]VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row='row_key' AND time='Specified_Timestamp';''-~ ||
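
The Substitute and SET rows above can be modeled in a minimal Python sketch. It works on a toy in-memory table, not on the HBase client API, and `set_cell`, `substitute`, and the dictionary layout are hypothetical names. SET writes a single cell addressed by row key, "columnfamily_name:column_key", and timestamp. The sketch assumes the column family must already exist. Substitute binds a value to a one-letter variable A to Z.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical cell address: (row key, "family:qualifier", timestamp)
Cell = Tuple[str, str, int]

# Substitute targets: a single uppercase letter, A through Z.
VAR_NAME = re.compile(r"[A-Z]")


def set_cell(table: Dict[Cell, str], families: Set[str],
             row: str, column: str, value: str, ts: int) -> None:
    """Model of SET ... VALUES('family:key', 'entry') WHERE row=... AND time=..."""
    family, sep, _qualifier = column.partition(":")
    # Assumption: the column family must already be defined on the table.
    if not sep or family not in families:
        raise ValueError(f"unknown column family in {column!r}")
    table[(row, column, ts)] = value


def substitute(env: Dict[str, object], name: str, value: object) -> None:
    """Model of `X = expression;` with single-letter variable names."""
    if not VAR_NAME.fullmatch(name):
        raise ValueError(f"variable name must be one letter A-Z, got {name!r}")
    env[name] = value
```

For example, `set_cell(t, {"info"}, "row_key", "info:title", "entry", 10)` stores one versioned cell. Writing to `"stats:x"` raises an error because `stats` is not a known family.
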
== Relational Operators ==

||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
||Projection ||<99%>'''Projection''' of a relation ~+R+~ creates a new relation by restricting every tuple (row) in ~+R+~ to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~||
||Selection ||<99%>'''Selection''' of a relation ~+R+~ creates a new relation containing only the tuples (rows) of ~+R+~ that satisfy the given condition.[[BR]]'''Logical operators''': ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection('length' > 100);[[BR]]C = A.Selection('length' > 100 AND 'year' > 1979);''-~||
||Product ||<99%>'''Product''' of relations ~+R+~ and ~+S+~ creates a new relation containing every possible combination of a tuple from ~+R+~ with a tuple from ~+S+~.[[BR]]'''NOTE''' that this is the most computationally expensive operator in the relational algebra.||
||Rename ||<99%>'''Rename''' r to x: the column family names in the column family list replace the corresponding column family names of the relation.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Rename('length' = 'movieLength');''-~||
||Group ||<99%>'''Group''' tuples by the value of an attribute and apply an aggregate function independently to each group of tuples.[[BR]]'''Aggregate functions''': ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~||
||Sort ||<99%>'''Sort''' the tuples (rows) of ~+R+~, ordered by the column families in the column family list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Sort('length', 'vote');''-~||
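
The relational operators above can be sketched in a single Python process. This is a minimal sketch, assuming a relation is a list of dicts keyed by column family name. The function names mirror the operator names but are hypothetical and are not the shell's API. The sample rows are made up, and nothing here touches HBase or MapReduce.

```python
from itertools import product as cartesian
from typing import Callable, Dict, List

# Hypothetical in-memory relation: a list of tuples (rows), each a dict
# keyed by column family name. Not the HBase Shell's actual API.
Row = Dict[str, object]
Relation = List[Row]


def projection(r: Relation, *cols: str) -> Relation:
    # Restrict every tuple to the listed column families.
    return [{c: row[c] for c in cols if c in row} for row in r]


def selection(r: Relation, pred: Callable[[Row], bool]) -> Relation:
    # Keep tuples satisfying the predicate; AND/OR/NOT map to and/or/not.
    return [row for row in r if pred(row)]


def rename(r: Relation, old: str, new: str) -> Relation:
    # Replace one column family name with another in every tuple.
    return [{(new if c == old else c): v for c, v in row.items()} for row in r]


def group(r: Relation, key: str, agg: Callable[[List[object]], object],
          col: str) -> Dict[object, object]:
    # Partition tuples by the value of `key`, then aggregate `col` per group.
    groups: Dict[object, List[object]] = {}
    for row in r:
        groups.setdefault(row[key], []).append(row[col])
    return {k: agg(vals) for k, vals in groups.items()}


def sort(r: Relation, *cols: str) -> Relation:
    # Order tuples by the listed column families, left to right.
    return sorted(r, key=lambda row: tuple(row[c] for c in cols))


def product(r: Relation, s: Relation) -> Relation:
    # All len(r) * len(s) combinations; "r."/"s." prefixes avoid name clashes.
    return [{**{"r." + c: v for c, v in a.items()},
             **{"s." + c: v for c, v in b.items()}}
            for a, b in cartesian(r, s)]


# Made-up sample data standing in for 'movieLog_table'.
movies: Relation = [
    {"title": "movie1", "year": 1979, "length": 117, "studioName": "studioA"},
    {"title": "movie2", "year": 1986, "length": 137, "studioName": "studioA"},
    {"title": "movie3", "year": 1985, "length": 95, "studioName": "studioB"},
]

C = selection(movies, lambda t: t["length"] > 100 and t["year"] > 1979)
print([t["title"] for t in C])                   # ['movie2']
print(group(movies, "studioName", min, "year"))  # {'studioA': 1979, 'studioB': 1985}
```

The `product` output grows as the product of the two input sizes. This is why the table flags Product as the most expensive operator.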

== Matrix Operators ==


* Basic matrix operators

||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
||Addition ||<99%>... ||
||Subtraction ||<99%>... ||
||Multiplication ||<99%>... ||
||Division ||<99%>... ||
||Transpose ||<99%>Interchanges rows and columns ||
||Permutation ||<99%>... ||
||Norms ||<99%>... ||
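
A few of these operators can be sketched in Python on a sparse matrix stored as a dict from (row index, column index) to value. This matches the "integer index = huge 2D array" view of the data model. This is a minimal single-process sketch under that assumed representation, and the function names are hypothetical. The page's goal of running such operators in parallel over HBase is not attempted here. For norms, the sketch uses the Frobenius norm as one example.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sparse matrix: only non-zero entries, keyed by (row, column).
Matrix = Dict[Tuple[int, int], float]


def add(a: Matrix, b: Matrix) -> Matrix:
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0.0) + v
    # Drop entries that cancelled to zero to keep the matrix sparse.
    return {k: v for k, v in out.items() if v != 0}


def subtract(a: Matrix, b: Matrix) -> Matrix:
    return add(a, {k: -v for k, v in b.items()})


def transpose(a: Matrix) -> Matrix:
    # Interchange rows and columns.
    return {(j, i): v for (i, j), v in a.items()}


def multiply(a: Matrix, b: Matrix) -> Matrix:
    # Index b by row so each a[i, k] only meets the entries b[k, *].
    b_rows: Dict[int, List[Tuple[int, float]]] = {}
    for (k, j), v in b.items():
        b_rows.setdefault(k, []).append((j, v))
    out: Matrix = {}
    for (i, k), av in a.items():
        for j, bv in b_rows.get(k, []):
            out[(i, j)] = out.get((i, j), 0.0) + av * bv
    return {k: v for k, v in out.items() if v != 0}


def frobenius_norm(a: Matrix) -> float:
    return math.sqrt(sum(v * v for v in a.values()))


a = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}  # [[1, 2], [0, 3]]
b = {(0, 0): 4.0, (1, 0): 5.0}               # [[4, 0], [5, 0]]
print(multiply(a, b))  # {(0, 0): 14.0, (1, 0): 15.0}
```
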

* Decompositions and related operations

||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
||LU ||<99%>... ||
||QR ||<99%>... ||
||Cholesky ||<99%>... ||
||SVD ||<99%>... ||
||Inverse ||<99%>Computes, for a square non-singular matrix A, the matrix A^-1^ such that A A^-1^ = I ||
||Pseudoinverse ||<99%>... ||
||Condition ||<99%>... ||
||Determinant ||<99%>... ||
||Rank ||<99%>... ||
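
For reference, the standard textbook definitions behind the operations listed above are given below, assuming real matrices. They describe what each result is, not how the shell would compute it in parallel over HBase.

```latex
\begin{aligned}
\text{LU:}\quad & PA = LU, && L \text{ unit lower triangular},\ U \text{ upper triangular},\ P \text{ a permutation} \\
\text{QR:}\quad & A = QR, && Q^{\mathsf T}Q = I,\ R \text{ upper triangular} \\
\text{Cholesky:}\quad & A = LL^{\mathsf T}, && A \text{ symmetric positive definite},\ L \text{ lower triangular} \\
\text{SVD:}\quad & A = U\Sigma V^{\mathsf T}, && U^{\mathsf T}U = I,\ V^{\mathsf T}V = I,\ \Sigma = \operatorname{diag}(\sigma_1 \ge \sigma_2 \ge \dots \ge 0) \\
\text{Inverse:}\quad & AA^{-1} = A^{-1}A = I, && A \text{ square and non-singular} \\
\text{Pseudoinverse:}\quad & A^{+} = V\Sigma^{+}U^{\mathsf T}, && \Sigma^{+}: \Sigma^{\mathsf T} \text{ with each non-zero } \sigma_i \text{ replaced by } 1/\sigma_i \\
\text{Condition:}\quad & \kappa(A) = \lVert A\rVert\,\lVert A^{-1}\rVert, && \kappa_2(A) = \sigma_{\max}/\sigma_{\min} \\
\text{Determinant:}\quad & \det A = (-1)^{s}\textstyle\prod_i u_{ii}, && s = \text{number of row swaps in } P \text{ (from } PA = LU\text{)} \\
\text{Rank:}\quad & \operatorname{rank} A = \#\{\, i : \sigma_i > 0 \,\}
\end{aligned}
```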
