Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
  ----
  
  = Hbase Shell Plan Draft =
- The plan is to significantly expand the set of shell operators.  Basic data 
manipulation and data definition operators will be extended and evolved to be 
more SQL-like ([:Hbase/HbaseShell/HQL HQL]).  More sophisticated manipulations 
for relational and linear algebra, matrix additions, multiplications, etc., 
will be added to an HBase subshell to keep the two operator types -- SQL-like 
vs. non-SQL -- distinct.
+ The plan is to significantly expand the set of shell operators.  Basic data 
manipulation and data definition operators will be extended and evolved to be 
more SQL-like ([:Hbase/HbaseShell/HQL]).  More sophisticated manipulations for 
relational and linear algebra, matrix additions, multiplications, etc., will 
be added to an HBase subshell to keep the two operator types -- SQL-like vs. 
non-SQL -- distinct.
  
  This project is currently in the planning stage.  
[https://issues.apache.org/jira/browse/HADOOP-1608 HADOOP-1608] to add 
"Relational Algebra Operators" is currently in progress.
  
  == People Involved ==
   * '''Syntax definition.'''
    * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
-   * Inchul Song, Ph.D. Candidate[[BR]]Database Lab[[BR]]Division of Computer 
Science, KAIST
+   * Inchul Song, Ph.D. Candidate[[BR]]Database Lab (Division of Computer 
Science, KAIST)
  
  If you have constructive ideas, please advise me. [EMAIL PROTECTED]
  
- ''~-This page looks great. I've added comments to the below.  Please remove 
after you are done with them. -- St.Ack-~''
+ == Suggested Hbase Query Language plans ==
  
- == Suggested Hbase Shell plans ==
- === Hbase Query Language ===
  I've made some changes to your initial HQL to make it look more like SQL. I 
borrowed the syntax definition style from MySQL. 
+ 
   -- [:Hbase/HbaseShell/HQL] by Inchul Song
  
- ''~-if you're ready to implement them, I suggest you open a new issue for 
"HQL" -- Edward-~''
+ ~-''If you're ready to implement them, I suggest you open a new issue for 
"HQL" -- Edward''-~
  
  ----
  
@@ -43, +42 @@

  Hbase.altools > exit;
  Hbase > exit;
  }}}
+ 
- Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell') program that 
provides scalable data processing capabilities such as aggregation and algebraic 
calculation (groups and sets, commutative rings, algebraic geometry, and linear 
algebra) on Hadoop + Hbase based parallel machines. In particular, it will focus 
on storing and manipulating sparse matrices on Hbase.
+ Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell') program that 
provides scalable data processing capabilities such as aggregation and algebraic 
calculation (groups and sets, commutative rings, algebraic geometry, and linear 
algebra) on Hadoop + Hbase based parallel machines. In particular, it will focus 
on storing and manipulating '''sparse matrices''' on Hbase.
  
   ''-- Altools Matrix operations will show how Google search's LSI, Google 
Earth's algebraic topology, and Google News' recommendation system are related 
to Bigtable. See the HBase Shell Usage Page. --[:Hbase/HbaseShell/Examples]''
+ 
  
  === Hbase altools Goals ===
   * A Simplified Import/Export/Migrate Functionality Between different data 
sources (Hadoop, HBase)
@@ -59, +60 @@

  I expect Hadoop + Hbase to handle sparsity and data explosion very well in 
the near future. Moreover, I believe the design of the multi-dimensional map 
structure and the 3D space model of the data are optimized for rapid ad-hoc 
information retrieval in any orientation, as well as for fast, flexible 
calculation and transformation of raw data based on formulaic relationships. It 
is advantageous for '''Analysis Processing''', as it allows users to easily 
formulate complex queries and to filter or slice data into meaningful subsets, 
among other things.
  
  === Rationale ===
+ 
  It will probably take a while for Hadoop + HBase to provide reliable 
real-time service like other DBMSs.  [[BR]]Also, the multi-dimensional model is 
commonly accepted for OLAP.
  ||<bgcolor="#E5E5E5">'''System Characteristic''' 
||<bgcolor="#E5E5E5">'''RDBMS''' ||<bgcolor="#E5E5E5">'''Multi-Dimensional 
Model Hbase''' ||
  ||Data Retrieval Performance ||Slow ||Fast ||
@@ -73, +75 @@

  I don't expect it to give us high performance just yet, but it will surely 
make data management and development much easier. First, let's take a look at 
HBase's data model. HBase provides a unified data model that represents data in 
three dimensions: Row, Column, and Timestamp. Also, Row and Column may be 
extended infinitely.
  
  If we cut the data model at a single time version, then we may view the 
resulting data as a 2D table. If the index is a string, we may view it as a 
huge map. If the index is an integer, then it is one huge 2D array. So each 
table may hold such data in 3D (column families). A locality group (column 
family) is a relationship that can occur between multiple references whenever 
one reference brings in much of the data used by the other references.
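
The time-version cut described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HBase code: the nested-dict layout, the `slice_at` helper, and the sample rows and column names are all hypothetical stand-ins for a real table.

```python
from typing import Dict

# Hypothetical in-memory stand-in for an HBase table (not the HBase API):
# row key -> column name -> timestamp -> value
Table3D = Dict[str, Dict[str, Dict[int, str]]]


def slice_at(table: Table3D, ts: int) -> Dict[str, Dict[str, str]]:
    """Cut the 3D model at time `ts`, returning a 2D (row x column) view.

    Each cell holds the newest value whose timestamp is <= ts; cells with
    no version that old are left out, which keeps the view sparse.
    """
    view: Dict[str, Dict[str, str]] = {}
    for row, columns in table.items():
        for column, versions in columns.items():
            visible = [t for t in versions if t <= ts]
            if visible:
                view.setdefault(row, {})[column] = versions[max(visible)]
    return view


# Made-up sample data for illustration only.
table: Table3D = {
    "movie1": {"info:year": {1: "1977", 3: "1978"}, "info:length": {2: "121"}},
    "movie2": {"info:year": {2: "1992"}},
}

view_at_2 = slice_at(table, 2)
```

With string row keys the view behaves like a large map; with integer keys the same structure can be read as a sparse 2D array.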
- 
- ''~-I think people may also start to ask as your operators evolve: 'What is 
the difference between HBase Shell and Yahoo! PIG?' -- St.Ack-~''
  
  ----
  
@@ -95, +95 @@

  ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Projection ||<99%>'''Projection''' of a relation ~+R+~: it makes a new 
relation as the set that is obtained when all tuples (rows) in ~+R+~ are 
restricted to the set 
{columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = 
Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~ ||
  ||Selection ||<99%>'''Selection''' of a relation ~+R+~: it makes a new 
relation as the set of specified tuples (rows) of the relation ~+R+~[[BR]]'''Set 
Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = 
Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 
'Fox');''-~ ||
+ ||JOINs ||<99%>Table '''JOIN''' operations, linking and extracting data from 
two different internal sources[[BR]]'''Operations''' : ~-''naturalJoin(), 
thetaJoin(), cartesianProduct() ''-~ [[BR]][[BR]]~-''R = 
Table('movieLog_table');[[BR]]S = Table('movieStar_table');[[BR]]C = 
R.naturalJoin(S); //C = R▷◁S''-~ ||
  ||Group ||<99%>'''Group''' tuples by the value of an attribute and apply an 
aggregate function independently to each group of tuples.[[BR]]'''Aggregate 
Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( 
attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = 
Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~ ||
  ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to 
columnfamilies on columnfamily-list[[BR]][[BR]]~-''A = 
Table('movieLog_table');[[BR]]B = Sort A by ('length');''-~ ||
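
To make the semantics of the operators in the table concrete, here is a minimal Python sketch over in-memory rows. It is an illustration under assumptions, not the proposed shell implementation: the function names, the list-of-dicts representation, and the sample rows are hypothetical, and none of it uses the HBase API.

```python
from collections import defaultdict


def projection(rows, *columns):
    """Restrict every row to the listed columns."""
    return [{c: r[c] for c in columns if c in r} for r in rows]


def selection(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [r for r in rows if predicate(r)]


def natural_join(left, right):
    """Pair rows that agree on every column name they share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def group(rows, key, aggregate, column):
    """Group rows by `key` and apply `aggregate` to `column` per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[column])
    return {k: aggregate(values) for k, values in groups.items()}


def sort(rows, column):
    """Order rows by the value of `column`."""
    return sorted(rows, key=lambda r: r[column])


# Made-up sample rows standing in for movieLog_table and movieStar_table.
movie_log = [
    {"title": "A", "year": 1977, "length": 124, "studioName": "Fox"},
    {"title": "B", "year": 1991, "length": 104, "studioName": "Disney"},
    {"title": "C", "year": 1992, "length": 95, "studioName": "Fox"},
]
movie_star = [{"title": "A", "starName": "X"}]

b = projection(movie_log, "year", "length")
long_fox = selection(
    movie_log, lambda r: r["length"] > 100 and r["studioName"] == "Fox"
)
c = natural_join(movie_log, movie_star)
earliest = group(movie_log, "studioName", min, "year")
by_length = sort(movie_log, "length")
```

The nested-loop join is the simplest correct form; a distributed version over Hadoop would partition on the shared columns instead.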
  
@@ -142, +143 @@

  St.Ack
  }}}
  
+ 
+ 
  ----
  = Example Of Hbase Shell Use =
  
