[Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by udanax

Apache Wiki Sun, 05 Aug 2007 23:39:42 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.


The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
- [[TableOfContents(4)]]
+ [[TableOfContents(5)]]
  
  ----
  
- = Hbase Shell Plans =
+ = Hbase Shell Plan Draft =
- 
  == People Involved ==
- 
   * '''Syntax definition.'''
-   * [wiki:udanax Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
+   * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
    * Inchul Song, Ph.D. Candidate[[BR]]Database Lab[[BR]]Division of Computer 
Science, KAIST
- 
   * '''Code Implementation.'''
-   * [wiki:udanax Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
+   * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
    * Minsu Kim, System Engineer at Daum corp.
    * Sewon Kim, System Engineer at Empas corp.
  
-  * '''Jira Issues.'''
-   * https://issues.apache.org/jira/browse/HADOOP-1608
-    * https://issues.apache.org/jira/browse/HADOOP-1658
-   * https://issues.apache.org/jira/browse/HADOOP-1655
- 
  If you have constructive ideas, please let me know. [EMAIL PROTECTED]
  
- == Suggested Hbase Shell Syntax ==
+ == Suggested Hbase Shell plans ==
  
-   -- Inchul, Feel free to add your opinion.
+  ''--Inchul, Feel free to add your opinion.[[BR]]udanax''
  
+  * [:HbaseShell/HQL] - I've made some changes to your initial HQL to make it 
look more like SQL. I borrowed the syntax definition style from MySQL.
- HBase Query Language (HQL) discussions and syntax draft page.
- 
-  * http://www.hadoop.co.kr/wiki/moin.cgi/HBaseShell/HQL
  
  ----
  
- = Hbase Shell altools plans =
+ == Suggested Hbase Shell altools plans ==
- 
  I suggest developing HBase Shell in an SQL style, and developing '''al'''gebraic '''tools''' as a subshell, as described below. 
  
  {{{
@@ -51, +40 @@

  Hbase.altools > exit;
  Hbase > exit;
  }}}
+ Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell') program that provides scalable data processing capabilities, such as aggregation and algebraic calculation (groups and sets, commutative rings, algebraic geometry, and linear algebra), on Hadoop + Hbase based parallel machines. 
  
- Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell') program to provide scalable data processing capabilities like aggregation, algebraic calculation (groups and sets, commutative rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
+  ''--Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic topology, and Google News' recommendation system are related to Bigtable. See the HBase Shell usage page: [:HBaseShell/Examples]''
  
- ''Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic topology, and Google News' recommendation system are related to Bigtable.''
- 
- = Hbase altools Goals =
+ === Hbase altools Goals ===
   * Simplified import/export/migration between different data sources (Hadoop, HBase)
   * Simplified processing of a logical data model
   * Simplified algebraic operations
   * Simplified parallel numerical analysis by abstracting/numericalizing point, line, or plane data across multiple maps in HBase
- 
- == HBase altools Background ==
+ === HBase altools Background ===
- 
- I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future. Moreover, I believe the design of the multi-dimensional map structure and the 3d space model of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. It is advantageous with respect to '''Analysis Processing''' 
+ I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future. Moreover, I believe the design of the multi-dimensional map structure and the 3d space model of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. It is advantageous with respect to '''Analysis Processing''' as it allows users to easily formulate complex queries, and filter or slice data into meaningful subsets, among other things.
- as it allows users to easily formulate complex queries, and filter or slice 
data into meaningful subsets, among other things.
  
  === Rationale ===
- 
- It will probably take a while for Hadoop + HBase to provide reliable 
real-time service like other DBMS. 
+ It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMSs. [[BR]]Also, the multi-dimensional model is commonly accepted for OLAP.
- [[BR]]Also, Multi Dimensional Model is commonly accepted for OLAP.
- 
- ||<bgcolor="#ececec">'''System Characteristic''' 
||<bgcolor="#ececec">'''RDBMS''' ||<bgcolor="#ececec">'''Multi-Dimensional 
Model Hbase''' ||
+ ||<bgcolor="#E5E5E5">'''System Characteristic''' 
||<bgcolor="#E5E5E5">'''RDBMS''' ||<bgcolor="#E5E5E5">'''Multi-Dimensional 
Model Hbase''' ||
  ||Data Retrieval Performance ||Slow ||Fast ||
  ||Calculation Functionality || Limited, in all but one dimension ||Can be 
very high, all dimensions ||
  ||Openness to live data access by other applications ||Excellent ||Limited ||
  ||Priorities ||High performance, High availability ||High flexibility, High user autonomy ||
  
+ 
  Thus, I decided to develop a shell for linear algebraic computation and large-scale data processing, using Hadoop's parallel processing and HBase storage.
  
  ''Then you may ask "What is a difference from MapReduce using MapFiles?"''
  
+ I don't expect it to give us high performance just yet, but it will surely make data management and development much easier. First, let's take a look at HBase's data model.
- I don't expect it to give us high performance just yet,
- but it will surely make data management and development much easier.
- First, let's take a look at HBase's data model.
  
+ HBase provides a unified data model that represents data in three dimensions: Row, Column, and Timestamp. Row and Column may also be extended indefinitely.
- HBase provides a unified data model and it represents a data in 3-dimensional
- - Row, Column, and Timestamp. Also, Row and Column may be extended infinitely.
  
+ If we fix the data model at a single timestamp version, we may view the data as a 2D table. If the indices are strings, we may view it as a huge map; if the indices are integers, it is one huge 2D array.
- If we decide to cut the data model in time version, then we may view the new 
data as a 2D table.
- If index is in string, we may view it as a huge map. If index is in integer, 
then it is one huge 2D array.
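A minimal Python sketch of the cell addressing described above, assuming a plain in-memory dict stands in for one HBase table. The names `put`, `snapshot`, and `to_dense` are hypothetical illustrations, not HBase APIs. Each cell is keyed by (row, column, timestamp). Cutting at one timestamp yields a 2D row-to-column map, and when the indices are integers that map reads as a 2D array.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical stand-in for one HBase table: cell key = (row, column, timestamp).
Table = Dict[Tuple[Hashable, Hashable, int], object]


def put(table: Table, row: Hashable, column: Hashable, ts: int, value: object) -> None:
    """Store one versioned cell; older versions are kept, not overwritten."""
    table[(row, column, ts)] = value


def snapshot(table: Table, as_of: int) -> Dict[Hashable, Dict[Hashable, object]]:
    """Cut the 3D model at one time version: for each (row, column) keep the
    newest value with timestamp <= as_of, giving a 2D row -> column map."""
    newest: Dict[Tuple[Hashable, Hashable], Tuple[int, object]] = {}
    for (row, column, ts), value in table.items():
        if ts <= as_of:
            key = (row, column)
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    view: Dict[Hashable, Dict[Hashable, object]] = {}
    for (row, column), (_, value) in newest.items():
        view.setdefault(row, {})[column] = value
    return view


def to_dense(view: Dict[Hashable, Dict[Hashable, object]], n_rows: int, n_cols: int) -> List[List[float]]:
    """With integer row/column indices the 2D view is just a (sparse) 2D array;
    missing cells are read as 0.0."""
    return [[float(view.get(i, {}).get(j, 0.0)) for j in range(n_cols)] for i in range(n_rows)]
```

For example, after `put(t, "r1", "cf:a", 1, "old")` and `put(t, "r1", "cf:a", 2, "new")`, `snapshot(t, 1)` sees only the older version and `snapshot(t, 2)` sees only the newer one.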
  
+ So each table may hold several such 3D data stores (column families). A locality group (of column families) is a relationship that can occur between multiple references whenever one reference brings in much of the data used by the other references.
- So each table may have such data storages in 3D (Columnfamilies)
- Locality Group(Columnfamilies) is a relationship that can occur between 
multiple references
- whenever one reference brings in much of the data used by the other 
references.
  
  ----
  
- = Suggested Hbase altools Operators =
+ === Suggested Hbase altools Operators ===
  '''Note''' that data is located by its row, column, and timestamp.
  
- == Commands ==
+ ==== Commands ====
- ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||<bgcolor="#E5E5E5">'''Command''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Table ||'''Table''' command loads the specified table. [[BR]][[BR]]~-''A = Table('movieLog_table');''-~ ||
  ||Matrix ||'''Matrix''' command controls the configuration of the logical matrix. [[BR]][[BR]]~-''M = Matrix(table_name, columnfamily_name[, scalar S]);''-~ ||
  ||Substitute || '''Substitute''' assigns an expression to a variable [A~Z][[BR]][[BR]]~-''A = Table('movieLog_table');''-~ ||
  ||Store ||'''Store''' command stores results in the specified table (or file). [[BR]][[BR]]~-''A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100); [[BR]]Store B TO table('tmp_table')[or file('backup.dat')];''-~ ||
- == Relational Operators ==
+ ==== Relational Operators ====
- ||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Projection ||<99%>'''Projection''' of a relation ~+R+~: it makes a new relation as the set that is obtained when all tuples (rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~ ||
  ||Selection ||<99%>'''Selection''' of a relation ~+R+~: it makes a new relation as the set of specified tuples (rows) of the relation ~+R+~[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 'Fox');''-~ ||
  ||Group ||<99%>'''Group''' tuples by value of an attribute and apply an aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~ ||
  ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to 
columnfamilies on columnfamily-list[[BR]][[BR]]~-''A = 
Table('movieLog_table');[[BR]]B = Sort by ('length');''-~ ||
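The relational operators above can be sketched over the sample movieLog_table rows, with title as the row key and the vote and producer families omitted. This is a hypothetical single-machine illustration in plain Python, not the planned HQL implementation. `projection`, `selection`, `group`, and `sort` are made-up names that only mirror the table rows.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Rows copied from the movieLog_table example; "title" plays the row key.
MOVIE_LOG: List[Row] = [
    {"title": "Star Wars", "year": 1977, "length": 124, "inColor": True, "studioName": "Fox"},
    {"title": "Mighty Ducks", "year": 1991, "length": 104, "inColor": True, "studioName": "Disney"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "inColor": True, "studioName": "Paramount"},
]

# Aggregate functions named in the Group row of the table.
AGGREGATES: Dict[str, Callable[[List[Any]], Any]] = {
    "AVG": lambda xs: sum(xs) / len(xs),
    "SUM": sum,
    "COUNT": len,
    "MIN": min,
    "MAX": max,
}


def projection(rel: List[Row], *families: str) -> List[Row]:
    """Restrict every tuple to its row key plus the named column families."""
    return [{"title": r["title"], **{f: r[f] for f in families}} for r in rel]


def selection(rel: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep the tuples satisfying the predicate; AND/OR/NOT map to Python's and/or/not."""
    return [r for r in rel if predicate(r)]


def group(rel: List[Row], key: str, agg: str, attribute: str) -> Dict[Any, Any]:
    """Group tuples by `key`, then apply one aggregate to `attribute` within each group."""
    buckets: Dict[Any, List[Any]] = {}
    for r in rel:
        buckets.setdefault(r[key], []).append(r[attribute])
    return {k: AGGREGATES[agg](vals) for k, vals in buckets.items()}


def sort(rel: List[Row], *families: str) -> List[Row]:
    """Order tuples by the given column families, left to right."""
    return sorted(rel, key=lambda r: tuple(r[f] for f in families))
```

For instance, `selection(MOVIE_LOG, lambda r: r["length"] > 100 and r["studioName"] == "Fox")` keeps only the Star Wars row, and `group(MOVIE_LOG, "studioName", "MIN", "year")` plays the role of `A.Group('studioName', MIN('year'))`.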
  
+ ==== Matrix Arithmetic Operators ====
+ ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
+ ||Addition ||<99%>'''Adding''' entries with the same indices 
[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = 
Matrix('m_table','cf_2');[[BR]]C = A + B;''-~ ||
+ ||Subtraction ||<99%>'''Subtracting''' entries with the same indices [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A - B;''-~ ||
+ ||Multiplication ||<99%>'''Multiplication''' of two matrices, Product C of 
two matrices A and B [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = 
Matrix('m_table','cf_2');[[BR]]C = A * B;''-~ ||
+ ||Division ||<99%>'''Division''' is solving the matrix equation AX = B for X 
[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = 
Matrix('m_table','cf_2');[[BR]]C = A /[or \] B;''-~||
+ ||Transpose ||<99%>'''Transpose''' of a matrix: the matrix formed by turning all the rows of a given matrix into columns and vice versa.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Transpose(A);''-~||
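To pin down the arithmetic semantics, here is a sketch that assumes a matrix is stored sparsely as row index -> {column index -> value}. This is roughly how one column family could look when its row keys and column qualifiers are integers. The representation and the function names are hypothetical. In HBase these loops would presumably become scans or MapReduce jobs, and that part is not shown.

```python
from typing import Dict

# Hypothetical sparse matrix: row index -> {column index -> value}; absent cells are 0.
# One column family (e.g. 'cf_1' of 'm_table') viewed as a matrix would look like this.
SparseMatrix = Dict[int, Dict[int, float]]


def _combine(a: SparseMatrix, b: SparseMatrix, sign: float) -> SparseMatrix:
    """Entry-wise a + sign * b over the union of stored indices."""
    out: SparseMatrix = {}
    for src, factor in ((a, 1.0), (b, sign)):
        for i, row in src.items():
            dst = out.setdefault(i, {})
            for j, v in row.items():
                dst[j] = dst.get(j, 0.0) + factor * v
    return out


def add(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    return _combine(a, b, 1.0)


def subtract(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    return _combine(a, b, -1.0)


def multiply(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    """C[i][k] = sum over j of A[i][j] * B[j][k], touching only stored entries."""
    out: SparseMatrix = {}
    for i, row in a.items():
        for j, a_ij in row.items():
            for k, b_jk in b.get(j, {}).items():
                dst = out.setdefault(i, {})
                dst[k] = dst.get(k, 0.0) + a_ij * b_jk
    return out


def transpose(a: SparseMatrix) -> SparseMatrix:
    """Rows become columns and vice versa."""
    out: SparseMatrix = {}
    for i, row in a.items():
        for j, v in row.items():
            out.setdefault(j, {})[i] = v
    return out
```

Multiplication only visits stored entries, so sparse inputs stay cheap. The trade-off is that cancellations may leave explicit 0.0 entries, because the sketch does not prune them.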
+ 
+ ==== Factorizations and Decompositions ====
+ ||<bgcolor="#E5E5E5">'''Function''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
+ ||LU ||<99%>'''LU Decomposition'''[[BR]]A procedure for decomposing an N by N matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, LU = A (up to the row permutation returned by getPivot())[[BR]]'''Functions''' : ~-''getL(), getU(), isSingular(), getPivot()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = LUDecomposition(A);[[BR]]C = getU(B);[[BR]]D = getL(B);''-~||
+ ||QR ||<99%>'''QR Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, 
the QR decomposition is an m-by-n orthogonal matrix Q and an n-by-n upper 
triangular matrix R so that A = Q*R.[[BR]]'''Functions''' : ~-''getH(), getQ(), 
getR()''-~[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = 
QRDecomposition(A);[[BR]]C = getH(B);''-~||
+ ||Cholesky ||<99%>'''Cholesky Decomposition'''[[BR]]It is a special case of LU decomposition, applicable only if the matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions''' : ~-''getL(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = CholeskyDecomposition(A);[[BR]]C = getL(B);[[BR]]D = isSPD(B);''-~||
+ ||SVD ||<99%>'''SV(Singular Value) Decomposition'''[[BR]]For an m-by-n matrix 
A with m >= n, the singular value decomposition is an m-by-n orthogonal matrix 
U, an n-by-n diagonal matrix S, and an n-by-n orthogonal matrix V so that A = 
U*S*V'.[[BR]]'''Functions''' : ~-''getS(), getU(), getV(), 
getSingularValues()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = 
SVDecomposition(A);[[BR]]C = getU(B);''-~||
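The LU row, and Division as solving AX = B, can be illustrated with a dense, single-machine Doolittle LU with partial pivoting in plain Python. The names `lu_decompose`, `is_singular`, and `lu_solve` are hypothetical. They only echo getL()/getU()/getPivot()/isSingular() from the table, nothing here is distributed over Hadoop, and the solve handles one right-hand-side vector.

```python
from typing import List, Tuple

Dense = List[List[float]]


def lu_decompose(a: Dense) -> Tuple[Dense, Dense, List[int]]:
    """Doolittle LU with partial pivoting. Returns (L, U, piv) such that
    row i of L*U equals row piv[i] of A, i.e. P*A = L*U."""
    n = len(a)
    u = [row[:] for row in a]  # working copy that becomes U
    l = [[0.0] * n for _ in range(n)]
    piv = list(range(n))
    for k in range(n):
        # Pick the row with the largest |pivot| to limit round-off error.
        p = max(range(k, n), key=lambda r: abs(u[r][k]))
        if p != k:
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]
            piv[k], piv[p] = piv[p], piv[k]
        if u[k][k] == 0.0:
            continue  # zero pivot: A is singular, skip elimination in this column
        for r in range(k + 1, n):
            factor = u[r][k] / u[k][k]
            l[r][k] = factor
            for c in range(k, n):
                u[r][c] -= factor * u[k][c]
    for i in range(n):
        l[i][i] = 1.0
    return l, u, piv


def is_singular(u: Dense) -> bool:
    """A is singular exactly when U has a zero on its diagonal."""
    return any(u[i][i] == 0.0 for i in range(len(u)))


def lu_solve(l: Dense, u: Dense, piv: List[int], b: List[float]) -> List[float]:
    """Solve A x = b via L y = P b (forward) and U x = y (backward)."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[piv[i]] - sum(l[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x
```

Factoring once and then calling `lu_solve` per right-hand side is the usual reason to expose L, U, and the pivot separately instead of only a one-shot division operator.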
  ----
+ = Implementation =
- == Matrix Operators ==
- ||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||Addition ||<99%>Adding entries with the same indices [[BR]][[BR]]~-''C = A 
+ B;''-~ ||
- ||subtraction ||<99%>Subtracting entries with the same indices [[BR]][[BR]]~-''C = A - B;''-~ ||
- ||multiplication ||<99%>Product C of two matrices A and B [[BR]][[BR]]~-''C = 
A * B;''-~ ||
- ||division ||<99%>... ||
- ||transpose ||<99%>... ||
- ||permutation ||<99%>... ||
- ||norms ||<99%>... ||
- === Factorizations and decompositions ===
- ||<bgcolor="#ececec">'''Function''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||LU ||<99%>... ||
- ||QR ||<99%>... ||
- ||Cholesky ||<99%>... ||
- ||SVD ||<99%>... ||
- ||Inverse ||<99%>... ||
- ||Pseudoinverse ||<99%>... ||
- ||Condition ||<99%>... ||
- ||Determinant ||<99%>... ||
- ||Rank ||<99%>... ||
- === Column-Wise Data Analysis ===
- ||<bgcolor="#ececec">'''Function''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||Frequencies ||<99%>... ||
- ||Sorting ||<99%>... ||
- ||Covariance ||<99%>... ||
  
+ '''Note'''
+ {{{
+ Run the following: % ant clean jar compile-contrib test javadoc 
  
- = Examples =
+ This will run all tests and will show you javadoc warnings, if any (Javadoc warnings will cause Hudson to fail). 
+ If you only want to run the hbase tests because the full suite takes too long, do the following: 
  
+ % cd src/contrib/hbase
+ % ant jar test 
+ OR 
+ % ant clean jar test 
- == Relational Operations Examples ==
- ||Row Key ||||||||||||||||||||||||Column Families ||
- ||<rowbgcolor="#ececec">title |||| year ||||length ||||inColor |||| 
studioName |||| vote |||| producer ||
- ||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: 
|| Fox || vote:''user_1'' || 5 || producer: || George Lucas ||
- || || || || || || || || || || vote:''user_2'' || 2 || || ||
- ||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true 
||studioName: || Disney || vote:''user_1'' || 2 || producer: || Blair Peters ||
- || || || || || || || || || || vote:''user_3'' || 4 || || ||
- ||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true 
||studioName: || Paramount || vote:''user_2'' || 3 || producer: || Penelope 
Spheeris ||
- || || || || || || || || || || vote:''user_3'' || 4 || || ||
- '''~+^π^+~'''~-title-~,~-year-~,~-length-~'''~+^(movieLog_table)^+~'''
  
+ St.Ack
+ }}}
- A = table('movieLog_table'); [[BR]]B = A.projection('year','length');
- ||<rowbgcolor="#ececec">title ||year ||length ||
- ||Star Wars ||1977 ||124 ||
- ||Mighty Ducks ||1991 ||104 ||
- ||Wayne's World ||1992 ||95 ||
  
+ ----
+ = Example Of Hbase Shell Use =
  
- '''~+^σ^+~'''~-length>100-~'''~+^(movieLog_table)^+~'''
+ See [:HbaseShell/Examples]
  
- A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100);
- ||<rowbgcolor="#ececec">title ||year ||length ||inColor ||studioName 
||producer ||
- ||Star Wars ||1977 ||124 ||true ||Fox ||12345 ||
- ||Mighty Ducks ||1991 ||104 ||true ||Disney ||67890 ||
- 
- 
- '''~+^π^+~'''~-title-~,~-year-~'''~+^(σ^+~'''~-length>100-~'''~+^(movieLog_table)∩σ^+~'''~-studioName='Fox'-~'''~+^(movieLog_table))^+~'''
- 
- A = Table('movieLog_table'); [[BR]]B = A.Projection('year'); [[BR]]C = 
B.Selection(length > 100 AND studioName = 'Fox');
- ||<rowbgcolor="#ececec">title ||year ||
- ||Star Wars ||1977 ||
-
