Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/HbaseShell

------------------------------------------------------------------------------
- '''work in progress'''
+ '''research/work in progress''' - https://issues.apache.org/jira/browse/HADOOP-1375
  
  [[TableOfContents(4)]]
+ 
  ----
  = Hbase Shell Introduction =
- 
- Hbase Shell is an 'interpreter' (or 'shell)' to provide scalable data processing capabilities like
+ Hbase Shell is an 'interpreter' (or 'shell') that provides scalable data processing capabilities such as [[BR]]aggregation and algebraic calculation on Hadoop + Hbase.
- [[BR]]aggregation, algebraic calculation on Hadoop + Hbase.
  
  == Hbase Shell Goals ==
- 
  HBase Shell is developed to achieve the following goals.
  
-  * Generic Query Model Functions
   * Simplified Import/Export/Migration Between Different Data Sources (Hadoop, HBase)
   * Simplified Processing of a Logical Data Model
   * Simplified Algebraic Operations
-  * Parallel Numerical Analysis by abstracting/numericalizing points, lines, or plane data across multiple maps in HBase.
+  * Simplified Parallel Numerical Analysis by abstracting/numericalizing points, lines, [[BR]]or plane data across multiple maps in HBase.
  
  == Background ==
+ 
  I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future. [[BR]]Moreover, I believe the design of the multi-dimensional structure and the 3-dimensional space model of the data are [[BR]]optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of [[BR]]raw data based on formulaic relationships.
  
- Then, I thought it would require a more user-friendly interface to enable querying the data interactive.
+ Then I thought it would need a more user-friendly interface for querying the data interactively.
  
  == Rationale ==
- ...
+ 
+ It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMSs.
+ [[BR]]Thus, I decided to develop a shell for linear algebraic computing
+ [[BR]]and large-scale data processing using Hadoop's parallel processing and HBase storage.
+ 
+ ''Then you may ask, "What is the difference from MapReduce using MapFiles?"''
+ 
+ I don't expect it to deliver high performance just yet,
+ [[BR]]but it will surely make data management and development much easier.
+ [[BR]]First, let's take a look at HBase's data model.
+ 
+ HBase provides a unified data model that represents data in three dimensions:
+ [[BR]]Row, Column, and Timestamp. Also, Rows and Columns may be extended without limit.
+   
+ If we fix the data model at a single time version, we can view the resulting data as a 2D table.
+ [[BR]]If the index is a string, we can view it as a huge map; if the index is an integer, it is one huge 2D array.
+ [[BR]]So each table may hold such data stores in 3D (ColumnFamilies).
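A minimal Python sketch of that view, assuming a plain in-memory dictionary as a stand-in for one HBase table (this is not the HBase client API). Cells are keyed by (row, column, timestamp); `slice_at` is a hypothetical helper that cuts the time dimension by keeping, for each cell, the newest version at or before the given timestamp. The older vote value is invented for illustration.

```python
# Hypothetical stand-in for one HBase table (not the HBase API):
# every cell is addressed by (row, column, timestamp), i.e. three dimensions.
cells = {
    ("Star Wars", "year:", 1): "1977",
    ("Star Wars", "vote:user_1", 1): "4",   # older version, invented for illustration
    ("Star Wars", "vote:user_1", 2): "5",
    ("Mighty Ducks", "year:", 1): "1991",
}


def slice_at(cells, ts):
    """Cut the 3D model at time `ts`: for each (row, column) keep the newest
    version whose timestamp is <= ts, giving a 2D view {row: {column: value}}."""
    newest = {}  # (row, column) -> (timestamp, value)
    for (row, col, t), value in cells.items():
        if t <= ts and ((row, col) not in newest or t > newest[(row, col)][0]):
            newest[(row, col)] = (t, value)
    view = {}
    for (row, col), (_, value) in newest.items():
        view.setdefault(row, {})[col] = value
    return view


# String indexes: the 2D slice behaves like one huge map.
print(slice_at(cells, 1)["Star Wars"]["vote:user_1"])  # 4
print(slice_at(cells, 2)["Star Wars"]["vote:user_1"])  # 5

# Integer indexes: the same kind of slice behaves like one huge 2D array.
int_cells = {(i, j, 1): float(2 * i + j) for i in range(2) for j in range(2)}
grid = slice_at(int_cells, 1)
array = [[grid[i][j] for j in range(2)] for i in range(2)]
print(array)  # [[0.0, 1.0], [2.0, 3.0]]
```

Picking "newest version at or before ts" rather than an exact timestamp match is a choice of this sketch; it keeps cells that were written earlier and never updated.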
+ 
  
  ----
  = Hbase Shell Syntax Definition =
- 
  '''Note''' that data should be addressed by its row, column, and timestamp.
  
  == Basic Commands ==
- 
- ||<#ececec> '''Command''' ||<#ececec> '''Explanation''' ||
+ ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
  ||HELP ||<99%>'''Help''' command provides information about how to use the shell.[[BR]][[BR]]~-''HELP [function_name];''-~ ||
  ||SHOW ||<99%>'''Show''' command will list the tables.[[BR]][[BR]]~-''SHOW tables;''-~ ||
  ||DESC ||'''Desc''' command will provide information about the column families in a table.[[BR]][[BR]]~-''DESC table_name;''-~ ||
  ||CREATE ||'''Create''' command will create a new table.[[BR]][[BR]]~-''CREATE table_name[[BR]]COLUMNFAMILIES('columnfamily_name1'[, 'columnfamily_name2', ...])[[BR]]LIMIT=limitNumber_of_Version;''-~ ||
  ||DROP ||'''Drop''' command will drop column families in a table, or whole tables.[[BR]][[BR]]~-''DROP table_name1[, table_name2, ...] or columnfamily_name1[, columnfamily_name2, ...];''-~ ||
+ ||SUBSTITUTE[[BR]] || '''Substitute''' assigns the result of a query to a single-letter variable ([A~Z]).[[BR]][[BR]]~-''X = SELECT table_name;''-~||
- ||PRINT ||'''Print''' command will print a results to the console output. [[BR]][[BR]]~-''A = array([1, 2, 3]);[[BR]]PRINT A;[[BR]]B = SELECT table_name WHERE row="row_key";[[BR]]PRINT B;''-~||
+ ||PRINT ||'''Print''' command will print results to the console.[[BR]][[BR]]~-''A = array([1, 2, 3]);[[BR]]PRINT A;[[BR]]B = SELECT table_name WHERE row='row_key';[[BR]]PRINT B;''-~ ||
- ||STORE ||'''STORE''' command will store results to specified table. [[BR]][[BR]]~-''M = matrix('table_name','columnfamily_name');[[BR]]A = array([[1, 2],[3, 4]]);  //In this case, Key should be an integer index. [[BR]]STORE A TO M run_style;[[BR]]B = SELECT table_name WHERE row="row_key";[[BR]]STORE B TO ('table_name','columnfamily_name1'[, 'columnfamily_name2']) run_style;''-~||
+ ||STORE ||'''STORE''' command will store results in the specified table.[[BR]][[BR]]~-''M = matrix('table_name','columnfamily_name');[[BR]]A = array([[1, 2],[3, 4]]); // In this case, the key should be an integer index.[[BR]]STORE A TO M run_style;[[BR]]B = SELECT table_name WHERE row='row_key';[[BR]]STORE B TO ('table_name','columnfamily_name1'[, 'columnfamily_name2']) run_style;''-~ ||
  ||EXIT ||<99%>'''Exit''' from the current shell script.[[BR]][[BR]]~-''EXIT;''-~ ||
- 
  The following commands manually manipulate data at a finer level of detail.
- 
- ||<#ececec> '''Command''' ||<#ececec> '''Explanation''' ||
+ ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||INSERT ||<99%>'''Insert''' command will insert one row into the table with a value for specified column in the table.[[BR]][[BR]]~-''INSERT table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row="row_key";''-~ ||
+ ||INSERT ||<99%>'''Insert''' command will insert one row into the table, with a value for the specified column.[[BR]][[BR]]~-''INSERT table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row='row_key';''-~ ||
- ||SET ||'''SET''' command will change the values. [[BR]][[BR]]~-''SET table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row="row_key" AND time="Specified_Timestamp";''-~||
+ ||SET ||'''SET''' command will change the value of the specified column in the specified row at the specified timestamp.[[BR]][[BR]]~-''SET table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row='row_key' AND time='Specified_Timestamp';''-~ ||
- ||DELETE ||'''Delete''' command will delete specified rows in table. [[BR]][[BR]]~-''DELETE table_name[[BR]]WHERE row="row_key"[[BR]][AND column="columnfamily_name:column_key"];''-~||
+ ||DELETE ||'''Delete''' command will delete the specified rows in a table, or only the specified column when one is given.[[BR]][[BR]]~-''DELETE table_name[[BR]]WHERE row='row_key'[[BR]][AND column='columnfamily_name:column_key'];''-~ ||
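The INSERT, SET and DELETE commands, together with the version limit set by CREATE ... LIMIT, can be sketched in Python against a toy in-memory table. Everything here is an assumption for illustration: `VersionedTable` and its methods are hypothetical, timestamps come from a simple counter rather than a clock, and SET is modelled as overwriting the version stored at the given timestamp.

```python
import itertools


class VersionedTable:
    """Hypothetical in-memory model of a table created with
    CREATE ... COLUMNFAMILIES(...) LIMIT=max_versions (not an HBase API)."""

    def __init__(self, column_families, max_versions):
        self.families = set(column_families)
        self.max_versions = max_versions
        # row_key -> {"family:qualifier": [(timestamp, value), ...] newest first}
        self.rows = {}
        self._clock = itertools.count(1)  # stand-in for real timestamps

    def _put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows.setdefault(row, {}).setdefault(column, [])
        versions[:] = [(t, v) for t, v in versions if t != ts]  # same timestamp: overwrite
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # enforce LIMIT: drop the oldest versions

    def insert(self, row, column, value):
        """INSERT ... WHERE row=...: add a new version with the next timestamp."""
        self._put(row, column, value, next(self._clock))

    def set(self, row, column, value, ts):
        """SET ... WHERE row=... AND time=ts: replace the value stored at ts."""
        self._put(row, column, value, ts)

    def delete(self, row, column=None):
        """DELETE ... WHERE row=... [AND column=...]."""
        if column is None:
            self.rows.pop(row, None)
        else:
            self.rows.get(row, {}).pop(column, None)

    def get(self, row, column):
        """Newest value of one cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column, [])
        return versions[0][1] if versions else None


t = VersionedTable(["year", "vote"], max_versions=2)
for vote in ("3", "4", "5"):  # three versions, but LIMIT=2 keeps only two
    t.insert("Star Wars", "vote:user_1", vote)
print(t.rows["Star Wars"]["vote:user_1"])  # [(3, '5'), (2, '4')]
t.set("Star Wars", "vote:user_1", "1", ts=2)
print(t.get("Star Wars", "vote:user_1"))  # 5
t.delete("Star Wars", "vote:user_1")
print(t.get("Star Wars", "vote:user_1"))  # None
```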
  
  === Relational Algebra Operators ===
- 
- ||<#ececec> '''Command''' ||<#ececec> '''Explanation''' ||
+ ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||SELECT ||<99%>'''Select''' command will retrieves rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row="row_key"][[BR]][AND column="columnfamily_name:column_key"];[[BR]][AND time="Specified_Timestamp"];[[BR]][LIMIT=Number_of_Version];''-~ ||
+ ||SELECT ||<99%>'''Select''' command will retrieve rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'][[BR]][AND time='Specified_Timestamp'][[BR]][LIMIT=Number_of_Version];''-~ ||
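A Python sketch of how the optional SELECT clauses narrow the result, assuming an in-memory layout of row, then column, then a newest-first list of (timestamp, value) versions. The `select` helper is hypothetical, the older vote value is invented, and matching `time` exactly (rather than "at or before") is an assumption of this sketch.

```python
# Hypothetical table layout (not an HBase API):
# row -> column -> [(timestamp, value), ...] with the newest version first.
movie_log = {
    "Star Wars": {"year:": [(1, "1977")],
                  "vote:user_1": [(3, "5"), (2, "4")]},  # older vote invented
    "Mighty Ducks": {"year:": [(1, "1991")],
                     "vote:user_1": [(2, "2")]},
}


def select(table, row=None, column=None, time=None, limit=None):
    """Each keyword mirrors one optional clause of SELECT; None means 'omitted'."""
    result = {}
    for row_key, columns in table.items():
        if row is not None and row_key != row:            # WHERE row='row_key'
            continue
        for col, versions in columns.items():
            if column is not None and col != column:      # AND column='family:key'
                continue
            kept = [(ts, v) for ts, v in versions
                    if time is None or ts == time]        # AND time='timestamp'
            if limit is not None:
                kept = kept[:limit]                       # LIMIT=Number_of_Version
            if kept:
                result.setdefault(row_key, {})[col] = kept
    return result


print(select(movie_log, row="Star Wars", column="vote:user_1", limit=1))
# {'Star Wars': {'vote:user_1': [(3, '5')]}}
```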
  
  
+ 
+ 
+ 
+ === Aggregation Functions ===
+ Generic one dimensional counting??
+ ||<bgcolor="#ececec">'''Functions''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||SUM ||<99%>'''SUM''' command will retrieve rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'][[BR]][AND time='Specified_Timestamp'][[BR]][LIMIT=Number_of_Version];''-~ ||
+ 
+ 
+ ...
+ ||<bgcolor="#ececec">'''Function''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||... ||<99%>... ||
+ 
+ 
+ The Matrix commands are used to store a 2D array of numerical data values. [[BR]]A number of routines are provided to manipulate the matrix object directly, illustrated below by simple examples.
+ 
+ '''Note''' that vectors should be defined as two-dimensional matrices to distinguish between row and column vectors [[BR]]in order to be able to perform matrix operations consistently.
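To see why the orientation matters, here is a small Python sketch using plain nested lists and a hypothetical `matmul` helper (not part of Hbase Shell): the same three numbers give a 1 x 1 result or a 3 x 3 result depending on whether they are stored as a 1 x 3 row vector or a 3 x 1 column vector, a distinction a flat one-dimensional list cannot express.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows; shapes must be (n x m) and (m x p)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


row_vec = [[1, 2, 3]]        # 1 x 3: a row vector
col_vec = [[1], [2], [3]]    # 3 x 1: a column vector

print(matmul(row_vec, col_vec))  # [[14]]  (inner product, 1 x 1)
print(matmul(col_vec, row_vec))  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]  (outer product, 3 x 3)
```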
+ 
+ === Matrix Construction Functions ===
+ ..
+ 
+ === Matrix Algebra Functions ===
+ ..
+ 
+ === Special functions ===
+ ..
+ 
+ ----
+ = Example Of Hbase Shell Use =
+ == Basic Usage ==
+ 
+ {{{
+ Hbase > CREATE movieLog_table 
+     --> COLUMNFAMILIES('year','length','inColor','studioName','vote','producer')
+     --> LIMIT=10;
+ 
+ }}}
  
  '''movieLog_table'''
  ||Row Key ||<-12>Column Families ||
  ||<rowbgcolor="#ececec">title   ||<-2> year ||<-2>length ||<-2>inColor ||<-2> studioName ||<-2> vote ||<-2> producer ||
- ||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox || vote:''user_1'' || 5 || producer: || Rick McCallum ||
+ ||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox || vote:''user_1'' || 5 || producer: || George Lucas ||
  || || || || || || || || || || vote:''user_2'' || 2 || || ||
- ||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney || vote:''user_1'' || 2 || producer: || Doug Claybourne ||
+ ||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney || vote:''user_1'' || 2 || producer: || Blair Peters ||
  || || || || || || || || || || vote:''user_3'' || 4 || || ||
- ||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount || vote:''user_2'' || 3 || producer: || Tom Keifer ||
+ ||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount || vote:''user_2'' || 3 || producer: || Penelope Spheeris ||
  || || || || || || || || || || vote:''user_3'' || 4 || || ||
  
+ 
+ == Relational Algebra Operations ==
+ 
  '''Projection'''
  
  [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_ex2.gif]
  
+ {{{
+ Hbase > A = SELECT movieLog_table;
+ Hbase > B = A.Projection('year','length');
+ 
+ Hbase > PRINT B;
+ }}}
+ 
+ ||<rowbgcolor="#ececec">title ||year ||length ||
+ ||Star Wars ||1977 ||124 ||
+ ||Mighty Ducks ||1991 ||104 ||
+ ||Wayne's World ||1992 ||95 ||
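A Python sketch of the projection, assuming the latest-version values of movieLog_table are held in memory as one dictionary per row key (the vote family is left out here). `projection` is a hypothetical helper, not the shell's implementation: it keeps every row but only the named column families, with the row key (title) always kept.

```python
# Latest-version values from the movieLog_table example above, one dict per row key.
movie_log = {
    "Star Wars": {"year": 1977, "length": 124, "inColor": True,
                  "studioName": "Fox", "producer": "George Lucas"},
    "Mighty Ducks": {"year": 1991, "length": 104, "inColor": True,
                     "studioName": "Disney", "producer": "Blair Peters"},
    "Wayne's World": {"year": 1992, "length": 95, "inColor": True,
                      "studioName": "Paramount", "producer": "Penelope Spheeris"},
}


def projection(table, *families):
    """Keep only the named column families; the row key (title) is always kept."""
    return {row: {f: cols[f] for f in families if f in cols}
            for row, cols in table.items()}


B = projection(movie_log, "year", "length")
for title, cols in B.items():
    print(title, cols["year"], cols["length"])
# Star Wars 1977 124
# Mighty Ducks 1991 104
# Wayne's World 1992 95
```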
+ 
+ 
+ 
  '''Selection'''
  
  [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_ex3.gif]
  
+ {{{
+ Hbase > A = SELECT movieLog_table;
+ Hbase > B = A.Filter by "length" > 100;
+ 
+ Hbase > PRINT B;
+ }}}
+ 
+ ||<rowbgcolor="#ececec">title ||year ||length ||inColor ||studioName ||producer ||
+ ||Star Wars ||1977 ||124 ||true ||Fox ||George Lucas ||
+ ||Mighty Ducks ||1991 ||104 ||true ||Disney ||Blair Peters ||
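A Python sketch of the selection step, using the same assumed in-memory copy of movieLog_table. `selection` is a hypothetical helper: a selection keeps whole rows (all of their columns) whose values satisfy a condition, here `length > 100`, which picks out Star Wars and Mighty Ducks, the two rows of the result table above.

```python
# Latest-version values from the movieLog_table example above, one dict per row key.
movie_log = {
    "Star Wars": {"year": 1977, "length": 124, "inColor": True,
                  "studioName": "Fox", "producer": "George Lucas"},
    "Mighty Ducks": {"year": 1991, "length": 104, "inColor": True,
                     "studioName": "Disney", "producer": "Blair Peters"},
    "Wayne's World": {"year": 1992, "length": 95, "inColor": True,
                      "studioName": "Paramount", "producer": "Penelope Spheeris"},
}


def selection(table, predicate):
    """Keep the rows (with all their columns) for which predicate(columns) is true."""
    return {row: cols for row, cols in table.items() if predicate(cols)}


B = selection(movie_log, lambda cols: cols["length"] > 100)
print(list(B))  # ['Star Wars', 'Mighty Ducks']
```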
+ 
+ 
  '''Example'''
  
  [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_ex4.gif]
  
+ {{{
+ Hbase > A = SELECT movieLog_table 
+     --> WHERE column='studioName:Fox';
+ Hbase > B = A.Filter by "length" > 100;
+ Hbase > C = B.Projection('year');
  
+ Hbase > PRINT C;
+ }}}
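The three statements above compose two selections and a projection. A Python sketch of the same pipeline, with hypothetical `selection` and `projection` helpers over an in-memory copy of the example data trimmed to the columns this query touches (not how Hbase Shell would execute it):

```python
# Only the columns this query touches, copied from the movieLog_table example above.
movie_log = {
    "Star Wars": {"year": 1977, "length": 124, "studioName": "Fox"},
    "Mighty Ducks": {"year": 1991, "length": 104, "studioName": "Disney"},
    "Wayne's World": {"year": 1992, "length": 95, "studioName": "Paramount"},
}


def selection(table, predicate):
    """Keep whole rows whose columns satisfy predicate."""
    return {row: cols for row, cols in table.items() if predicate(cols)}


def projection(table, *families):
    """Keep only the named column families; the row key is always kept."""
    return {row: {f: cols[f] for f in families if f in cols}
            for row, cols in table.items()}


A = selection(movie_log, lambda c: c["studioName"] == "Fox")  # WHERE column='studioName:Fox'
B = selection(A, lambda c: c["length"] > 100)                 # Filter by "length" > 100
C = projection(B, "year")                                     # Projection('year')
print(C)  # {'Star Wars': {'year': 1977}}
```

Because both selections only drop rows, applying them in either order gives the same result; the projection comes last so the filters can still see `studioName` and `length`.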
+ 
+ == Matrix Operations ==
+ {{{
- A = matrix(movieLog_table, vote);
+ Hbase > A = matrix('movieLog_table', 'vote');
+ 
+ Hbase > PRINT A;
+ }}}
  
  ||<rowbgcolor="#ececec"> ||user_1 ||user_2 ||user_3 ||
  ||<bgcolor="#ececec">Star Wars || 5 || 2 || 0 ||
  ||<bgcolor="#ececec">Mighty Ducks || 2 || 0 || 4 ||
  ||<bgcolor="#ececec">Wayne's World || 0 || 3 || 4 ||
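The matrix above is the sparse `vote` column family laid out densely, with missing cells read as 0. A Python sketch of that conversion, assuming the family is held as a dictionary of row key to {qualifier: value}; `to_matrix` is a hypothetical helper, and sorting the qualifiers to fix the column order is a choice of this sketch.

```python
# The vote column family from movieLog_table: row key -> {qualifier (user): vote}.
# Cells that were never written are simply absent (the table is sparse).
votes = {
    "Star Wars": {"user_1": 5, "user_2": 2},
    "Mighty Ducks": {"user_1": 2, "user_3": 4},
    "Wayne's World": {"user_2": 3, "user_3": 4},
}


def to_matrix(family, fill=0):
    """Densify a sparse column family: rows keep their order, columns are the
    sorted set of qualifiers, and missing cells become `fill`."""
    rows = list(family)
    cols = sorted({q for cells in family.values() for q in cells})
    data = [[family[r].get(c, fill) for c in cols] for r in rows]
    return rows, cols, data


rows, cols, A = to_matrix(votes)
print(cols)  # ['user_1', 'user_2', 'user_3']
for r, line in zip(rows, A):
    print(r, line)
# Star Wars [5, 2, 0]
# Mighty Ducks [2, 0, 4]
# Wayne's World [0, 3, 4]
```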
  
- 
- writing..
- 
- 
- === Aggregation Functions ===
- 
- Generic one dimensional counting??
- 
- ||<#ececec> '''Functions''' ||<#ececec> '''Explanation''' ||
- ||SUM ||<99%>'''SUM''' command will retrieves rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row="row_key"][[BR]][AND column="columnfamily_name:column_key"];[[BR]][AND time="Specified_Timestamp"];[[BR]][LIMIT=Number_of_Version];''-~ ||
- 
- ...
- 
- ||<#ececec> '''Function''' ||<#ececec> '''Explanation''' ||
- ||... ||<99%>... ||
- 
- The Matrix commands are used to store a 2D array of numerical data values.
- [[BR]]A number of routines are provided to manipulate the matrix object directly, illustrated below by simple examples.
- 
- '''Note''' that vectors should be defined as two-dimensional matrices to distinguish between row and column vectors
- [[BR]]in order to be able to perform matrix operations consistently. 
- 
- === Matrix Construction Functions ===
- ..
- === Matrix Algebra Functions ===
- ..
- === Special functions ===
- ..
- 
- ----
- = Example Of Hbase Shell Use =
- ..
- == Basic Usage ==
- ..
- == Relation Algebra Operations ==
- ..
- == Matrix Operations ==
- ..
- 
  ----
  = Matrix Extension Example On Hbase Shell =
- ..
  == Latent Semantic Analysis By Singular Value Decomposition ==
- ..
+ '''Motivation'''
+ Lexical matching at the term level is claimed to be inaccurate:
+ 
+   * Polysemy: words with a number of ‘meanings’; term matching returns irrelevant documents, which impacts precision
+   * Synonymy: a number of words with the same ‘meaning’; term matching misses relevant documents, which impacts recall
+ 
+ LSA assumes that there exists a LATENT structure in word usage, obscured by variability in word choice.
+ [[BR]]This is analogous to the signal + additive noise model in signal processing.
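LSA usually recovers that latent structure with a truncated singular value decomposition (SVD) of a term-by-document matrix: the largest singular values are kept as "signal" and the rest are discarded as "noise". A minimal NumPy sketch; the four-term, four-document count matrix is invented for illustration, and `lsa` is a hypothetical helper, not an Hbase Shell command. Documents 0 and 1 share only "engine" ("car" vs. "automobile"), yet in the rank-2 latent space they land on the same direction.

```python
import numpy as np

# Toy term-by-document count matrix, invented for illustration.
# Rows are terms, columns are documents 0..3.
terms = ["car", "automobile", "engine", "flower"]
X = np.array([
    [1.0, 0.0, 1.0, 0.0],  # car
    [0.0, 1.0, 1.0, 0.0],  # automobile
    [1.0, 1.0, 0.0, 0.0],  # engine
    [0.0, 0.0, 0.0, 3.0],  # flower
])


def lsa(X, k):
    """Rank-k truncated SVD: keep the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # s is sorted in descending order
    return U[:, :k], s[:k], Vt[:k, :]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


Uk, sk, Vtk = lsa(X, k=2)
X_k = Uk @ np.diag(sk) @ Vtk        # best rank-2 approximation of X (Frobenius norm)
docs = (np.diag(sk) @ Vtk).T        # one row per document in the 2-D latent space

print(round(cosine(X[:, 0], X[:, 1]), 3))    # 0.5 in the raw term space
print(round(cosine(docs[0], docs[1]), 3))    # 1.0 in the latent space
```

The choice of k is the "signal versus noise" cut: a larger k keeps more detail from the raw counts, a smaller k merges more co-occurring terms.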
+ 
+ 
+ 
- == Scalable  Collaborative Filtering With A Large User-By-Item Matrix ==
+ == Scalable Collaborative Filtering With A Large User-By-Item Matrix ==
  ..
+ 
  == Consistency Assessment Of Topological Relationship By Matrix-Union ==
- .. 
+ ..
  ----
  = People Involved =
  
