Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by udanax: http://wiki.apache.org/lucene-hadoop/HbaseShell
------------------------------------------------------------------------------
'''research/work in progress'''

[[TableOfContents(4)]]

----
= Hbase Shell Introduction =
Hbase Shell is an 'interpreter' (or 'shell') that provides scalable data processing capabilities, such as [[BR]]aggregation and algebraic calculation, on Hadoop + HBase.

== Hbase Shell Goals ==
HBase Shell is developed to achieve the following goals:

 * Generic query model functions
 * Simplified import/export/migration between different data sources (Hadoop, HBase)
 * Simplified processing of a logical data model
 * Simplified algebraic operations
 * Simplified parallel numerical analysis, by abstracting/numericalizing points, lines, [[BR]]or plane data across multiple maps in HBase

== Background ==
I expect Hadoop + HBase to handle sparsity and data explosion very well in the near future. [[BR]]Moreover, I believe the multi-dimensional structure and the 3-dimensional space model of the data are [[BR]]optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of [[BR]]raw data based on formulaic relationships.

Therefore, I thought it would need a more user-friendly interface for querying the data interactively.

== Rationale ==
It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMSs.
[[BR]]Thus, I decided to develop a shell that processes linear algebraic computations
[[BR]]and large-scale data using Hadoop's parallel processing and HBase storage.

''Then you may ask, "What is the difference from MapReduce using MapFiles?"''

I don't expect it to deliver high performance just yet,
[[BR]]but it will surely make data management and development much easier.
[[BR]]First, let's take a look at HBase's data model.

HBase provides a unified data model that represents data in three dimensions:
[[BR]]Row, Column, and Timestamp. Row and Column may also be extended without limit.

If we cut the data model at a single time version, we may view the resulting data as a 2D table.
[[BR]]If the index is a string, we may view it as one huge map; if the index is an integer, it is one huge 2D array.
[[BR]]So each table may hold such data storages in 3D (ColumnFamilies).

----
= Hbase Shell Syntax Definition =
'''Note''' that data is located by its row, column, and timestamp.

== Basic Commands ==
||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
||HELP ||<99%>'''Help''' command provides information about the use of the shell script.[[BR]][[BR]]~-''HELP [function_name];''-~ ||
||SHOW ||<99%>'''Show''' command lists the tables.[[BR]][[BR]]~-''SHOW tables;''-~ ||
||DESC ||'''Desc''' command provides information about the column families in a table.[[BR]][[BR]]~-''DESC table_name;''-~ ||
||CREATE ||'''Create''' command creates a new table.[[BR]][[BR]]~-''CREATE table_name[[BR]]COLUMNFAMILIES('columnfamily_name1'[, 'columnfamily_name2', ...])[[BR]]LIMIT=limitNumber_of_Version;''-~ ||
||DROP ||'''Drop''' command drops column families in a table, or whole tables.[[BR]][[BR]]~-''DROP table_name1[, table_name2, ...] or columnfamily_name1[, columnfamily_name2, ...];''-~ ||
||SUBSTITUTE ||'''Substitute''' assigns a query result to a single-letter variable [A~Z].[[BR]][[BR]]~-''X = SELECT table_name;''-~ ||
||PRINT ||'''Print''' command prints results to the console.[[BR]][[BR]]~-''A = array([1, 2, 3]);[[BR]]PRINT A;[[BR]]B = SELECT table_name WHERE row='row_key';[[BR]]PRINT B;''-~ ||
||STORE ||'''Store''' command stores results to the specified table.[[BR]][[BR]]~-''M = matrix('table_name','columnfamily_name');[[BR]]A = array([[1, 2],[3, 4]]); // In this case, keys should be integer indices.[[BR]]STORE A TO M run_style;[[BR]]B = SELECT table_name WHERE row='row_key';[[BR]]STORE B TO ('table_name','columnfamily_name1'[, 'columnfamily_name2']) run_style;''-~ ||
||EXIT ||<99%>'''Exit''' command exits the current shell script.[[BR]][[BR]]~-''EXIT;''-~ ||

The following commands manipulate data manually, at a finer level of detail.
||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
||INSERT ||<99%>'''Insert''' command inserts one row into the table, with a value for the specified column.[[BR]][[BR]]~-''INSERT table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row='row_key';''-~ ||
||SET ||'''Set''' command changes existing values.[[BR]][[BR]]~-''SET table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row='row_key' AND time='Specified_Timestamp';''-~ ||
||DELETE ||'''Delete''' command deletes the specified rows in a table.[[BR]][[BR]]~-''DELETE table_name[[BR]]WHERE row='row_key'[[BR]][AND column='columnfamily_name:column_key'];''-~ ||

=== Relational Algebra Operators ===
||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
||SELECT ||<99%>'''Select''' command retrieves rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'][[BR]][AND time='Specified_Timestamp'][[BR]][LIMIT=Number_of_Version];''-~ ||

=== Aggregation Functions ===
Generic one-dimensional counting??

||<bgcolor="#ececec">'''Function''' ||<bgcolor="#ececec">'''Explanation''' ||
||SUM ||<99%>'''SUM''' sums the values retrieved from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'][[BR]][AND time='Specified_Timestamp'][[BR]][LIMIT=Number_of_Version];''-~ ||

...

||<bgcolor="#ececec">'''Function''' ||<bgcolor="#ececec">'''Explanation''' ||
||... ||<99%>... ||

The matrix commands store a 2D array of numerical data values. [[BR]]A number of routines are provided to manipulate the matrix object directly; they are illustrated below by simple examples.

'''Note''' that vectors should be defined as two-dimensional matrices, so that row and column vectors can be distinguished [[BR]]and matrix operations can be performed consistently.

=== Matrix Construction Functions ===
..

=== Matrix Algebra Functions ===
..

=== Special functions ===
..
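The note on vectors can be illustrated with a small Python sketch. It is hypothetical: matrices are plain lists of rows, and `row_vector`, `column_vector`, `shape`, and `matmul` are illustrative helpers, not Hbase Shell functions. Because the vectors are kept two-dimensional, the same three numbers give an inner product (1x3 times 3x1) or an outer product (3x1 times 1x3). A mismatched product such as 1x3 times 1x3 fails loudly instead of being silently reinterpreted.

```python
# Hypothetical helpers (not Hbase Shell functions): a matrix is a list of rows.

def row_vector(values):
    """Return a 1 x n matrix."""
    return [list(values)]


def column_vector(values):
    """Return an n x 1 matrix."""
    return [[v] for v in values]


def shape(m):
    """Return (number of rows, number of columns)."""
    return (len(m), len(m[0]))


def matmul(a, b):
    """Multiply an (n_rows x inner) matrix by an (inner x n_cols) matrix."""
    (n_rows, inner), (inner_b, n_cols) = shape(a), shape(b)
    if inner != inner_b:
        raise ValueError(f"cannot multiply {shape(a)} by {shape(b)}")
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(n_cols)]
            for i in range(n_rows)]


r = row_vector([1, 2, 3])
c = column_vector([1, 2, 3])
print(matmul(r, c))  # [[14]]  (1x3 times 3x1: inner product)
print(matmul(c, r))  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]  (3x1 times 1x3: outer product)
```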

----
= Example Of Hbase Shell Use =
== Basic Usage ==
{{{
Hbase > CREATE movieLog_table
--> COLUMNFAMILIES('year','length','inColor','studioName','vote','producer')
--> LIMIT=10;
}}}

'''movieLog_table'''
||Row Key ||<-12>Column Families ||
||<rowbgcolor="#ececec">title ||<-2> year ||<-2>length ||<-2>inColor ||<-2> studioName ||<-2> vote ||<-2> producer ||
||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox || vote:''user_1'' || 5 || producer: || George Lucas ||
|| || || || || || || || || || vote:''user_2'' || 2 || || ||
||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney || vote:''user_1'' || 2 || producer: || Blair Peters ||
|| || || || || || || || || || vote:''user_3'' || 4 || || ||
||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount || vote:''user_2'' || 3 || producer: || Penelope Spheeris ||
|| || || || || || || || || || vote:''user_3'' || 4 || || ||

== Relational Algebra Operations ==
'''Projection''' [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_ex2.gif]
{{{
Hbase > A = SELECT movieLog_table;
Hbase > B = A.Projection('year','length');

Hbase > PRINT B;
}}}

||<rowbgcolor="#ececec">title ||year ||length ||
||Star Wars ||1977 ||124 ||
||Mighty Ducks ||1991 ||104 ||
||Wayne's World ||1992 ||95 ||

'''Selection''' [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_ex3.gif]
{{{
Hbase > A = SELECT movieLog_table;
Hbase > B = A.Filter by "length" > 100;

Hbase > PRINT B;
}}}

||<rowbgcolor="#ececec">title ||year ||length ||inColor ||studioName ||producer ||
||Star Wars ||1977 ||124 ||true ||Fox ||George Lucas ||
||Mighty Ducks ||1991 ||104 ||true ||Disney ||Blair Peters ||

'''Selection followed by projection''' [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_ex4.gif]
{{{
Hbase > A = SELECT movieLog_table
--> WHERE column='studioName:Fox';
Hbase > B = A.Filter by "length" > 100;
Hbase > C = B.Projection('year');
Hbase > PRINT C;
}}}

== Matrix Operations ==
{{{
Hbase > A = matrix('movieLog_table', 'vote');

Hbase > PRINT A;
}}}
||<rowbgcolor="#ececec"> ||user_1 ||user_2 ||user_3 ||
||<bgcolor="#ececec">Star Wars || 5 || 2 || 0 ||
||<bgcolor="#ececec">Mighty Ducks || 2 || 0 || 4 ||
||<bgcolor="#ececec">Wayne's World || 0 || 3 || 4 ||
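The examples above can be mirrored with a minimal Python sketch under stated assumptions. It uses an in-memory model of movieLog_table. The dictionary layout and the helpers `select`, `project`, and `to_matrix` are hypothetical stand-ins for the shell's Filter, Projection, and matrix() operations, not their implementation. Cells are keyed by 'columnfamily_name:column_key' as in the syntax definition. Versions (timestamps) are ignored, and vote cells that do not exist become 0, as in the vote matrix.

```python
# Hypothetical in-memory stand-in for movieLog_table (not the Hbase Shell
# implementation). Each row key (the title) maps to cells keyed by
# "columnfamily_name:column_key"; timestamps/versions are omitted, so each
# cell holds only one value.
movie_log = {
    "Star Wars": {
        "year:": 1977, "length:": 124, "inColor:": True,
        "studioName:": "Fox", "producer:": "George Lucas",
        "vote:user_1": 5, "vote:user_2": 2,
    },
    "Mighty Ducks": {
        "year:": 1991, "length:": 104, "inColor:": True,
        "studioName:": "Disney", "producer:": "Blair Peters",
        "vote:user_1": 2, "vote:user_3": 4,
    },
    "Wayne's World": {
        "year:": 1992, "length:": 95, "inColor:": True,
        "studioName:": "Paramount", "producer:": "Penelope Spheeris",
        "vote:user_2": 3, "vote:user_3": 4,
    },
}


def select(table, predicate):
    """Selection: keep the rows whose cells satisfy predicate."""
    return {row: cells for row, cells in table.items() if predicate(cells)}


def project(table, *families):
    """Projection: keep only the cells belonging to the listed column families."""
    return {
        row: {col: v for col, v in cells.items() if col.split(":", 1)[0] in families}
        for row, cells in table.items()
    }


def to_matrix(table, family):
    """Turn one column family into a dense row-by-column_key matrix.

    Cells that do not exist (a sparse table stores nothing for them) become 0.
    """
    prefix = family + ":"
    keys = sorted({col[len(prefix):] for cells in table.values()
                   for col in cells if col.startswith(prefix)})
    row_keys = list(table)
    values = [[table[row].get(prefix + key, 0) for key in keys] for row in row_keys]
    return row_keys, keys, values


long_movies = select(movie_log, lambda cells: cells["length:"] > 100)
print(sorted(long_movies))  # ['Mighty Ducks', 'Star Wars']

fox_long = select(movie_log,
                  lambda cells: cells["studioName:"] == "Fox" and cells["length:"] > 100)
print(project(fox_long, "year"))  # {'Star Wars': {'year:': 1977}}

rows, users, votes = to_matrix(movie_log, "vote")
print(users)  # ['user_1', 'user_2', 'user_3']
print(votes)  # [[5, 2, 0], [2, 0, 4], [0, 3, 4]]
```

Under this model, filtering on length alone keeps Star Wars and Mighty Ducks, while adding the Fox condition leaves only Star Wars. The shell itself is meant to run such operations on Hadoop + HBase rather than on an in-memory dictionary.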
----
= Matrix Extension Example On Hbase Shell =
== Latent Semantic Analysis By Singular Value Decomposition ==
'''Motivation'''

Lexical matching at the term level is (claimed to be) inaccurate:

 * Polysemy: words have a number of "meanings", so term matching returns irrelevant documents; this impacts precision.
 * Synonymy: a number of words share the same "meaning", so term matching misses relevant documents; this impacts recall.

LSA assumes that there exists a LATENT structure in word usage that is obscured by variability in word choice.
[[BR]]This is analogous to the signal + additive noise model in signal processing.

== Scalable Collaborative Filtering With A Large User-By-Item Matrix ==
..

== Consistency Assessment Of Topological Relationship By Matrix-Union ==
..

----
= People Involved =