The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/HbaseShell

The comment on the change is:
Move to Hbase/HbaseShell

------------------------------------------------------------------------------
- '''research/work in progress'''
+ deleted
-  * https://issues.apache.org/jira/browse/HADOOP-1375 [[BR]]but implementation has yet to be started.
-
- [[TableOfContents(4)]]
-
- ----
- = HBase Shell Introduction =
- HBase Shell is an 'interpreter' (or 'shell') that provides scalable data processing capabilities, such as [[BR]]aggregation and algebraic calculation, on Hadoop + HBase.
-
- == HBase Shell Goals ==
- HBase Shell is developed to achieve the following goals.
-
-  * Simplified import/export/migration functionality between different data sources (Hadoop, HBase)
-  * Simplified processing of a logical data model
-  * Simplified algebraic operations
-  * Simplified parallel numerical analysis by abstracting/numericalizing points, lines, [[BR]]or plane data across multiple maps in HBase.
-
- == Background ==
-
- I expect Hadoop + HBase to handle sparsity and data explosion very well in the near future. [[BR]]Moreover, I believe the design of the multi-dimensional structure and the 3-dimensional space model of the data are [[BR]]optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of [[BR]]raw data based on formulaic relationships.
-
- I therefore thought it would need a more user-friendly interface for querying the data interactively.
-
- == Rationale ==
-
- It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMSs.
- [[BR]]Thus, I decided to develop a shell for linear algebraic computation
- [[BR]]and large-scale data processing using Hadoop's parallel processing and HBase storage.
-
- ''You may then ask, "How does this differ from MapReduce using MapFiles?"''
-
- I don't expect it to deliver high performance just yet,
- [[BR]]but it will certainly make data management and development much easier.
- [[BR]]First, let's take a look at HBase's data model.
-
- HBase provides a unified data model that represents data in three dimensions:
- [[BR]]Row, Column, and Timestamp. Rows and Columns may also be extended indefinitely.
-
- If we slice the data model at a single time version, we may view the resulting data as a 2D table.
- [[BR]]If the index is a string, we may view it as a huge map. If the index is an integer, it is one huge 2D array.
- [[BR]]So each table may hold such data stores in 3D (ColumnFamilies).
-
-
- ----
- = HBase Shell Client Syntax Definition =
- '''Note''' that data should be located by row, column, and timestamp.
-
- == Commands ==
- ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||HELP ||<99%>'''Help''' command provides information about the use of the shell script.[[BR]][[BR]]~-''HELP [function_name];''-~ ||
- ||SHOW ||<99%>'''Show''' command lists the tables.[[BR]][[BR]]~-''SHOW tables;''-~ ||
- ||DESC ||'''Desc''' command provides information about the column families in a table.[[BR]][[BR]]~-''DESC table_name;''-~ ||
- ||CREATE ||'''Create''' command creates a new table.[[BR]][[BR]]~-''CREATE table_name[[BR]]COLUMNFAMILIES('columnfamily_name1'[, 'columnfamily_name2', ...])[[BR]][LIMIT=limitNumber_of_Version];''-~ ||
- ||DROP ||'''Drop''' command drops tables, or column families in a table.[[BR]][[BR]]~-''DROP table_name1[, table_name2, ...] or columnfamily_name1[, columnfamily_name2, ...];''-~ ||
- ||SUBSTITUTE[[BR]] || '''Substitute''' assigns a query to a variable [A~Z].[[BR]][[BR]]~-''X = Matrix(table_name, columnfamily_name);''-~||
- ||STORE ||'''STORE''' command stores results to the specified table. [[BR]][[BR]]~-''A = Table('movieLog_table'); [[BR]]B = A.Selection('length' > 100); [[BR]]STORE B TO X run_style;''-~ ||
- ||EXIT ||<99%>'''Exit''' from the current shell script.[[BR]][[BR]]~-''EXIT;''-~ ||
- The following commands manipulate data manually at a finer level of detail.
- ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||INSERT ||<99%>'''Insert''' command inserts one row into the table with a value for each specified column.[[BR]][[BR]]~-''INSERT table_name ('columnfamily_name1:column_key'[, 'columnfamily_name2:column_key', ...])[[BR]] VALUES ('entry1'[, 'entry2', ...])[[BR]]WHERE row='row_key';''-~ ||
- ||SET ||'''SET''' command changes the values. [[BR]][[BR]]~-''SET table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row='row_key' AND time='Specified_Timestamp';''-~ ||
- ||DELETE ||'''Delete''' command deletes the specified rows in a table. [[BR]][[BR]]~-''DELETE table_name[[BR]]WHERE row='row_key'[[BR]][AND column='columnfamily_name:column_key'];''-~ ||
- ||SELECT ||<99%>'''Select''' command retrieves rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'][[BR]][AND time='Specified_Timestamp'][[BR]][LIMIT=Number_of_Version];''-~ ||
-
- == Relational Operations ==
-
- ||<bgcolor="#ececec">'''Operators''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||PROJECTION||<99%>'''Projection''' is defined as the set obtained when all tuples in ~+R+~ are restricted to the attribute set {a,,1,,,...,a,,n,,}.||
- ||SELECTION||<99%>...||
- ||PRODUCT||<99%>...||
- ||RENAME||<99%>'''Rename''' r to x||
- ||GROUP||<99%>...||
- ||SORT||<99%>...||
-
-
- ||<bgcolor="#ececec">'''Operators''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||UNION ||<99%>'''Union''' A∪B contains all the elements of A and all the elements of B.||
- ||INTERSECTION ||<99%>'''Intersection''' A∩B contains only the elements that belong to both A and B.||
- ||DIFFERENCE ||'''Difference''' of A and B (A-B).||
-
- ||<bgcolor="#ececec">'''Functions''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||AVG ||<99%>...||
- ||SUM ||<99%>...||
- ||COUNT ||<99%>...||
- ||MIN ||<99%>...||
- ||MAX ||<99%>...||
-
- == Matrix Operations ==
-
- ||<bgcolor="#ececec">'''Operation''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||DOUBLEMATRIX||<99%>...||
- ||BOOLEANMATRIX||<99%>...||
-
- ||<bgcolor="#ececec">'''Functions''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||QR ||<99%>...||
- ||LU||<99%>...||
- ||SVD ||<99%>...||
-
- ----
- = Example Of HBase Shell Use =
- == Basic Usage ==
-
- === Create the table in HBase ===
-
- ~-''CREATE movieLog_table
- [[BR]]COLUMNFAMILIES('year', 'length', 'inColor', 'studioName', 'vote', 'producer')
- [[BR]]LIMIT=1;''-~
-
- === Insert data into a table ===
- ~-''INSERT movieLog_table ('year:', 'length:', 'inColor:', 'studioName:', 'vote:user_1', 'producer:')
- [[BR]]VALUES ('1977', '124', 'true', 'Fox', '5', 'George Lucas')
- [[BR]]WHERE row='Star Wars';''-~
-
- === Show all data in a table ===
- ~-''SELECT movieLog_table;''-~
-
- ||Row Key ||<-12>Column Families ||
- ||<rowbgcolor="#ececec">title ||<-2> year ||<-2>length ||<-2>inColor ||<-2> studioName ||<-2> vote ||<-2> producer ||
- ||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox || vote:''user_1'' || 5 || producer: || George Lucas ||
- || || || || || || || || || || vote:''user_2'' || 2 || || ||
- ||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney || vote:''user_1'' || 2 || producer: || Blair Peters ||
- || || || || || || || || || || vote:''user_3'' || 4 || || ||
- ||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount || vote:''user_2'' || 3 || producer: || Penelope Spheeris ||
- || || || || || || || || || || vote:''user_3'' || 4 || || ||
-
-
- == Relational Operations ==
-
- === Projection ===
-
- ~-''A = Table('movieLog_table');
- [[BR]]B = A.Projection('year','length');''-~
-
- '''~+^π^+~'''~-title-~,~-year-~,~-length-~'''~+^(movieLog_table)^+~'''
-
- ||<rowbgcolor="#ececec">title ||year ||length ||
- ||Star Wars ||1977 ||124 ||
- ||Mighty Ducks ||1991 ||104 ||
- ||Wayne's World ||1992 ||95 ||
-
-
-
- === Selection ===
-
- ~-''A = Table('movieLog_table');
- [[BR]]B = A.Selection('length' > 100);''-~
-
- '''~+^σ^+~'''~-length>100-~'''~+^(movieLog_table)^+~'''
-
- ||<rowbgcolor="#ececec">title ||year ||length ||inColor ||studioName ||producer ||
- ||Star Wars ||1977 ||124 ||true ||Fox ||George Lucas ||
- ||Mighty Ducks ||1991 ||104 ||true ||Disney ||Blair Peters ||
-
-
- === Example ===
-
- ~-''A = Table('movieLog_table');
- [[BR]]B = A.Selection(length > 100 AND studioName = 'Fox');
- [[BR]]C = B.Projection('year');''-~
-
- '''~+^π^+~'''~-title-~,~-year-~'''~+^(σ^+~'''~-length>100-~'''~+^(movieLog_table)∩σ^+~'''~-studioName='Fox'-~'''~+^(movieLog_table))^+~'''
-
- ||<rowbgcolor="#ececec">title ||year ||
- ||Star Wars ||1977 ||
-
- == Matrix Operations ==
-
- Let's construct an abstract sparse row-by-column matrix.
-
- ~-''A = doubleMatrix('movieLog_table','vote');''-~
-
- ||<rowbgcolor="#ececec"> ||user_1 ||user_2 ||user_3 ||
- ||<bgcolor="#ececec">Star Wars || 5.0 || 2.0 || ||
- ||<bgcolor="#ececec">Mighty Ducks || 2.0 || || 4.0 ||
- ||<bgcolor="#ececec">Wayne's World || || 3.0 || 4.0 ||
-
-
- ----
- = Matrix Extension Example On HBase Shell =
- == Latent Semantic Analysis By Singular Value Decomposition ==
- '''Motivation'''
- Lexical matching at the term level is inaccurate (claimed).
-
-  * Polysemy: words with a number of “meanings”; term matching returns irrelevant documents, which impacts precision
-  * Synonymy: a number of words with the same “meaning”; term matching misses relevant documents, which impacts recall
-
- LSA assumes that there is a LATENT structure in word usage, obscured by variability in word choice.
- [[BR]]This is analogous to the signal + additive noise model in signal processing.
-
-
-
- == Scalable Collaborative Filtering With A Large User-By-Item Matrix ==
-
- Title-By-Title Triangular Matrix
-
- ||<rowbgcolor="#ececec"> ||Star Wars ||Mighty Ducks ||Wayne's World ||
- ||<bgcolor="#ececec">Star Wars || || 0.415 || 0.222 ||
- ||<bgcolor="#ececec">Mighty Ducks || || || 0.715 ||
- ||<bgcolor="#ececec">Wayne's World || || || ||
-
-
- == Consistency Assessment Of Topological Relationship By Matrix-Union ==
- ..
-
- ----
- = Performance Reports =
- ..
-
- ----
- = People Involved =
-
-  * [:udanax:Edward Yoon] [EMAIL PROTECTED]
-  * [:boyo:Sewon Kim] [EMAIL PROTECTED]
-  * [:mskim:Minsu Kim] [EMAIL PROTECTED]
-
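
The Selection, Projection, and doubleMatrix examples on the page can be illustrated with a small in-memory sketch in Python. The page notes that implementation of the shell had not started, so this is only a guess at the intended semantics: the dictionary layout and the helper names `selection`, `projection`, and `double_matrix` are hypothetical and are not part of any HBase API. Timestamps are ignored, which corresponds to viewing a single time version as a 2D table.

```python
# Hypothetical in-memory stand-in for movieLog_table:
# row key -> {"columnfamily:column_key" -> value}. Not an HBase client API.
movie_log = {
    "Star Wars": {
        "year:": "1977", "length:": "124", "inColor:": "true",
        "studioName:": "Fox", "vote:user_1": "5", "vote:user_2": "2",
        "producer:": "George Lucas",
    },
    "Mighty Ducks": {
        "year:": "1991", "length:": "104", "inColor:": "true",
        "studioName:": "Disney", "vote:user_1": "2", "vote:user_3": "4",
        "producer:": "Blair Peters",
    },
    "Wayne's World": {
        "year:": "1992", "length:": "95", "inColor:": "true",
        "studioName:": "Paramount", "vote:user_2": "3", "vote:user_3": "4",
        "producer:": "Penelope Spheeris",
    },
}


def selection(table, predicate):
    """Keep only the rows whose columns satisfy the predicate."""
    return {row: cols for row, cols in table.items() if predicate(cols)}


def projection(table, *families):
    """Keep only the columns that belong to the named column families."""
    return {
        row: {name: value for name, value in cols.items()
              if name.split(":", 1)[0] in families}
        for row, cols in table.items()
    }


def double_matrix(table, family):
    """Build a sparse {row: {column_key: float}} matrix from one column family."""
    prefix = family + ":"
    matrix = {}
    for row, cols in table.items():
        entries = {name[len(prefix):]: float(value)
                   for name, value in cols.items() if name.startswith(prefix)}
        if entries:
            matrix[row] = entries
    return matrix


# B = A.Selection(length > 100 AND studioName = 'Fox'); C = B.Projection('year');
fox_long = selection(
    movie_log,
    lambda cols: int(cols["length:"]) > 100 and cols["studioName:"] == "Fox",
)
fox_years = projection(fox_long, "year")

# A = doubleMatrix('movieLog_table', 'vote');
votes = double_matrix(movie_log, "vote")
```

Missing votes are simply absent from the inner dictionaries, which is one way to keep the user-by-title matrix sparse rather than storing empty cells.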