Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by udanax: http://wiki.apache.org/lucene-hadoop/Hbase/HbaseShell

New page:

'''research/work in progress'''
 * https://issues.apache.org/jira/browse/HADOOP-1375 [[BR]]Implementation has not yet started.

[[TableOfContents(4)]]

----
= Hbase Shell Introduction =
Hbase Shell is an 'interpreter' (or 'shell') that provides scalable data processing capabilities, such as [[BR]]aggregation and algebraic calculation, on Hadoop + HBase.

== Hbase Shell Goals ==
HBase Shell is being developed to achieve the following goals.

 * Simplified import/export/migration between different data sources (Hadoop, HBase)
 * Simplified processing of a logical data model
 * Simplified algebraic operations
 * Simplified parallel numerical analysis by abstracting/numericalizing points, lines, [[BR]]or plane data across multiple maps in HBase.

== Background ==
I expect Hadoop + HBase to handle sparsity and data explosion very well in the near future. [[BR]]Moreover, I believe the design of the multi-dimensional structure and the 3-dimensional space model of the data are [[BR]]optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of [[BR]]raw data based on formulaic relationships.

I therefore thought it would need a more user-friendly interface for querying the data interactively.

== Rationale ==
It will probably take a while for Hadoop + HBase to provide a reliable real-time service like other DBMSs. [[BR]]Thus, I decided to develop a shell to process linear algebraic computations [[BR]]and large-scale data using Hadoop's parallel processing and HBase storage.

''Then you may ask, "What is the difference from MapReduce using MapFiles?"''

I don't expect it to deliver high performance just yet, [[BR]]but it will surely make data management and development much easier. [[BR]]First, let's take a look at HBase's data model.
HBase provides a unified data model that represents data in three dimensions: [[BR]]Row, Column, and Timestamp. Rows and Columns may also be extended without limit.

If we cut the data model at a single time version, we may view the result as a 2D table. [[BR]]If the index is a string, we may view it as a huge map. If the index is an integer, it is one huge 2D array. [[BR]]So each table may hold such data stores in 3D (ColumnFamilies).

----
= Hbase Shell Client Syntax Definition =
'''Note''' that data is located by its row, column, and timestamp.

== Commands ==
||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
||HELP ||<99%>'''Help''' command provides information about the use of the shell.[[BR]][[BR]]~-''HELP [function_name];''-~ ||
||SHOW ||<99%>'''Show''' command lists the tables.[[BR]][[BR]]~-''SHOW tables;''-~ ||
||DESC ||'''Desc''' command provides information about the columnfamilies in a table.[[BR]][[BR]]~-''DESC table_name;''-~ ||
||CREATE ||'''Create''' command creates a new table.[[BR]][[BR]]~-''CREATE table_name[[BR]]COLUMNFAMILIES('columnfamily_name1'[, 'columnfamily_name2', ...])[[BR]][LIMIT=limitNumber_of_Version];''-~ ||
||DROP ||'''Drop''' command drops tables, or columnfamilies in a table.[[BR]][[BR]]~-''DROP table_name1[, table_name2, ...] or columnfamily_name1[, columnfamily_name2, ...];''-~ ||
||SUBSTITUTE[[BR]] || '''Substitute''' assigns a query to a variable named [A~Z].[[BR]][[BR]]~-''X = Matrix(table_name, columnfamily_name);''-~||
||STORE ||'''Store''' command stores results in the specified table. [[BR]][[BR]]~-''A = Table('movieLog_table'); [[BR]]B = A.Selection('length' > 100); [[BR]]STORE B TO X run_style;''-~ ||
||EXIT ||<99%>'''Exit''' command exits the current shell.[[BR]][[BR]]~-''EXIT;''-~ ||

The following commands manipulate data manually at a finer level of detail.
||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
||INSERT ||<99%>'''Insert''' command inserts one row into the table with a value for each specified column.[[BR]][[BR]]~-''INSERT table_name ('columnfamily_name1:column_key'[, 'columnfamily_name2:column_key', ...])[[BR]] VALUES ('entry1'[, 'entry2', ...])[[BR]]WHERE row='row_key';''-~ ||
||SET ||'''Set''' command changes the values. [[BR]][[BR]]~-''SET table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row='row_key' AND time='Specified_Timestamp';''-~ ||
||DELETE ||'''Delete''' command deletes the specified rows in a table. [[BR]][[BR]]~-''DELETE table_name[[BR]]WHERE row='row_key'[[BR]][AND column='columnfamily_name:column_key'];''-~ ||
||SELECT ||<99%>'''Select''' command retrieves rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'][[BR]][AND time='Specified_Timestamp'][[BR]][LIMIT=Number_of_Version];''-~ ||

== Relational Operations ==
||<bgcolor="#ececec">'''Operators''' ||<bgcolor="#ececec">'''Explanation''' ||
||PROJECTION||<99%>'''Projection''' is defined as the set obtained when all tuples in ~+R+~ are restricted to the set {a,,1,,,...,a,,n,,}.||
||SELECTION||<99%>...||
||PRODUCT||<99%>...||
||RENAME||<99%>'''Rename''' r to x||
||GROUP||<99%>...||
||SORT||<99%>...||

||<bgcolor="#ececec">'''Operators''' ||<bgcolor="#ececec">'''Explanation''' ||
||UNION ||<99%>'''Union''' A∪B contains all the elements of A and all the elements of B.||
||INTERSECTION ||<99%>'''Intersection''' A∩B contains the elements common to A and B; it is a subset of A and a subset of B.||
||DIFFERENCE ||'''Difference''' of A and B (A-B) contains the elements of A that are not in B.||

||<bgcolor="#ececec">'''Functions''' ||<bgcolor="#ececec">'''Explanation''' ||
||AVG ||<99%>...||
||SUM ||<99%>...||
||COUNT ||<99%>...||
||MIN ||<99%>...||
||MAX ||<99%>...||

== Matrix Operations ==
||<bgcolor="#ececec">'''Operation''' ||<bgcolor="#ececec">'''Explanation''' ||
||DOUBLEMATRIX||<99%>...||
||BOOLEANMATRIX||<99%>...||

||<bgcolor="#ececec">'''Functions''' ||<bgcolor="#ececec">'''Explanation''' ||
||QR ||<99%>...||
||LU||<99%>...||
||SVD ||<99%>...||

----
= Example Of Hbase Shell Use =
== Basic Usage ==
=== Create the table in HBase ===
~-''CREATE movieLog_table [[BR]]COLUMNFAMILIES('year', 'length', 'inColor', 'studioName', 'vote', 'producer') [[BR]]LIMIT=1;''-~

=== Insert data into a table ===
~-''INSERT movieLog_table ('year:', 'length:', 'inColor:', 'studioName:', 'vote:user_1', 'producer:') [[BR]]VALUES ('1977', '124', 'true', 'Fox', '5', 'George Lucas') [[BR]]WHERE row='Star Wars';''-~

=== Show all data in a table ===
~-''SELECT movieLog_table;''-~

||Row Key ||<-12>Column Families ||
||<rowbgcolor="#ececec">title ||<-2> year ||<-2>length ||<-2>inColor ||<-2> studioName ||<-2> vote ||<-2> producer ||
||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox || vote:''user_1'' || 5 || producer: || George Lucas ||
|| || || || || || || || || || vote:''user_2'' || 2 || || ||
||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney || vote:''user_1'' || 2 || producer: || Blair Peters ||
|| || || || || || || || || || vote:''user_3'' || 4 || || ||
||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount || vote:''user_2'' || 3 || producer: || Penelope Spheeris ||
|| || || || || || || || || || vote:''user_3'' || 4 || || ||

== Relation Operations ==
=== Projection ===
~-''A = Table('movieLog_table'); [[BR]]B = A.Projection('year','length');''-~

'''~+^π^+~'''~-title-~,~-year-~,~-length-~'''~+^(movieLog_table)^+~'''

||<rowbgcolor="#ececec">title ||year ||length ||
||Star Wars ||1977 ||124 ||
||Mighty Ducks ||1991 ||104 ||
||Wayne's World ||1992 ||95 ||

=== Selection ===
~-''A = Table('movieLog_table'); [[BR]]B = A.Selection('length' > 100);''-~

'''~+^σ^+~'''~-length>100-~'''~+^(movieLog_table)^+~'''

||<rowbgcolor="#ececec">title ||year ||length ||inColor ||studioName ||producer ||
||Star Wars ||1977 ||124 ||true ||Fox ||George Lucas ||
||Mighty Ducks ||1991 ||104 ||true ||Disney ||Blair Peters ||

=== Example ===
~-''A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100 AND studioName = 'Fox'); [[BR]]C = B.Projection('year');''-~

'''~+^π^+~'''~-title-~,~-year-~'''~+^(σ^+~'''~-length>100-~'''~+^(movieLog_table)∩σ^+~'''~-studioName='Fox'-~'''~+^(movieLog_table))^+~'''

||<rowbgcolor="#ececec">title ||year ||
||Star Wars ||1977 ||

== Matrix Operations ==
Let's construct an abstract sparse row-by-column matrix.

~-''A = doubleMatrix('movieLog_table','vote');''-~

||<rowbgcolor="#ececec"> ||user_1 ||user_2 ||user_3 ||
||<bgcolor="#ececec">Star Wars || 5.0 || 2.0 || ||
||<bgcolor="#ececec">Mighty Ducks || 2.0 || || 4.0 ||
||<bgcolor="#ececec">Wayne's World || || 3.0 || 4.0 ||

----
= Matrix Extension Example On Hbase Shell =
== Latent Semantic Analysis By Singular Value Decomposition ==
'''Motivation'''

Lexical matching at the term level is inaccurate (claimed):

 * Polysemy: words with a number of “meanings”; term matching returns irrelevant documents, which impacts precision
 * Synonymy: a number of words with the same “meaning”; term matching misses relevant documents, which impacts recall

LSA assumes that there exists a LATENT structure in word usage that is obscured by variability in word choice. [[BR]]This is analogous to the signal + additive noise model in signal processing.

== Scalable Collaborative Filtering With A Large User-By-Item Matrix ==
Title-By-Title Triangular Matrix

||<rowbgcolor="#ececec"> ||Star Wars ||Mighty Ducks ||Wayne's World ||
||<bgcolor="#ececec">Star Wars || || 0.415 || 0.222 ||
||<bgcolor="#ececec">Mighty Ducks || || || 0.715 ||
||<bgcolor="#ececec">Wayne's World || || || ||

== Consistency Assessment Of Topological Relationship By Matrix-Union ==
..

----
= Performance Reports =
..

----
= People Involved =
 * [:udanax:Edward Yoon] [EMAIL PROTECTED]
 * [:boyo:Sewon Kim] [EMAIL PROTECTED]
 * [:mskim:Minsu Kim] [EMAIL PROTECTED]
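----
= Appendix: Data Model Sketch =
As a rough illustration of the three-dimensional data model and of the ''doubleMatrix'' example from the Matrix Operations section, the following Python sketch models a table as nested maps (row, column, timestamp), cuts it at the latest version to obtain a 2D view, and extracts one columnfamily as a sparse matrix. It is not Hbase Shell or HBase code: the names `latest_slice`, `double_matrix`, and `movie_log`, and the timestamp value 1, are assumptions made only for this sketch.

```python
from typing import Dict

# Hypothetical in-memory stand-in for a table (not an HBase API):
# row key -> "columnfamily:column_key" -> timestamp -> value.
Table = Dict[str, Dict[str, Dict[int, str]]]


def latest_slice(table: Table) -> Dict[str, Dict[str, str]]:
    """Cut the 3D model at the newest timestamp, leaving a 2D row-by-column view."""
    return {
        row: {col: versions[max(versions)] for col, versions in cols.items()}
        for row, cols in table.items()
    }


def double_matrix(table: Table, family: str) -> Dict[str, Dict[str, float]]:
    """Build a sparse row-by-column_key matrix of floats from one columnfamily."""
    prefix = family + ":"
    matrix: Dict[str, Dict[str, float]] = {}
    for row, cols in latest_slice(table).items():
        entries = {
            col[len(prefix):]: float(value)
            for col, value in cols.items()
            if col.startswith(prefix)
        }
        if entries:  # rows without any cell in this family stay absent (sparse)
            matrix[row] = entries
    return matrix


# Sample data taken from the movieLog_table example; timestamp 1 is assumed.
movie_log: Table = {
    "Star Wars": {"year:": {1: "1977"}, "vote:user_1": {1: "5"}, "vote:user_2": {1: "2"}},
    "Mighty Ducks": {"year:": {1: "1991"}, "vote:user_1": {1: "2"}, "vote:user_3": {1: "4"}},
    "Wayne's World": {"year:": {1: "1992"}, "vote:user_2": {1: "3"}, "vote:user_3": {1: "4"}},
}

votes = double_matrix(movie_log, "vote")
for title, row in votes.items():
    print(title, row)
```

Missing cells are simply absent from the inner maps, which mirrors the empty cells of the sparse vote matrix; a real implementation on HBase would read the cells from the table instead of a Python dictionary.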