Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseShell

The comment on the change is:
Split page in two -- current and future -- after chatting with Edward Yoon

------------------------------------------------------------------------------
- * https://issues.apache.org/jira/browse/HADOOP-1375 (resolved)
- * Work in progress
- 
  [[TableOfContents(4)]]
  ----
- = Hbase Shell Introduction =
+ = HBase Shell Introduction =

- Hbase Shell is an 'interpreter' (or 'shell)' to provide scalable data processing capabilities like [[BR]]aggregation, algebraic calculation on Hadoop + Hbase.
+ HBase Shell is a basic, interactive command-line 'shell' for manipulating tables in HBase. It supports a small set of SQL-inspired operations and presents results in an ASCII-table format.
+ The HBase Shell aims to be to HBase what the mysql client command-line tool is to mysqld and sqlplus is to Oracle.

- == Hbase Shell Goals ==
- HBase Shell is developed to achieve the following goals.
+ HBase Shell was first added to TRUNK in July 2007.

- * A Simplified Import/Export/Migrate Functionality Between different data sources (Hadoop, HBase)
- * A Simplified processing of a logical data model
- * A Simplified algebraic operations
- * A Simplified Parallel Numerical Analysis by abstracting/numericalizing points, lines, [[BR]]or plane data across multiple maps in HBase.
- 
- == Background ==
- 
- I expect Hadoop + Hbase to handle sparsity and data explosion very well in near future. [[BR]]Moreover, i believe the design of the multi-dimensional structure and the 3-dim space model of the data are [[BR]]optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of [[BR]]raw data based on formulaic relationships.
- 
- Then, I thought it would require a more user-friendly interface to enable querying the data interactive.
- 
- == Rationale ==
- 
- It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMS.
- [[BR]]Thus, I decided to develop a shell to process linear algebraic computing
- [[BR]]and large scale data using Hadoop's parallel processing and HBase storage.
- 
- ''Then you may ask "What is a difference from MapReduce using MapFiles?"''
- 
- I don't expect it to give us a high-performance just yet,
- [[BR]]but it will sure make data management and development much easier.
- [[BR]]First, let's take a look at HBase's data model.
- 
- HBase provides a unified data model and it represents a data in 3-dimensional
- [[BR]]- Row, Column, and TImestamp. Also, Row and Column may be extended infinitely.
- 
- If we decide to cut the data model in time version, then we may view the new data as a 2D table.
- [[BR]]If index is in string, we may view it as a huge map. If index is in integer, then it is one huge 2D array.
- 
- So each table may have such data storages in 3D (ColumnFamilies)
- [[BR]]Locality Group(Columnfamilies) is a relationship that can occur between multiple references
- [[BR]]whenever one reference brings in much of the data used by the other references.
- 
- ''-- I hope physical files on networks are grouped together with locality grouping.[[BR]]by [:udanax:udanax].''

  == People Involved ==

- * [:udanax:Edward Yoon] [EMAIL PROTECTED] (NHN corp.)
+ * [:udanax:Edward Yoon] [[MailTo(udanax AT SPAMFREE nhncorp DOT com)]] (NHN corp.)
- * [:boyo:Sewon Kim] [EMAIL PROTECTED] (Empas, Inc.)
- * [:mskim:Minsu Kim] [EMAIL PROTECTED] (Daum, Inc.)

  ----
- = Hbase Shell Client Syntax Definition =
+ = How to Start a Shell =
+ Run the following on the command-line:
+ 
+ {{{${HBASE_HOME}/bin/hbase shell}}}
+ 
+ You will be presented with the following prompt:
+ 
+ {{{HBase Shell, 0.0.1 version.
+ Copyright (c) 2007 by udanax, licensed to Apache Software Foundation.
+ Type 'help;' for usage.
+ 
+ HBase >}}}
+ 
+ All commands are terminated with a semicolon; for example, type 'help;' to see the list of available commands.
+ 
+ = HBase Shell Commands =

  '''Note''' that data is located by row, column, and timestamp.
- 
- == Commands ==

  ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
  ||Help ||<99%>'''Help''' command provides information about the use of shell script.[[BR]][[BR]]~-''HELP [function_name];''-~ ||
- ||Show ||<99%>'''Show''' command will list the tables.[[BR]][[BR]]~-''SHOW tables;''-~ ||
+ ||Show ||<99%>'''Show''' command lists tables.[[BR]][[BR]]~-''SHOW tables;''-~ ||
- ||Describe ||'''Describe''' command will provides information about the columnfamilies in a table.[[BR]][[BR]]~-''DESC table_name;''-~ ||
+ ||Describe ||'''Describe''' command provides information about the columnfamilies in a table.[[BR]][[BR]]~-''DESC table_name;''-~ ||
- ||Create ||'''Create''' command will create a new table.[[BR]][[BR]]~-''CREATE table_name[[BR]]COLUMNFAMILIES('columnfamily_name1'[, 'columnfamily_name2', ...])[[BR]][LIMIT=limitNumber_of_Version];''-~ ||
+ ||Create ||'''Create''' command creates a new table.[[BR]][[BR]]~-''CREATE table_name[[BR]]COLUMNFAMILIES('columnfamily_name1'[, 'columnfamily_name2', ...])[[BR]][LIMIT=limitNumber_of_Version];''-~ ||
- ||Drop ||'''Drop''' command will droping columnfamilies in a table or tables.[[BR]][[BR]]~-''DROP table_name1[, table_name2, ...] or columnfamily_name1[, columnfamily_name2, ...];''-~ ||
+ ||Drop ||'''Drop''' command drops tables, or columnfamilies in a table.[[BR]][[BR]]~-''DROP table_name1[, table_name2, ...] or columnfamily_name1[, columnfamily_name2, ...];''-~ ||
- ||Substitute || '''Substitute''' expression to [A~Z][[BR]][[BR]]~-''X = Matrix(table_name, columnfamily_name);''-~||
- ||Store ||'''STORE''' command will store results to specified table. [[BR]][[BR]]~-''A = Table('movieLog_table'); [[BR]]B = A.Selection('length' > 100); [[BR]]STORE B TO X run_style;''-~ ||
  ||Exit ||<99%>'''Exit''' from the current shell script.[[BR]][[BR]]~-''EXIT;''-~ ||

  The following commands manipulate data manually, at a finer level of detail.

  ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||Insert ||<99%>'''Insert''' command will insert one row into the table with a value for specified column in the table.[[BR]][[BR]]~-''INSERT table_name ('columnfamily_name1:column_key'[, 'columnfamily_name2:column_key', ...])[[BR]] VALUESVALUES ('entry1'[, 'entry2', ...])[[BR]]WHERE row='row_key';''-~ ||
+ ||Insert ||<99%>'''Insert''' command inserts one row into a table, with a value for each specified column.[[BR]][[BR]]~-''INSERT table_name ('columnfamily_name1:column_key'[, 'columnfamily_name2:column_key', ...])[[BR]] VALUES ('entry1'[, 'entry2', ...])[[BR]]WHERE row='row_key';''-~ ||
- ||Set ||'''SET''' command will change the values. [[BR]][[BR]]~-''SET table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row='row_key' AND time='Specified_Timestamp';''-~ ||
- ||Delete ||'''Delete''' command will delete specified rows in table. [[BR]][[BR]]~-''DELETE table_name[[BR]]WHERE row='row_key'[[BR]][AND column='columnfamily_name:column_key'];''-~ ||
+ ||Delete ||'''Delete''' command deletes specified rows in a table.[[BR]][[BR]]~-''DELETE table_name[[BR]]WHERE row='row_key'[[BR]][AND column='columnfamily_name:column_key'];''-~ ||
- ||Select ||<99%>'''Select''' command will retrieves rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'];[[BR]][AND time='Specified_Timestamp'];[[BR]][LIMIT=Number_of_Version];''-~ ||
+ ||Select ||<99%>'''Select''' command retrieves rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'][[BR]][AND time='Specified_Timestamp'][[BR]][LIMIT=Number_of_Version];''-~ ||
- 
- == Relational Operators ==
- 
- ||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||Projection ||<99%>'''Projection''' of a relation ~+R+~, It makes a new relation as the set that is obtained when all tuples(rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~||
- ||Selection ||<99%>'''Selection''' of a relation ~+R+~, It makes a new relation as the set of specified tuples(rows) of the relation ~+R+~[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection('length' > 100);[[BR]]C = A.Selection('length' > 100 AND 'year' > 1979);''-~||
- ||Product ||<99%>'''Product''' of relations R and S, It makes a new relation as the set of all possible combinations of tuples of the two operation relations.[[BR]]'''NOTE''' that this is the most computationally expensive operator in the relational algebra.||
- ||Rename ||<99%>'''Rename''' r to x, The columnfamily names in the columnfamily-list replace the columnfamily names of the relation.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Rename('length' = 'movieLength');''-~||
- ||Group ||<99%>'''Group''' tuples by value of an attribute and apply aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table);[[BR]]B = A.Group('studioName', MIN('year'));''-~||
- ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to columnfamilies on columnfamily-list[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Sort('length', 'vote');''-~||
- 
- == Matrix Operators ==
- 
- 
- * matrix operator
- 
- ||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||Addition ||<99%>... ||
- ||subtraction ||<99%>... ||
- ||multiplication ||<99%>... ||
- ||division ||<99%>... ||
- ||transpose ||<99%>interchanging rows and columns ||
- ||permutation ||<99%>... ||
- ||norms ||<99%>... ||
- 
- * decompositions
- 
- ||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||LU ||<99%>... ||
- ||QR ||<99%>... ||
- ||Cholesky ||<99%>... ||
- ||SVD ||<99%>... ||
- ||Inverse ||<99%>interchanging rows and columns ||
- ||Pseudoinverse ||<99%>... ||
- ||Condition ||<99%>... ||
- ||Determinant ||<99%>... ||
- ||Rank ||<99%>... ||
- 
  ----
  = Example Of Hbase Shell Use =
- == Basic Usage ==
- 
- === Create the table in a HBase ===
+ == Create a table in HBase ==

  ~-''CREATE movieLog_table
  [[BR]]COLUMNFAMILIES('year', 'length', 'inColor', 'studioName', 'vote', 'producer')
@@ -127, +58 @@
  [[BR]]COLUMNFAMILIES('biography', 'filmography', 'gender', 'birthDate')
  [[BR]]LIMIT=1;''-~

- === Insert data into a table ===
+ == Insert data into a table ==
  ~-''INSERT movieLog_table ('year:', 'length:', 'inColor:', 'studioName:', 'vote:user_1', 'producer')
  [[BR]]VALUES ('1977', '124', 'true', 'Fox', '5', 'George Lucas')
  [[BR]]WHERE row='Star Wars';''-~
@@ -138, +69 @@
  [[BR]]WHERE row='Ewan Gordon Mc.Gregor';''-~

- === Show all data in a table ===
+ == Show all data in a table ==
  ~-''SELECT movieLog_table;''-~

  ||Row Key ||<-12>Column Families ||
@@ -161, +92 @@
  ||keanu reeves ||biography: ||blah~ ||filmography:Constantine ||starring ||gender: ||male ||birthDate: ||September 2, 1964||
  || || || ||filmography:The Matrix Reloaded ||starring || || || || ||

- == Relation Operations ==
+ = HBase Shell Plans =
+ The intent is to add more support for non-interactive usage, as well as operators for algebraic, relational, and matrix manipulations. See the [wiki:Hbase/ShellPlans ShellPlans] page for discussion and a description of future operators.
- === Projection ===
- 
- ~-''A = Table('movieLog_table');
- [[BR]]B = A.Projection('year','length');''-~
- 
- '''~+^π^+~'''~-title-~,~-year-~,~-length-~'''~+^(movieLog_table)^+~'''
- 
- ||<rowbgcolor="#ececec">title ||year ||length ||
- ||Star Wars ||1977 ||124 ||
- ||Mighty Ducks ||1991 ||104 ||
- ||Wayne's World ||1992 ||95 ||
- 
- 
- 
- === Selection ===
- 
- ~-''A = Table('movieLog_table');
- [[BR]]B = A.Selection('length' > 100);''-~
- 
- '''~+^σ^+~'''~-length>100-~'''~+^(movieLog_table)^+~'''
- 
- ||<rowbgcolor="#ececec">title ||year ||length ||inColor ||studioName ||producer ||
- ||Star Wars ||1977 ||124 ||true ||Fox ||12345 ||
- ||Mighty Ducks ||1991 ||104 ||true ||Disney ||67890 ||
- 
- === Renaming ===
- 
- '''~+^ρS^+~'''~-columnfamily-list-~'''~+^(movieLog_table)^+~'''
- 
- === Groupping ===
- 
- '''~+^γ^+~'''~-columnfamily-list-~'''~+^(R)^+~'''
- 
- === Sorting ===
- 
- '''~+^τ^+~'''~-columnfamily-list-~'''~+^(R)^+~'''
- 
- === Example ===
- 
- ~-''A = Table('movieLog_table');
- [[BR]]B = A.Selection(length > 100 AND studioName = 'Fox');
- [[BR]]C = B.Projection('year');''-~
- 
- '''~+^π^+~'''~-title-~,~-year-~'''~+^(σ^+~'''~-length>100-~'''~+^(movieLog_table)∩σ^+~'''~-studioName='Fox'-~'''~+^(movieLog_table))^+~'''
- 
- ||<rowbgcolor="#ececec">title ||year ||
- ||Star Wars ||1977 ||
- 
- == Matrix Operations ==
- 
- Lets construct a abstract sparse row-by-column Map Matrix, orientation is row major.
- 
- ~-''A = doubleMatrix('movieLog_table','vote');''-~
- 
- ||<rowbgcolor="#ececec"> ||user_1 ||user_2 ||user_3 ||
- ||<bgcolor="#ececec">Star Wars || 5.0 || 2.0 || ||
- ||<bgcolor="#ececec">Mighty Ducks || 2.0 || || 4.0 ||
- ||<bgcolor="#ececec">Wayne's World || || 3.0 || 4.0 ||
- 
- ----
- = Matrix Extension Example On Hbase Shell =
- == Latent Semantic Analysis By Singular Value Decomposition ==
- '''Motivation'''
- Lexical matching at term level inaccurate (claimed)
- 
- * Polysemy - words with number of “meanings” - term matching returns irrelevant documents - impacts precision
- * Synonomy - number of words with same “meaning” - term matching misses relevant documents - impacts recall
- 
- LSA assumes that there exists a LATENT structure in word usage - obscured by variability in word choice
- [[BR]]Analogous to signal + additive noise model in signal processing
- 
- 
- 
- == Scalable Collaborative Filtering With A Large User-By-Item Matrix ==
- 
- I will follow (Google Recommendation System) algorithms.
- 
- [http://www2007.org/papers/paper570.pdf]
- 
- 
- == Consistency Assessment Of Topological Relationship By Matrix-Union ==
- ..
- 
- ----
- = Performance Reports =
- ..
- 
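
The Projection and Selection operators removed from this page (now discussed on the ShellPlans page) follow the usual relational-algebra definitions. As a rough illustration only, the sketch below models the movieLog_table example as a Python dictionary from row key to column-family values and applies Selection('length' > 100), then the combined Selection(length > 100 AND studioName = 'Fox') followed by Projection('year'). The names `MOVIE_LOG`, `selection`, `projection` and `length_over_100` are hypothetical; they are not part of HBase Shell or the HBase client API, and only values that appear in the example tables are included.

```python
"""Hypothetical sketch of relational Selection (sigma) and Projection (pi)."""

# Assumed in-memory stand-in for movieLog_table: row key -> {column family: value}.
# Only values shown in the wiki example tables are included; this is NOT the HBase API.
MOVIE_LOG = {
    "Star Wars": {"year": "1977", "length": "124", "studioName": "Fox"},
    "Mighty Ducks": {"year": "1991", "length": "104", "studioName": "Disney"},
    "Wayne's World": {"year": "1992", "length": "95"},
}


def selection(table, predicate):
    """Selection (sigma): keep the rows whose column values satisfy predicate."""
    return {row: cols for row, cols in table.items() if predicate(cols)}


def projection(table, families):
    """Projection (pi): restrict every row to the listed column families.

    The row key is kept, so no two result rows can collide.
    """
    return {
        row: {f: cols[f] for f in families if f in cols}
        for row, cols in table.items()
    }


def length_over_100(cols):
    """Predicate for the Selection('length' > 100) example."""
    return int(cols["length"]) > 100


if __name__ == "__main__":
    long_movies = selection(MOVIE_LOG, length_over_100)
    print(sorted(long_movies))  # ['Mighty Ducks', 'Star Wars']

    # Selection(length > 100 AND studioName = 'Fox'), then Projection('year').
    fox_long = selection(
        MOVIE_LOG,
        lambda c: length_over_100(c) and c.get("studioName") == "Fox",
    )
    print(projection(fox_long, ["year"]))  # {'Star Wars': {'year': '1977'}}
```

Because every result row keeps its unique row key, this projection never yields duplicate tuples, so the duplicate-elimination step of textbook projection is not needed in the sketch.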