Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

The comment on the change is:
Page two of hbase shell split (after chatting with Edward Yoon)

New page:
 * Work in progress

[[TableOfContents(4)]]

----
= Introduction =
A basic version of an [wiki:Hbase/HbaseShell HBase Shell] was added to HBase in July 2007. This page discusses future HBase Shell features and directions.

= HBase Shell Goals =
 * Simplified import/export/migrate functionality between different data sources (Hadoop, HBase)
 * Simplified processing of a logical data model
 * Simplified algebraic operations
 * Simplified parallel numerical analysis by abstracting/numericalizing points, lines, [[BR]]or plane data across multiple maps in HBase.

== HBase Shell Background ==
I expect Hadoop + HBase to handle sparsity and data explosion very well in the near future. Moreover, I believe the design of the multi-dimensional structure and the 3-dimensional space model of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. I therefore thought it would need a more user-friendly interface for querying the data interactively.

=== Rationale ===
It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMSs. Thus, I decided to develop a shell for linear algebraic computing and large-scale data processing using Hadoop's parallel processing and HBase storage.

''Then you may ask, "What is the difference from MapReduce using MapFiles?"''

I don't expect it to deliver high performance just yet, but it should make data management and development much easier.

First, let's take a look at HBase's data model. HBase provides a unified data model that represents data in three dimensions: Row, Column, and Timestamp. Rows and columns may also be extended without limit.
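As a minimal sketch of the three-dimensional model described above, the Python below stores each value under a row key, a column key, and a timestamp. The class name `SparseTable`, its methods, and the sample keys are hypothetical and chosen only for illustration; they are not part of HBase or the HBase Shell.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory stand-in for a (row, column, timestamp) data model."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}; absent cells cost nothing
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._cells[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the newest version if None."""
        versions = self._cells.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)


# Hypothetical sample data: two versions of one cell, plus a second row.
table = SparseTable()
table.put("movie1", "info:length", 1, 95)
table.put("movie1", "info:length", 2, 120)
table.put("movie2", "info:year", 1, 1984)

latest_length = table.get("movie1", "info:length")
older_length = table.get("movie1", "info:length", timestamp=1)
missing_cell = table.get("movie2", "info:length")
```

Because only written cells occupy memory, new rows and columns can be added at any time, which is the sense in which both dimensions can grow without a fixed schema.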
If we slice the data model at a single time version, we may view the result as a 2D table. If the index is a string, we may view it as a huge map; if the index is an integer, it is one huge 2D array. So each table may hold such data stores in 3D (ColumnFamilies).

A Locality Group (ColumnFamilies) is a relationship that can occur between multiple references whenever one reference brings in much of the data used by the other references.

''-- I hope physical files on networks are grouped together with locality grouping.[[BR]]by [:udanax:udanax].''

== People Involved ==
 * [:udanax:Edward Yoon] [[MailTo(udanax AT SPAMFREE nhncorp DOT com)]] (NHN corp.)
 * [:boyo:Sewon Kim] [[MailTo(ebow31 AT SPAMFREE gmail.com)]] (Empas, Inc.)
 * [:mskim:Minsu Kim] [[MailTo(minsu.kim AT SPAMFREE gmail.com)]] (Daum, Inc.)

----
= Suggested Future HBase Shell Operators =
'''Note''' that data should be located by row, column, and timestamp.

== Commands ==
||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
||Substitute ||'''Substitute''' assigns an expression to a variable [A~Z].[[BR]][[BR]]~-''X = Matrix(table_name, columnfamily_name);''-~ ||
||Store ||The '''STORE''' command stores results to the specified table.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection('length' > 100);[[BR]]STORE B TO X run_style;''-~ ||
||Set ||The '''SET''' command changes stored values.[[BR]][[BR]]~-''SET table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE row='row_key' AND time='Specified_Timestamp';''-~ ||

== Relational Operators ==
||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
||Projection ||<99%>'''Projection''' of a relation ~+R+~ makes a new relation as the set obtained when all tuples (rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~ ||
||Selection ||<99%>'''Selection''' of a relation ~+R+~ makes a new relation as the set of specified tuples (rows) of ~+R+~.[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection('length' > 100);[[BR]]C = A.Selection('length' > 100 AND 'year' > 1979);''-~ ||
||Product ||<99%>'''Product''' of relations R and S makes a new relation as the set of all possible combinations of tuples of the two operand relations.[[BR]]'''NOTE''' that this is the most computationally expensive operator in the relational algebra. ||
||Rename ||<99%>'''Rename''' r to x: the columnfamily names in the columnfamily-list replace the columnfamily names of the relation.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Rename('length' = 'movieLength');''-~ ||
||Group ||<99%>'''Group''' tuples by the value of an attribute and apply an aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~ ||
||Sort ||<99%>'''Sort''' orders the tuples (rows) of R according to the columnfamilies in the columnfamily-list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Sort('length', 'vote');''-~ ||

== Matrix Operators ==
 * matrix operator
||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
||Addition ||<99%>... ||
||Subtraction ||<99%>... ||
||Multiplication ||<99%>... ||
||Division ||<99%>... ||
||Transpose ||<99%>interchanging rows and columns ||
||Permutation ||<99%>... ||
||Norms ||<99%>... ||

 * decompositions
||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
||LU ||<99%>... ||
||QR ||<99%>... ||
||Cholesky ||<99%>... ||
||SVD ||<99%>... ||
||Inverse ||<99%>the matrix whose product with the original matrix is the identity matrix ||
||Pseudoinverse ||<99%>... ||
||Condition ||<99%>... ||
||Determinant ||<99%>... ||
||Rank ||<99%>... ||
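To make the transpose entry above concrete, here is a small Python sketch that treats one column family as a sparse matrix and interchanges its rows and columns. It is an illustration under assumptions, not the shell's implementation: the helper names `matrix_view` and `transpose`, the `(row, "family:qualifier", value)` input shape, and the sample cells are all hypothetical, and the sketch only loosely mirrors the proposed `Matrix(table_name, columnfamily_name)` expression.

```python
def matrix_view(cells, family):
    """Collect the cells of one column family into a sparse {row: {col: value}} map.

    `cells` is an iterable of (row_key, "family:qualifier", value) tuples;
    this input shape is an assumption made for the example.
    """
    matrix = {}
    prefix = family + ":"
    for row, column, value in cells:
        if column.startswith(prefix):
            qualifier = column[len(prefix):]
            matrix.setdefault(row, {})[qualifier] = value
    return matrix


def transpose(matrix):
    """Interchange rows and columns of a sparse {row: {col: value}} map."""
    result = {}
    for row, columns in matrix.items():
        for col, value in columns.items():
            result.setdefault(col, {})[row] = value
    return result


# Hypothetical cells: family "m" holds the matrix, "other" is ignored.
cells = [
    ("r1", "m:c1", 1.0),
    ("r1", "m:c2", 2.0),
    ("r2", "m:c2", 3.0),
    ("r2", "other:c1", 9.0),
]

a = matrix_view(cells, "m")
a_t = transpose(a)
```

Storing only the non-empty cells keeps the representation sparse, and transposing twice returns the original map.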