Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
  ----
  = Hbase Shell Plan Draft =
- Plan is to significantly expand the set of shell operators. Basic data manipulation and data definition operators will be extended and evolved to be more SQL-like ([:Hbase/HbaseShell/HQL HQL]). More sophisticated manipulations to do relational and linear algebra, matrix additions, multiplications, etc., will be added to a HBase subshell to keep the two operator types -- SQL-like vs. non-SQL -- distinct.
+ Plan is to significantly expand the set of shell operators. Basic data manipulation and data definition operators will be extended and evolved to be more SQL-like ([:Hbase/HbaseShell/HQL]). More sophisticated manipulations to do relational and linear algebra, matrix additions, multiplications, etc., will be added to a HBase subshell to keep the two operator types -- SQL-like vs. non-SQL -- distinct.
  This project is currently in the planning stage. [https://issues.apache.org/jira/browse/HADOOP-1608 HADOOP-1608] to add "Relational Algebra Operators" is currently in process.
  == People Involved ==
  * '''Syntax definition.'''
  * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
- * Inchul Song, Ph.D. Candidate[[BR]]Database Lab[[BR]]Division of Computer Science, KAIST
+ * Inchul Song, Ph.D. Candidate[[BR]]Database Lab (Division of Computer Science, KAIST)
  If you have constructive ideas, please advise me. [EMAIL PROTECTED]
- ''~-This page looks great. I've added comments to the below. Please remove after you are done with them. -- St.Ack-~''
+ == Suggested Hbase Query Language plans ==
- == Suggested Hbase Shell plans ==
- === Hbase Query Language ===
  I've made some changes to your initial HQL to make it look more like SQL. I borrowed the syntax definition style from MySQL.
+ -- [:Hbase/HbaseShell/HQL] by Inchul Song
- ''~-if you're ready to implement them, I suggest you to open a new issue for "HQL" -- Edward-~''
+ ~-''If you're ready to implement them, I suggest you to open a new issue for "HQL" -- Edward''-~
  ----
@@ -43, +42 @@
  Hbase.altools > exit;
  Hbase > exit;
  }}}
+
- Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell)' program to provide scalable data processing capabilities like aggregation, algebraic calculation(groups and sets, commutative rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines. especially, it will focus on storing and manipulating sparse matrices on Hbase.
+ Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell)' program to provide scalable data processing capabilities like aggregation, algebraic calculation(groups and sets, commutative rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines. especially, it will focus on storing and manipulating '''sparse matrices''' on Hbase.
  ''-- Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic topology, Google News' recommendation system are related to Bigtable. See the HBase Shell Usage Page. --[:Hbase/HbaseShell/Examples]''
+ === Hbase altools Goals ===
  * A Simplified Import/Export/Migrate Functionality Between different data sources (Hadoop, HBase)
@@ -59, +60 @@
  I expect Hadoop + Hbase to handle sparsity and data explosion very well in near future. Moreover, I believe the design of the multi-dimensional map structure and the 3d space model of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. It is advantageous with respect to '''Analysis Processing''' as it allows users to easily formulate complex queries, and filter or slice data into meaningful subsets, among other things.
  === Rationale ===
+ It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMS. [[BR]]Also, Multi Dimensional Model is commonly accepted for OLAP.
  ||<bgcolor="#E5E5E5">'''System Characteristic''' ||<bgcolor="#E5E5E5">'''RDBMS''' ||<bgcolor="#E5E5E5">'''Multi-Dimensional Model Hbase''' ||
  ||Data Retrieval Performance ||Slow ||Fast ||
@@ -73, +75 @@
  I don't expect it to give us a high-performance just yet, but it will sure make data management and development much easier. First, let's take a look at HBase's data model. HBase provides a unified data model and it represents a data in 3-dimensional - Row, Column, and Timestamp. Also, Row and Column may be extended infinitely. If we decide to cut the data model in time version, then we may view the new data as a 2D table. If index is in string, we may view it as a huge map. If index is in integer, then it is one huge 2D array. So each table may have such data storages in 3D (Columnfamilies) Locality Group(Columnfamilies) is a relationship that can occur between multiple references whenever one reference brings in much of the data used by the other references.
-
- ''~-I think people may also start to ask as your operators evolve: 'What is the difference between HBase Shell and Yahoo! PIG?'
-- St.Ack-~''
  ----
@@ -95, +95 @@
  ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Projection ||<99%>'''Projection''' of a relation ~+R+~, It makes a new relation as the set that is obtained when all tuples(rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~ ||
  ||Selection ||<99%>'''Selection''' of a relation ~+R+~, It makes a new relation as the set of specified tuples(rows) of the relation ~+R+~[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 'Fox');''-~ ||
+ ||JOINs ||<99%>Table '''JOIN''' operations, linking and extracting data from two different internal source[[BR]]'''Operations''' : ~-''naturalJoin(), thetaJoin(), cartesianProduct() ''-~ [[BR]][[BR]]~-''R = Table('movieLog_table');[[BR]]S = Table('movieStar_table');[[BR]]C = R.naturalJoin(S); //C = R ⋈ S''-~ ||
  ||Group ||<99%>'''Group''' tuples by value of an attribute and apply aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~ ||
  ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to columnfamilies on columnfamily-list[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort A by ('length');''-~ ||
@@ -142, +143 @@
  St.Ack
  }}}
+
+
  ----
  = Example Of Hbase Shell Use =