[
https://issues.apache.org/jira/browse/HADOOP-13304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
jiang hehui updated HADOOP-13304:
---------------------------------
Description:
in Hadoop, HDFS is responsible for storage and MapReduce is responsible for
computation.
my idea is to store the data in a distributed database and to compute over it
in a MapReduce-like way.
h2. how to do it?
* insert:
using two-phase commit; according to the split policy, execute the insert on
the target nodes
* delete:
using two-phase commit; according to the split policy, execute the delete on
the target nodes
* update:
using two-phase commit; according to the split policy, if the record's node
does not change, execute the update on that node; if the record's node changes,
first delete the old value on the source node, then insert the new value on the
destination node (a sketch of this write path follows the architecture section
below)
* select:
** a simple select (data on a single node, or no data fusion across nodes
needed) works just like on a standalone database server;
** a complex select (distinct, group by, order by, subquery, or join across
multiple nodes) is what we call a job
{panel}
{color:red}a job is parsed into stages; stages have lineage, and all the stages
in a job form a DAG (Directed Acyclic Graph). Every stage consists of a mapsql,
a shuffle, and a reducesql.
when a request SQL is received, the execution plan is generated from the
metadata; the plan contains the DAG, i.e. the stages and the mapsql, shuffle,
and reducesql inside each stage. The plan is then executed and the result is
returned to the client.
the analogy with Spark: an RDD is a table, and a job is a job;
the analogy with MapReduce in Hadoop: mapsql is map, shuffle is shuffle, and
reducesql is reduce.
{color}
{panel}
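To make the stage structure concrete, here is a minimal Java sketch; every type and method below (Stage, Executor, and the commented helpers) is hypothetical and only illustrates the plan described in the panel, it is not an existing Hadoop API:
{code:java}
import java.util.List;

// Hypothetical sketch of the plan structure described above; none of these
// types exist in Hadoop, they only illustrate the idea.
class Stage {
    List<String> mapSql;   // SQL pushed down to each store node
    String shufflePolicy;  // how map output is repartitioned, e.g. "range(age)"
    String reduceSql;      // SQL run on each calculate node over the shuffled data
    List<Stage> parents;   // lineage: the stages this stage depends on (DAG edges)
}

class Executor {
    // Run parent stages first (following the lineage), then this stage's
    // mapsql on the store nodes, the shuffle, and finally the reducesql.
    void execute(Stage stage) {
        for (Stage parent : stage.parents) {
            execute(parent);
        }
        // runMapSqlOnStoreNodes(stage.mapSql);           // hypothetical helpers,
        // shuffle(stage.shufflePolicy);                  // left unimplemented in
        // runReduceSqlOnCalculateNodes(stage.reduceSql); // this sketch
    }
}
{code}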
h2. architecture:
!http://images2015.cnblogs.com/blog/439702/201606/439702-20160621124133334-32823985.png!
* client : the user interface
* master : the master node, similar to the NameNode in HDFS
* meta database : contains the basic information about the system, the nodes,
the tables, and so on
* store node : a database node where data is stored; insert, delete, and
update are always executed on store nodes
* calculate node : where selects are executed; the source data lives on store
nodes, and the remaining tasks run on calculate nodes. In practice a calculate
node may be the same machine as a store node
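To tie the write path from "how to do it?" to these components, here is a minimal Java sketch of split-policy routing plus two-phase commit; SplitPolicy, StoreNode, and WriteCoordinator are hypothetical interfaces (real XA plumbing is omitted):
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical write path: route each statement by the split policy, then run
// a two-phase commit across every store node the write touches.
interface SplitPolicy {
    StoreNode nodeFor(Object splitFieldValue); // e.g. range or hash on the split field
}

interface StoreNode {
    boolean prepare(String sql); // phase 1: vote commit (true) or abort (false)
    void commit();               // phase 2, if every participant voted commit
    void rollback();             // phase 2, if any participant voted abort
}

class WriteCoordinator {
    // For an update that moves a record between nodes, the participants are the
    // source node (running the delete) and the destination node (the insert).
    void execute(List<StoreNode> participants, String sql) {
        List<StoreNode> prepared = new ArrayList<>();
        boolean allPrepared = true;
        for (StoreNode node : participants) {
            if (node.prepare(sql)) {
                prepared.add(node);
            } else {
                allPrepared = false;
                break;
            }
        }
        for (StoreNode node : prepared) {
            if (allPrepared) node.commit(); else node.rollback();
        }
    }
}
{code}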
h2. Features & Advantages
{panel}
{color:green}
data is stored across nodes, split by field values according to a single policy;
this is very useful for full-text indexing (as in Solr or Elasticsearch);
{color}
{panel}
{panel}
{color:green}
in contrast with HDFS, the location of the data can be known in advance, rather
than obtained by running a command;
{color}
{panel}
{panel}
{color:green}
when inserting, updating, or deleting multiple records, XA transactions can be
used to guarantee consistency;
{color}
{panel}
{panel}
{color:green}
we know that random read/write is not supported in HDFS, so updates and deletes
are difficult, and inserts are normally batched;
{color}
{panel}
{panel}
{color:green}
so storing the data in a database has a big advantage:
SQL across multiple nodes is supported, including group by, order by, having,
and especially subqueries and joins;
because the data is in a database, indexes can speed up queries and data is
cached in memory automatically;
{color}
{panel}
{panel}
{color:green}
we can fetch a handful of records out of billions very quickly using indexes
and caches, whereas doing the same with Hadoop is very slow.
with Hadoop, moving data from the online database to the offline data warehouse
is hard because of updates and deletes, and it is delayed by data merging;
{color}
{panel}
{panel}
{color:green}
if we use a database to store the data, data synchronization is simple and
real-time: just use replication, and all of these issues are resolved.
as you can see, both online and offline workloads can use this system;
refer to the application architecture (online & offline)
{color}
{panel}
h2. example (about group by):
{panel}
sql:
select age,count(u_id) v from tab_user_info t where u_reg_dt>=? and u_reg_dt<=?
group by age
the execution plan may be:
stage0:
{quote}
mapsql:
select age,count(u_id) v from tab_user_info t where u_reg_dt>=?
and u_reg_dt<=? group by age
shuffle:
shuffle by age with a range policy:
for example, if the number of reduce nodes is N, each node covers
an age range of width (max(age)-min(age))/N;
reduce nodes are numbered, and nodes with smaller ids store smaller
ranges of age, so the group by can run within each node
reducesql:
select age, sum(v) v from t group by age
{quote}
note:
the group by must be executed again on the reduce nodes, because partial
results coming from different mapsqls need to be aggregated
{panel}
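A minimal Java sketch of the range shuffle used in this plan; the class is hypothetical, and min/max of the shuffle key are assumed to come from the metadata:
{code:java}
// Hypothetical range partitioner for the shuffle: reduce node i receives keys
// in [min + i*width, min + (i+1)*width), so rows with equal keys (here: equal
// ages) always land on the same node and the reducesql's group by is complete.
class RangePartitioner {
    private final int min;            // min(age), from metadata (assumed known)
    private final int max;            // max(age), from metadata (assumed known)
    private final int numReduceNodes; // N in the plan above

    RangePartitioner(int min, int max, int numReduceNodes) {
        this.min = min;
        this.max = max;
        this.numReduceNodes = numReduceNodes;
    }

    // Reduce-node id for one shuffle key; smaller ids get smaller key ranges.
    int nodeFor(int key) {
        int width = Math.max(1, (max - min) / numReduceNodes + 1);
        return Math.min((key - min) / width, numReduceNodes - 1);
    }
}
{code}
Every row of the mapsql output is sent to nodeFor(age), so the final sum(v) per age on each reduce node is complete.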
h2. example (about join):
{panel}
sql:
select t1.u_id,t1.u_name,t2.login_product
from tab_user_info t1 join tab_login_info t2
on (t1.u_id=t2.u_id and t1.u_reg_dt>=? and t1.u_reg_dt<=?)
the execution plan may be:
stage0:
{quote}
mapsql:
select u_id, u_name from tab_user_info t where u_reg_dt>=? and
u_reg_dt<=? ;
select u_id, login_product from tab_login_info t ;
shuffle:
shuffle by u_id with a range policy:
for example, if the number of reduce nodes is N, each node covers
a u_id range of width (max(u_id)-min(u_id))/N;
reduce nodes are numbered, and nodes with smaller ids store smaller
ranges of u_id, so the join can run within each node
reducesql:
select t1.u_id,t1.u_name,t2.login_product
from tab_user_info t1 join tab_login_info t2
on (t1.u_id=t2.u_id)
{quote}
note:
because of the join, each record must be tagged with its source table so that
the reduce node can tell which table each record belongs to
{panel}
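A minimal Java sketch of the tagging mentioned in the note; TaggedRow and the nested-loop join are hypothetical illustrations of what each reduce node would do with its shuffled partition:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical shuffle record for joins: the tag tells the reduce node which
// source table a row came from, since both tables share the same shuffle.
class TaggedRow {
    final String tableTag;  // "tab_user_info" or "tab_login_info"
    final Object[] columns; // the mapsql output columns for this row

    TaggedRow(String tableTag, Object[] columns) {
        this.tableTag = tableTag;
        this.columns = columns;
    }
}

class ReduceJoin {
    // Split the shuffled rows back into the two tables, then run the
    // reducesql's join locally (sketched here as a nested loop).
    List<Object[]> join(List<TaggedRow> shuffled) {
        List<TaggedRow> users = new ArrayList<>();
        List<TaggedRow> logins = new ArrayList<>();
        for (TaggedRow row : shuffled) {
            if (row.tableTag.equals("tab_user_info")) users.add(row);
            else logins.add(row);
        }
        List<Object[]> out = new ArrayList<>();
        for (TaggedRow u : users) {      // columns: (u_id, u_name)
            for (TaggedRow l : logins) { // columns: (u_id, login_product)
                if (u.columns[0].equals(l.columns[0])) { // join on u_id
                    out.add(new Object[]{u.columns[0], u.columns[1], l.columns[1]});
                }
            }
        }
        return out;
    }
}
{code}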
h2. example (about group by & order by & join):
{panel}
sql:
select t1.id1, t2.id1, sum(t1.c1) v1, sum(t2.c1) v2 from t1 join t2
on (t1.cc = t2.cc and t1.cc1=? and t2.cc2=?) group by t1.id1, t2.id1
order by v1, v2
this job needs multiple stages (see the sketch after this list):
* first, execute the join
* based on the first result, execute the group by
* based on the second result, execute the order by
{panel}
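One possible stage decomposition for this query, sketched as a list of hypothetical intermediate SQL steps (tmp1 and tmp2 are illustrative names for the shuffled output of the previous stage):
{code:java}
import java.util.List;

// Hypothetical three-stage plan for the group by & order by & join query above.
class MultiStagePlan {
    List<String> stages = List.of(
        // stage0: the join, shuffled by the join key cc (as in the join example)
        "select t1.id1 id1, t2.id1 id2, t1.c1 c1, t2.c1 c2 from t1 join t2 "
            + "on (t1.cc = t2.cc and t1.cc1=? and t2.cc2=?)",      // -> tmp1
        // stage1: the group by, shuffled by (id1, id2)
        "select id1, id2, sum(c1) v1, sum(c2) v2 from tmp1 "
            + "group by id1, id2",                                 // -> tmp2
        // stage2: the order by, merged on one node (or range-shuffled by v1, v2)
        "select id1, id2, v1, v2 from tmp2 order by v1, v2"
    );
}
{code}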
h2. how to execute any sql?
{panel}
first, parse out the subqueries and unions; each part of the resulting tree is
then a plain select (with no subquery or union);
for each part, the parse and execution order may be as below (sketched in code
after this list):
# from (including joins and multiple tables)
# join condition & where
# group by (check the result columns)
# having (after group by, if present)
# result columns (including distinct)
# order by
# limit / offset / top
{panel}
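The numbered order above, sketched as a hypothetical per-part pipeline in Java; Rows and every step method are placeholders, one per list item:
{code:java}
// Hypothetical evaluation pipeline for one plain select (no subquery/union),
// following the numbered order above.
class SelectPart {
    static class Rows {} // placeholder for an intermediate result set

    Rows execute() {
        Rows rows = from();    // 1. from (including joins, multiple tables)
        rows = filter(rows);   // 2. join condition & where
        rows = groupBy(rows);  // 3. group by (check the result columns)
        rows = having(rows);   // 4. having, if present
        rows = project(rows);  // 5. result columns (including distinct)
        rows = orderBy(rows);  // 6. order by
        return limit(rows);    // 7. limit / offset / top
    }

    // Each step is a stub in this sketch.
    Rows from()          { return new Rows(); }
    Rows filter(Rows r)  { return r; }
    Rows groupBy(Rows r) { return r; }
    Rows having(Rows r)  { return r; }
    Rows project(Rows r) { return r; }
    Rows orderBy(Rows r) { return r; }
    Rows limit(Rows r)   { return r; }
}
{code}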
> distributed database for store , mapreduce for compute
> ------------------------------------------------------
>
> Key: HADOOP-13304
> URL: https://issues.apache.org/jira/browse/HADOOP-13304
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Affects Versions: 2.6.4
> Reporter: jiang hehui
>