[ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13153:
---------------------------
    Release Note: 
This enhances HBase replication to support replication of bulk loaded data. 
This is configurable and disabled by default, which means bulk loaded data is 
not replicated to the peer(s). To enable it, set 
"hbase.replication.bulkload.enabled" to true.

The following additional configurations are added for this enhancement:
 a. hbase.replication.cluster.id - This is mandatory in a cluster where 
replication of bulk loaded data is enabled. The sink cluster uses this id to 
uniquely identify a source cluster. This should be configured in the source 
cluster configuration file for all the RS.
 b. hbase.replication.conf.dir - This is the directory on the peer cluster 
that holds the file system client configurations of each active (source) 
cluster, one subfolder per source cluster, named after its replication cluster 
id. This should be configured in the peer cluster configuration file for all 
the RS. Default is HBASE_CONF_DIR.
 c. hbase.replication.source.fs.conf.provider - This represents the class which 
provides the source cluster file system client configuration to peer cluster. 
This should be configured in the peer cluster configuration file for all the 
RS. Default is 
org.apache.hadoop.hbase.replication.regionserver.DefaultSourceFSConfigurationProvider

 For example: if the source cluster FS client configurations are copied to the 
peer cluster under the directory /home/user/dc1/, then 
hbase.replication.cluster.id should be configured as dc1 and 
hbase.replication.conf.dir as /home/user.

Note: 
 a. After any modification to the source cluster FS client configuration files 
in the peer cluster's replication configuration directory, all RS of the peer 
cluster(s) must be restarted when the default 
hbase.replication.source.fs.conf.provider is in use.
 b. Only 'xml' type files will be loaded by the default 
hbase.replication.source.fs.conf.provider.
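
The directory layout described above can be sketched as follows. This is only 
an illustration of the lookup rule stated in this note (subfolder named after 
the replication cluster id, only 'xml' files considered); it is not the code 
of DefaultSourceFSConfigurationProvider, and the function name 
source_fs_conf_files is hypothetical.

```python
from pathlib import Path


def source_fs_conf_files(conf_dir, cluster_id):
    """List the source cluster's FS client config files on the peer side.

    conf_dir   -- value of hbase.replication.conf.dir on the peer cluster
    cluster_id -- value of hbase.replication.cluster.id on the source cluster

    Assumption for illustration: only files ending in '.xml' directly under
    <conf_dir>/<cluster_id> are considered.
    """
    cluster_dir = Path(conf_dir) / cluster_id
    if not cluster_dir.is_dir():
        # No subfolder for this source cluster: nothing to load.
        return []
    return sorted(
        p for p in cluster_dir.iterdir()
        if p.is_file() and p.suffix == ".xml"
    )
```

With the values from the example above, source_fs_conf_files("/home/user", 
"dc1") would look in /home/user/dc1/ and skip any file that is not an xml file.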

As part of this we have made the following changes to the 
LoadIncrementalHFiles class, which is marked as Public and Stable:
 a. Raised the visibility scope of LoadQueueItem class from package private to 
public.
 b. Added a new method loadHFileQueue, which loads the queue of LoadQueueItem 
into the table as per the region keys provided.

  was:
This enhances the HBase replication to support replication of bulk loaded data. 
This is configurable, by default it is set to false which means it will not 
replicate the bulk loaded data to its peer(s). To enable it set 
"hbase.replication.bulkload.enabled" to true.

Following are the additional configurations are added for this enhacement,
 a. hbase.replication.cluster.id - This is manadatory to configure in cluster 
where replication for bulk loaded data is enabled. A source cluster is uniquely 
identified by sink cluster using this id. This should be configured in the 
source cluster configuration file for all the RS.
 b. hbase.replication.conf.dir - This represents the directory where all the 
active cluster's file system client configurations are defined in their 
respective replication cluster id subfolders in peer cluster. This should be 
configured in the peer cluster configuration file for all the RS. Default 
HBASE_CONF_DIR.
 c. hbase.replication.source.fs.conf.provider - This represents the class which 
provides the source cluster file system client configuration to peer cluster. 
This should be configured in the peer cluster configuration file for all the 
RS. Default 
org.apache.hadoop.hbase.replication.regionserver.DefaultSourceFSConfigurationProvider

 For example: If source cluster FS client configurations are copied in peer 
cluster under directory /home/user/dc1/ then  hbase.replication.cluster.id 
should be configured as dc1 and hbase.replication.conf.dir as /home/user

Note: 
 a. Any modification in source cluster FS client configuration files in peer 
cluster side replication configuration directory then it needs to restart all 
its peer(s) cluster RS with default hbase.replication.source.fs.conf.provider.
 b. Only 'xml' type files will be loaded by the default 
hbase.replication.source.fs.conf.provider.

As part of this we made have made following changes to LoadIncrementalHFiles 
class which is marked as Public and Stable class,
 a. Raised the visbility scope of LoadQueueItem class from package private to 
public.
 b. Added a new method loadHFileQueue, which loads the queue of LoadQueueItem 
into the table as per the region keys provided.


> Bulk Loaded HFile Replication
> -----------------------------
>
>                 Key: HBASE-13153
>                 URL: https://issues.apache.org/jira/browse/HBASE-13153
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>            Reporter: sunhaitao
>            Assignee: Ashish Singhi
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, 
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v13.patch, 
> HBASE-13153-v14.patch, HBASE-13153-v15.patch, HBASE-13153-v16.patch, 
> HBASE-13153-v2.patch, HBASE-13153-v3.patch, HBASE-13153-v4.patch, 
> HBASE-13153-v5.patch, HBASE-13153-v6.patch, HBASE-13153-v7.patch, 
> HBASE-13153-v8.patch, HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk 
> Load Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk 
> Load Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature to handle disaster 
> tolerance scenarios. But we use bulkload very frequently, and because 
> bulkload bypasses the write path and does not generate WAL, the data will 
> not be replicated to the backup cluster. It's inappropriate to bulkload 
> twice, on both the active cluster and the backup cluster. So I suggest 
> modifying the bulkload feature to enable bulkload to both the active cluster 
> and the backup cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
