[
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Yu updated HBASE-13153:
---------------------------
Release Note:
This enhances HBase replication to support replication of bulk loaded data.
This is configurable; by default it is set to false, which means bulk loaded
data will not be replicated to the peer(s). To enable it, set
"hbase.replication.bulkload.enabled" to true.
The following additional configurations were added for this enhancement:
a. hbase.replication.cluster.id - This is mandatory in any cluster where
replication for bulk loaded data is enabled. The sink cluster uses this id to
uniquely identify a source cluster. This should be configured in the source
cluster configuration file for all the RS.
b. hbase.replication.conf.dir - This is the directory in the peer cluster
that holds the file system client configurations of every active (source)
cluster, each in a subfolder named after that cluster's replication cluster
id. This should be configured in the peer cluster configuration file for all
the RS. Default is HBASE_CONF_DIR.
c. hbase.replication.source.fs.conf.provider - This represents the class which
provides the source cluster file system client configuration to peer cluster.
This should be configured in the peer cluster configuration file for all the
RS. Default is
org.apache.hadoop.hbase.replication.regionserver.DefaultSourceFSConfigurationProvider
For example: if the source cluster FS client configurations are copied to the
peer cluster under the directory /home/user/dc1/, then
hbase.replication.cluster.id should be configured as dc1 and
hbase.replication.conf.dir as /home/user.
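The relationship between the two settings in the example above can be sketched
in Java as follows. This is only an illustration of the directory layout, not
the code from the patch; the class name SourceFsConfDirExample and the helper
sourceConfDir are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class SourceFsConfDirExample {
    // Hypothetical helper (not part of HBase): joins the value of
    // hbase.replication.conf.dir on the peer cluster with the source
    // cluster's hbase.replication.cluster.id to get the subfolder that
    // holds that source cluster's FS client configuration files.
    static Path sourceConfDir(String replicationConfDir, String replicationClusterId) {
        return Paths.get(replicationConfDir, replicationClusterId);
    }

    public static void main(String[] args) {
        // Values taken from the example in the release note.
        System.out.println(sourceConfDir("/home/user", "dc1"));
    }
}
```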
Note:
a. After any modification to the source cluster FS client configuration files
in the peer cluster side replication configuration directory, all RS of the
peer cluster(s) must be restarted when the default
hbase.replication.source.fs.conf.provider is used.
b. Only 'xml' type files will be loaded by the default
hbase.replication.source.fs.conf.provider.
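The effect of note b can be illustrated with a small Java sketch. It assumes
that an 'xml' type file means a file whose name ends in ".xml"; that
interpretation, the class name XmlConfFilterExample and the helper
xmlFilesOnly are assumptions made for illustration, and this is not the
provider's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class XmlConfFilterExample {
    // Hypothetical helper (not part of HBase): keeps only the file names
    // that end in ".xml", mimicking the rule that the default provider
    // loads only 'xml' type files from the source cluster's subfolder.
    static List<String> xmlFilesOnly(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> name.endsWith(".xml"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Example folder contents; the file names are illustrative.
        List<String> names = Arrays.asList(
                "core-site.xml", "hdfs-site.xml", "log4j.properties", "README");
        System.out.println(xmlFilesOnly(names));
    }
}
```

Under this assumption, files such as log4j.properties placed in the source
cluster's subfolder would simply be ignored.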
As part of this we have made the following changes to the LoadIncrementalHFiles
class, which is marked as a Public and Stable class:
a. Raised the visibility scope of LoadQueueItem class from package private to
public.
b. Added a new method loadHFileQueue, which loads the queue of LoadQueueItem
into the table as per the region keys provided.
was:
This enhances the HBase replication to support replication of bulk loaded data.
This is configurable, by default it is set to false which means it will not
replicate the bulk loaded data to its peer(s). To enable it set
"hbase.replication.bulkload.enabled" to true.
Following are the additional configurations are added for this enhacement,
a. hbase.replication.cluster.id - This is manadatory to configure in cluster
where replication for bulk loaded data is enabled. A source cluster is uniquely
identified by sink cluster using this id. This should be configured in the
source cluster configuration file for all the RS.
b. hbase.replication.conf.dir - This represents the directory where all the
active cluster's file system client configurations are defined in their
respective replication cluster id subfolders in peer cluster. This should be
configured in the peer cluster configuration file for all the RS. Default
HBASE_CONF_DIR.
c. hbase.replication.source.fs.conf.provider - This represents the class which
provides the source cluster file system client configuration to peer cluster.
This should be configured in the peer cluster configuration file for all the
RS. Default
org.apache.hadoop.hbase.replication.regionserver.DefaultSourceFSConfigurationProvider
For example: If source cluster FS client configurations are copied in peer
cluster under directory /home/user/dc1/ then hbase.replication.cluster.id
should be configured as dc1 and hbase.replication.conf.dir as /home/user
Note:
a. Any modification in source cluster FS client configuration files in peer
cluster side replication configuration directory then it needs to restart all
its peer(s) cluster RS with default hbase.replication.source.fs.conf.provider.
b. Only 'xml' type files will be loaded by the default
hbase.replication.source.fs.conf.provider.
As part of this we made have made following changes to LoadIncrementalHFiles
class which is marked as Public and Stable class,
a. Raised the visbility scope of LoadQueueItem class from package private to
public.
b. Added a new method loadHFileQueue, which loads the queue of LoadQueueItem
into the table as per the region keys provided.
> Bulk Loaded HFile Replication
> -----------------------------
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Reporter: sunhaitao
> Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch,
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v13.patch,
> HBASE-13153-v14.patch, HBASE-13153-v15.patch, HBASE-13153-v16.patch,
> HBASE-13153-v2.patch, HBASE-13153-v3.patch, HBASE-13153-v4.patch,
> HBASE-13153-v5.patch, HBASE-13153-v6.patch, HBASE-13153-v7.patch,
> HBASE-13153-v8.patch, HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk
> Load Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk
> Load Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature to deal with disaster
> tolerance scenarios. But we encounter an issue: we use bulkload very
> frequently, and because bulkload bypasses the write path and does not
> generate WAL, the data will not be replicated to the backup cluster. It's
> inappropriate to bulkload twice, on both the active cluster and the backup
> cluster. So I advise making some modifications to the bulkload feature to
> enable bulkload to both the active cluster and the backup cluster.