[jira] [Commented] (HBASE-13153) enable bulkload to support replication

Ashish Singhi (JIRA) Mon, 31 Aug 2015 04:24:13 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723309#comment-14723309
 ]


Ashish Singhi commented on HBASE-13153:
---------------------------------------

[~lhofhansl], thanks for the review and comments.

bq. What if that notification is missed? For example the RS dies just then? WAL 
replication does not have this issue since it always deals with all existing 
WALs so it cannot miss anything.
After the HFile is loaded successfully, we will send the notification and only 
then return to the client. So if the RS dies, the complete bulk load will fail 
and the client has to retry.
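
A minimal sketch of that ordering, as a toy model rather than HBase code. The 
names bulk_load, client_bulk_load and RegionServerDied, and the HFile path in 
the usage note, are hypothetical. It only illustrates that the client gets a 
success after the notification has been recorded, so a missed notification 
shows up as a failed bulk load that the client retries.

```python
class RegionServerDied(Exception):
    """Hypothetical stand-in for a region server failure."""


def bulk_load(path, loaded, pending, die_before_notify=False):
    """Model of one bulk load attempt on the source cluster."""
    loaded.add(path)                  # 1. the HFile is loaded
    if die_before_notify:
        raise RegionServerDied(path)  # no acknowledgement reaches the client
    pending.append(path)              # 2. the replication notification is recorded
    return True                       # 3. only now does the client see success


def client_bulk_load(path, loaded, pending, failures=0, max_attempts=3):
    """Model of a client that retries; the first `failures` attempts die."""
    for attempt in range(max_attempts):
        try:
            return bulk_load(path, loaded, pending,
                             die_before_notify=attempt < failures)
        except RegionServerDied:
            continue                  # the whole bulk load failed, so retry it
    return False
```

For example, client_bulk_load("/src/hfiles/f1", set(), [], failures=1) fails 
on the first attempt, succeeds on the retry, and leaves exactly one pending 
notification.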

bq. So you'll send the HFile over RPCs? These files can be huge. Can we use 
HDFS' distCP here?
No, we will send only the paths of the HFiles in the source cluster.

bq. Can we simply use the standard bulk load mechanism here? It would split the 
files as necessary.
Yes, we plan to use the complete bulk load tool mechanism, wherein the peer 
cluster will act as the complete bulk load client.

bq. You'll need to ensure this somehow.
The plan is to have our own implementation of 
BaseLogCleanerDelegate#getDeletableFiles to ensure this.
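
A rough sketch of the filtering idea, assuming "this" means keeping source 
HFiles around until the peer has loaded them. It is a toy model in Python, not 
the Java delegate API. The function name get_deletable_files and the 
pending_replication collection are hypothetical.

```python
def get_deletable_files(candidates, pending_replication):
    """Return only the candidate files that no peer still needs.

    candidates: file paths the cleaner would otherwise delete.
    pending_replication: file paths still waiting to be replicated
    (hypothetical bookkeeping; the source does not say where it lives).
    """
    pending = set(pending_replication)
    # Keep the cleaner's order, but hold back anything still pending.
    return [path for path in candidates if path not in pending]
```

With candidates ["/archive/a", "/archive/b"] and "/archive/b" still pending, 
only "/archive/a" is reported as deletable.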

bq. That can lead to very tricky issues where the same files just go from 
cluster to cluster in a never ending cycle. We know at the source that the 
HFiles came from a bulk load, maybe we can handle that specially.
Cyclic replication is a limitation as of now. We are still thinking about how 
to handle it.

bq. Lastly, it might be generally a good option to copy HFiles around, rather 
than WALs (at least for some setups). Could we use this to do that?
The design will support this. Currently we are adding this to bulk load. If 
required, we can extend this hook.

> enable bulkload to support replication
> --------------------------------------
>
>                 Key: HBASE-13153
>                 URL: https://issues.apache.org/jira/browse/HBASE-13153
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>            Reporter: sunhaitao
>            Assignee: Ashish Singhi
>             Fix For: 2.0.0
>
>         Attachments: HBase Bulk Load Replication.pdf
>
>
> Currently we plan to use the HBase Replication feature to deal with disaster 
> tolerance scenarios. But we encounter an issue: we use bulkload very 
> frequently, and because bulkload bypasses the write path and does not 
> generate WAL, the data will not be replicated to the backup cluster. It's 
> inappropriate to bulkload twice, on both the active cluster and the backup 
> cluster. So I suggest modifying the bulkload feature to enable bulkload to 
> both the active cluster and the backup cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
