[jira] [Updated] (HBASE-21642) CopyTable by reading snapshot and bulkloading will save a lot of time.

Zheng Hu (JIRA) Wed, 26 Dec 2018 03:34:14 -0800


     [ https://issues.apache.org/jira/browse/HBASE-21642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Zheng Hu updated HBASE-21642:
-----------------------------
    Status: Patch Available  (was: Open)

> CopyTable by reading snapshot and bulkloading will save a lot of time.
> ----------------------------------------------------------------------
>
>                 Key: HBASE-21642
>                 URL: https://issues.apache.org/jira/browse/HBASE-21642
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>         Attachments: HBASE-21642.v1.patch
>
>
> In our HBase clusters, some users need to merge the data of two different
> tables into one. Currently, CopyTable scans the source table and puts the
> resulting mutations into the destination table.
> Although CopyTable with bulkload is much faster than CopyTable with scan
> and put, it still takes a long time to scan the source table. Worse,
> scanning the live table hurts the cluster's availability: the scan consumes
> a lot of RegionServer (RS) resources, including CPU, memory, GC
> stop-the-world (STW) pauses, RS handlers, disk I/O, and network I/O. All of
> these affect availability.
> So in our clusters, we try to run all scanning jobs against a snapshot
> instead of the live table. This at least isolates the CPU, memory, and GC
> resources of the online RS from the scanning job. What's more, snapshot
> scanning is much faster than scanning through the RS, and it is more stable.
> So here I'll make the CopyTable tool support snapshot scanning.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
