[ https://issues.apache.org/jira/browse/HBASE-21642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Hu updated HBASE-21642:
-----------------------------
Status: Patch Available (was: Open)
> CopyTable by reading snapshot and bulkloading will save a lot of time.
> ----------------------------------------------------------------------
>
> Key: HBASE-21642
> URL: https://issues.apache.org/jira/browse/HBASE-21642
> Project: HBase
> Issue Type: Improvement
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: HBASE-21642.v1.patch
>
>
> In our HBase clusters, some users need to merge the data of two different
> tables into one. Currently, CopyTable scans the source table and puts the
> mutations into the destination table.
> Although CopyTable with bulkload is much faster than CopyTable with scan
> and put, it still takes a lot of time to scan the source table. Worse,
> CopyTable with a table scan impacts the cluster's availability: scanning
> consumes a lot of RegionServer (RS) resources, such as CPU, memory, GC
> stop-the-world pauses, RS handlers, disk I/O, and network I/O. All of
> these affect availability.
> So in our clusters, we have tried to run all scanning jobs by scanning a
> snapshot instead of scanning the table. This at least isolates the CPU,
> memory, and GC resources between the online RS and the scanning job.
> What's more, snapshot scanning is much faster than scanning through the
> RS, and it is more stable.
> So here I'll make the CopyTable tool support snapshot scanning.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)