[
https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199512#comment-15199512
]
Jianwei Cui commented on HBASE-15469:
-------------------------------------
Uploaded the patch. In the hbase shell, we can specify families when taking a
snapshot:
{code}
hbase(main):004:0> snapshot 'test_table', 'test-snapshot', 'f1'
0 row(s) in 0.3830 seconds
{code}
And {{list_snapshots}} will show the table and families of the snapshot:
{code}
hbase(main):001:0> list_snapshots
SNAPSHOT TABLE/CFs + CREATION TIME
test-snapshot test_table/f1 (Thu Mar 17 20:54:22 +0800 2016)
1 row(s) in 0.2890 seconds
{code}
This snapshot can then be used by other snapshot operations, such as
{{clone_snapshot}}, {{restore_snapshot}}, etc.
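To illustrate the family selection described in step 4 of the issue description, here is a minimal standalone Ruby sketch. It is not code from the patch: {{families_for_snapshot}} is a hypothetical helper, and both treating an empty request as "all families" and rejecting unknown family names are assumptions made only for this example.

```ruby
# Hypothetical helper, not part of HBase or of the attached patch.
# Given the families defined on a table and the families requested for a
# snapshot, return the families the snapshot should keep.
def families_for_snapshot(table_families, requested_families)
  # Assumption: no requested families means a whole-table snapshot.
  return table_families.dup if requested_families.empty?

  # Assumption: asking for a family the table does not have is an error.
  unknown = requested_families - table_families
  unless unknown.empty?
    raise ArgumentError, "unknown families: #{unknown.join(', ')}"
  end

  # Keep the table's own ordering; duplicates in the request are ignored.
  table_families.select { |family| requested_families.include?(family) }
end
```

With a table that has families f1, f2 and f3, requesting 'f1' would keep only f1, while requesting nothing would keep all three.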
> Take snapshot by family
> -----------------------
>
> Key: HBASE-15469
> URL: https://issues.apache.org/jira/browse/HBASE-15469
> Project: HBase
> Issue Type: Improvement
> Components: snapshots
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Attachments: HBASE-15469-v1.patch
>
>
> In our production environment, there are some 'wide' tables in the offline
> cluster. A 'wide' table has many families, and different applications
> access different families of the table through MapReduce. When an
> application starts to provide online service, we need to copy the needed
> families from the offline cluster to the online cluster. For future writes,
> inter-cluster replication supports setting families per table, so we can use
> it to copy future edits for the needed families. For existing data, we can
> take a snapshot of the table on the offline cluster, then use
> {{ExportSnapshot}} to copy the snapshot to the online cluster and clone it.
> However, we can only take a snapshot of the whole table, in which many
> families are not needed by the application; this leads to unnecessary data
> copying. I think it would be useful to support taking a snapshot by family,
> so that we copy only the needed data.
> A possible solution to support this:
> 1. Add a family names field to the protobuf definition of
> {{SnapshotDescription}}
> 2. Allow setting families when taking a snapshot in the hbase shell, such as:
> {code}
> snapshot 'tableName', 'snapshotName', 'FamilyA', 'FamilyB', {SKIP_FLUSH => true}
> {code}
> 3. Add family names to {{SnapshotDescription}} on the client side
> 4. Read family names from {{SnapshotDescription}} in Master/Regionserver, and
> keep only the requested families when taking the snapshot of a region.
> Discussions and suggestions are welcome.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)