[ 
https://issues.apache.org/jira/browse/HBASE-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235425#comment-16235425
 ] 

Amit Kabra commented on HBASE-19104:
------------------------------------

Yes, the time range is important, but there is more filtering we can do during 
restore instead of restoring everything.

The most important filter is the tenant name or tenant id. Whenever a restore is 
triggered, each row would be parsed and checked for the tenant id; only rows 
matching that tenant id would be written to the restore table, and everything 
else would be skipped.
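To make the idea concrete, here is a minimal sketch of a per-cell predicate a 
restore job could evaluate; it assumes the tenant id is encoded as a row-key 
prefix, and the class name, the time-range parameters, and its wiring into the 
restore path are all hypothetical:

{code:java}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Hypothetical predicate a restore job could apply while replaying backup
 * data: a cell is restored only if its row key starts with the tenant id
 * and its timestamp falls in the requested range.
 * Assumes the tenant id is stored as a row-key prefix.
 */
public class TenantRestoreFilter {

  private final byte[] tenantPrefix;
  private final long minTs;
  private final long maxTs;

  public TenantRestoreFilter(String tenantId, long minTs, long maxTs) {
    this.tenantPrefix = Bytes.toBytes(tenantId);
    this.minTs = minTs;
    this.maxTs = maxTs;
  }

  /** Returns true if the cell should be written to the restore table. */
  public boolean shouldRestore(Cell cell) {
    boolean tenantMatch = Bytes.startsWith(CellUtil.cloneRow(cell), tenantPrefix);
    boolean inTimeRange = cell.getTimestamp() >= minTs && cell.getTimestamp() < maxTs;
    return tenantMatch && inTimeRange;
  }
}
{code}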

Similarly, we should be able to pass a particular backup directory in HDFS, or a 
particular HFile path, and restore only that subset of the data (a rough sketch 
follows this list). This can help in two ways:
1) Debugging cases in production where we suspect an issue with a particular 
backup: we can restore just that part and check its validity instead of 
restoring everything.
2) Self-validation of backups after they complete (HBASE-19106): we can restore 
a part of the backup using these filters and validate it.
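For the debugging case in 1), a rough sketch of what restoring a single HFile 
directory into a scratch table could look like today is a direct bulk load of 
that path. This assumes an HBase 2.x client, that the directory contains one 
subdirectory per column family, and that the target table already exists; the 
path and table name are placeholders, and the exact class location and signature 
differ between HBase versions:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;

public class PartialHFileRestore {
  public static void main(String[] args) throws Exception {
    // Placeholders: point these at the suspect backup image and a scratch table.
    Path hfileDir = new Path("hdfs://namenode/backups/backup_1234567890/default/t1");
    TableName tableName = TableName.valueOf("t1_restore");

    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin();
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {
      // Bulk-load only the HFiles under hfileDir into the scratch table,
      // instead of replaying the whole backup image.
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
    }
  }
}
{code}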

When we have large amounts of data in production, these filters help a lot.

> Add filtering during restore in HBase backups.
> ----------------------------------------------
>
>                 Key: HBASE-19104
>                 URL: https://issues.apache.org/jira/browse/HBASE-19104
>             Project: HBase
>          Issue Type: New Feature
>          Components: backup&restore
>            Reporter: Amit Kabra
>            Priority: Major
>             Fix For: 2.1.0
>
>
> When we deal with a large amount of data, it would be great if we could restore 
> from backups based on tenant, time range, etc., so that the restore finishes 
> faster and we restore only what's required.
> Currently restore takes a backup id as input and restores all the data up to 
> that backup id's timestamp. We may not need to restore all data in a given 
> backup id.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
