[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147396#comment-16147396 ]

Jingcheng Du commented on HBASE-18693:
--------------------------------------

Thanks, Huaxiang.
bq. The snapshot itself is not destroyed after moving mob files from the archive 
directory to the working directory. I do not see an issue with restoring a 
snapshot twice here. Can you share more details?
Restoring a snapshot to the same table is okay. But what if we then try to restore 
the snapshot into another table? The same MOB file cannot be in two different 
locations at once, right?

bq. For one of our use cases, a user exported a snapshot with millions of mob 
files and restored the table on a remote cluster. The select() took more than 
one day to complete before the actual compaction started. We used a hack to skip 
HFileLinks so that compaction could happen within several minutes. Even if links 
are compacted at a longer interval, this is still a huge overhead. What do you think?
You are right, this is a problem. How about selecting files with multiple threads, 
with each thread handling part of the file selection? Thanks.
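
A minimal Java sketch of the multi-threaded selection idea, under stated assumptions: `SizeLookup` is a hypothetical stand-in for the per-HFileLink NameNode getFileStatus() call (in HBase it would be backed by the Hadoop FileSystem API, not shown), and `ParallelMobSelect`, `lookupSizes` and `selectSmall` are illustrative names, not HBase APIs. The candidate list is split into contiguous slices, one per worker, and the sizes come back in input order. The size-threshold rule in `selectSmall` is only an example of a selection criterion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelMobSelect {

    /** Hypothetical stand-in for one NameNode getFileStatus() call; returns the file length. */
    interface SizeLookup {
        long sizeOf(String path) throws Exception;
    }

    /**
     * Looks up the size of every candidate file using up to {@code threads} workers.
     * Each worker handles one contiguous slice of the list; results keep input order.
     */
    static long[] lookupSizes(List<String> paths, SizeLookup lookup, int threads)
            throws InterruptedException, ExecutionException {
        int n = paths.size();
        long[] sizes = new long[n];
        if (n == 0) {
            return sizes;
        }
        int workers = Math.max(1, Math.min(threads, n));
        int chunk = (n + workers - 1) / workers; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int start = 0; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, n);
                futures.add(pool.submit(() -> {
                    // Each slice writes only its own indices, so no locking is needed.
                    for (int i = from; i < to; i++) {
                        sizes[i] = lookup.sizeOf(paths.get(i));
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for the slice and rethrows any lookup failure
            }
        } finally {
            pool.shutdown();
        }
        return sizes;
    }

    /** Illustrative selection rule: keep files smaller than the threshold. */
    static List<String> selectSmall(List<String> paths, long[] sizes, long threshold) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < paths.size(); i++) {
            if (sizes[i] < threshold) {
                out.add(paths.get(i));
            }
        }
        return out;
    }
}
```

If the NameNode can sustain the concurrency (an assumption, not a measurement), 32 threads would cut the serial 20 ms-per-call estimate for one million links from about 5.6 hours to roughly 10 minutes. The thread count would need tuning so the selection does not overload the NameNode.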

> adding an option to restore_snapshot to move mob files from archive dir to 
> working dir
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18693
>                 URL: https://issues.apache.org/jira/browse/HBASE-18693
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0-alpha-2
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>
> Today, there is a single mob region where the mob files for all user regions are 
> saved, so a single mob directory can hold a very large number of files (on the 
> order of one million). When a mob table is restored or cloned from a snapshot, 
> links are created for these mob files. This creates a scaling issue for mob 
> compaction. In mob compaction's select() logic, each HFileLink requires a call 
> to the NameNode's (NN) getFileStatus() to get the size of the linked hfile. 
> Assuming each such call takes 20 ms, one million links cost 
> 20 ms * 1,000,000 = 20,000 s, or about 5.6 hours. 
> To avoid this overhead, we want to add an option so that restore_snapshot can 
> move mob files from the archive dir to the working dir. clone_snapshot is more 
> complicated: it can clone a snapshot to a different table, so moving the files 
> could destroy the snapshot. No option will be added for clone_snapshot.
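
A minimal Java sketch of the proposed restore_snapshot behavior, assuming a local filesystem via `java.nio.file` as a stand-in for HDFS renames. `MobRestoreSketch` and `moveMobFiles` are hypothetical names, and the real restore path (including how the corresponding HFileLinks would be removed or replaced) is not shown. Once the files are moved, the working directory holds regular files rather than links, so select() could presumably take sizes from a directory listing instead of one getFileStatus() per link. Afterwards the files are no longer in the archive directory, which is why the issue rules this out for clone_snapshot.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MobRestoreSketch {

    /**
     * Hypothetical helper: moves every regular file from archiveDir into workingDir
     * and returns the moved file names in sorted order.
     */
    static List<String> moveMobFiles(Path archiveDir, Path workingDir) throws IOException {
        Files.createDirectories(workingDir);

        // Collect sources first so the directory is not modified while it is being listed.
        List<Path> sources = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(archiveDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    sources.add(p); // sub-directories are skipped
                }
            }
        }

        List<String> moved = new ArrayList<>();
        for (Path src : sources) {
            Path dst = workingDir.resolve(src.getFileName());
            // No REPLACE_EXISTING: fail instead of clobbering a file already in workingDir.
            Files.move(src, dst);
            moved.add(src.getFileName().toString());
        }
        Collections.sort(moved); // directory listing order is unspecified
        return moved;
    }
}
```

Failing on an existing target, rather than passing `StandardCopyOption.REPLACE_EXISTING`, keeps the sketch from silently overwriting a file that is already in the working directory.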



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
