[ 
https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145814#comment-16145814
 ] 

huaxiang sun commented on HBASE-18693:
--------------------------------------

Hi [[email protected]],

{quote}
My concern is if we restore a snapshot twice which is possible, how to
handle such operations?
{quote}

The snapshot itself is not destroyed after moving mob files from the archive 
directory to the working directory, so I do not see an issue with restoring a 
snapshot twice here. Can you share more details?

{quote}
Or we can skip the hfile links in most of MOB compaction, and compact the
links in a longer interval (like a month)?
{quote}

In one of our use cases, a user exported a snapshot with millions of mob files 
and restored the table at a remote cluster. The select() took more than one day 
to complete before the actual compaction started. We hacked it to skip hfile 
links so that compaction could start within several minutes. Even if links are 
compacted at a longer interval, this is still a huge overhead. What do you think? 

Thanks. 

> adding an option to restore_snapshot to move mob files from archive dir to 
> working dir
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18693
>                 URL: https://issues.apache.org/jira/browse/HBASE-18693
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0-alpha-2
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>
> Today, there is a single mob region where mob files for all user regions are 
> saved. There could be many files (one million) in a single mob directory. 
> When one mob table is restored or cloned from snapshot, links are created for 
> these mob files. This creates a scaling issue for mob compaction. In mob 
> compaction's select() logic, each HFileLink requires a call to the NameNode's 
> getFileStatus() to get the size of the linked hfile. Assuming one such call 
> takes 20ms, 20ms * 1000000 = 20000 seconds, or about 5.6 hours. 
> To avoid this overhead, we want to add an option so that restore_snapshot can 
> move mob files from archive dir to working dir. clone_snapshot is more 
> complicated, as it can clone a snapshot to a different table, so moving the 
> files can destroy the snapshot. No option will be added for clone_snapshot.
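
For reference, the back-of-the-envelope estimate in the description can be 
reproduced with a few lines of Python. This is only a sketch: the function name 
select_overhead is made up for illustration, and it assumes the getFileStatus() 
calls are issued one at a time at a fixed 20ms each, as the description does.

```python
from datetime import timedelta


def select_overhead(num_links: int, rpc_latency_ms: float) -> timedelta:
    """Estimate time spent on sequential getFileStatus() calls.

    Hypothetical helper: assumes one NameNode call per HFileLink and a
    constant per-call latency, with no batching or parallelism.
    """
    return timedelta(milliseconds=num_links * rpc_latency_ms)


# One million links at 20ms per call, the figures used in the description.
estimate = select_overhead(1_000_000, 20)
print(estimate)  # 5:33:20
```

So one million links cost roughly five and a half hours before any compaction 
work begins, and the cost grows linearly with the number of links.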



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
