[ 
https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147677#comment-16147677
 ] 

huaxiang sun commented on HBASE-18693:
--------------------------------------

Hi Jingcheng,

{quote}
Restoring a snapshot to the same table is okay. What if we try to restore the 
snapshot in another table? The same MOB file can be in different locations? No, 
right?
{quote}

I see your concern. restore_snapshot always restores to the same table, which 
is why I am adding the option there. clone_snapshot is a different story: it 
can clone a snapshot into a different table. If the option were added to 
clone_snapshot, moving the files would corrupt the snapshot.

{quote}
You are right, this is a problem. How about select files with multiple threads, 
each thread handle part of the files selection? Thanks.
{quote}
HBASE-17043 has been created for this effort. I think it is not enough on its 
own, and it still carries the overhead (pressure on the NN). We need to give 
users an option in this case.
If this option looks good to you, I will post a patch.

Thanks

> adding an option to restore_snapshot to move mob files from archive dir to 
> working dir
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18693
>                 URL: https://issues.apache.org/jira/browse/HBASE-18693
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0-alpha-2
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>
> Today, there is a single mob region where mob files for all user regions are 
> saved. There could be many files (one million) in a single mob directory. 
> When one mob table is restored or cloned from snapshot, links are created for 
> these mob files. This creates a scaling issue for mob compaction. In mob 
> compaction's select() logic, for each hFileLink, it needs to call NN's 
> getFileStatus() to get the size of the linked hfile. Assume that one such 
> call takes 20ms, 20ms * 1000000 = 20,000 seconds, or about 5.6 hours. 
> To avoid this overhead, we want to add an option so that restore_snapshot can 
> move mob files from archive dir to working dir. clone_snapshot is more 
> complicated as it can clone a snapshot to a different table, so moving the 
> files can destroy the snapshot. No option will be added for clone_snapshot.
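
As a rough check of the estimate in the description, here is a minimal sketch 
of the arithmetic in Java. It is not HBase code: the class and method names 
are hypothetical, the 20ms latency and one million files are the figures 
assumed in the description, and the thread count of 8 is an arbitrary value 
chosen only to illustrate the multi-threaded selection idea.

```java
import java.time.Duration;

public class MobSelectCostEstimate {

    // Estimated wall-clock time when each linked file costs one NameNode
    // round trip and the calls are split evenly across `threads` workers.
    // threads = 1 models the serial case from the issue description.
    static Duration estimate(long fileCount, long millisPerCall, int threads) {
        long callsPerThread = (fileCount + threads - 1) / threads; // ceiling division
        return Duration.ofMillis(callsPerThread * millisPerCall);
    }

    public static void main(String[] args) {
        long files = 1_000_000L; // file count from the issue description
        long latencyMs = 20L;    // assumed latency of one getFileStatus() call

        // 8 threads is an arbitrary illustration, not a proposed setting.
        for (int threads : new int[] {1, 8}) {
            Duration d = estimate(files, latencyMs, threads);
            System.out.printf("%d thread(s): %d s (~%.1f h)%n",
                    threads, d.getSeconds(), d.getSeconds() / 3600.0);
        }
    }
}
```

Splitting the work across threads shortens the wait, but the total number of 
NameNode calls stays at one per linked file, so the load on the NN is 
unchanged.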



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
