[ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144865#comment-16144865 ]
Jingcheng Du commented on HBASE-18693:
--------------------------------------

Right, an HDFS move doesn't copy the data; it is supposed to be a rename operation.
My concern is what happens if we restore a snapshot twice, which is possible. How should we handle that case?
In HBase, we compact the hfile links during compaction, so I think compacting hfile links in MOB compaction is reasonable too. Alternatively, we could skip the hfile links in most MOB compactions and compact the links at a longer interval (like once a month).
I prefer the first option. What do you think?
Thanks.

> adding an option to restore_snapshot to move mob files from archive dir to working dir
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18693
>                 URL: https://issues.apache.org/jira/browse/HBASE-18693
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0-alpha-2
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>
> Today, there is a single mob region where mob files for all user regions are
> saved. There could be many files (one million) in a single mob directory.
> When a mob table is restored or cloned from a snapshot, links are created for
> these mob files. This creates a scaling issue for mob compaction. In mob
> compaction's select() logic, for each hFileLink, it needs to call the NN's
> getFileStatus() to get the size of the linked hfile. Assuming that one such
> call takes 20ms, 20ms * 1000000 = 20,000 seconds, or roughly 5.6 hours.
> To avoid this overhead, we want to add an option so that restore_snapshot can
> move mob files from the archive dir to the working dir. clone_snapshot is more
> complicated, as it can clone a snapshot to a different table, so moving the
> files could destroy the snapshot. No option will be added for clone_snapshot.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)