[ 
https://issues.apache.org/jira/browse/HBASE-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593658#comment-13593658
 ] 

Ted Yu commented on HBASE-7987:
-------------------------------

I think the idea of a manifest applies to online snapshots as well.
See the following code snippet from HRegion.addRegionToSnapshot():
{code}
    for (Store store : stores.values()) {
      // 2.1. build the snapshot reference directory for the store
      Path dstStoreDir = TakeSnapshotUtils.getStoreSnapshotDirectory(
        snapshotRegionFs.getRegionDir(), Bytes.toString(store.getFamily().getName()));
      List<StoreFile> storeFiles = new ArrayList<StoreFile>(store.getStorefiles());
      if (LOG.isDebugEnabled()) {
        LOG.debug("Adding snapshot references for " + storeFiles  + " hfiles");
      }

      // 2.2. iterate through all the store's files and create "references".
      int sz = storeFiles.size();
      for (int i = 0; i < sz; i++) {
        if (exnSnare != null) {
          exnSnare.rethrowException();
        }
        Path file = storeFiles.get(i).getPath();
        // create "reference" to this store file.  It is intentionally an empty file -- all
        // necessary information is captured by its fs location and filename.  This allows us to
        // only figure out what needs to be done via a single nn operation (instead of having to
        // open and read the files as well).
        LOG.debug("Creating reference for file (" + (i+1) + "/" + sz + ") : " + file);
        Path referenceFile = new Path(dstStoreDir, file.getName());
        boolean success = fs.getFileSystem().createNewFile(referenceFile);
{code}
Looking at the last line, we create a reference file for each store file.
I think a manifest file should be used in this case as well.
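
As a rough illustration of the alternative (a minimal sketch, not HBase code): instead of calling createNewFile() once per store file, the region could collect the store file names per family in memory and write them out with a single file creation. The class name RegionManifestSketch and the one-line-per-file "family/file" text format are assumptions made for this example. The file name "regionManifest" is borrowed from the layout in the issue description. The proposal itself uses a protobuf message, and a real implementation would go through the Hadoop FileSystem API rather than java.nio.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one manifest per region instead of one empty file per hfile. */
class RegionManifestSketch {
  // family name -> store file names, kept in insertion order
  private final Map<String, List<String>> familyFiles = new LinkedHashMap<>();

  /** Records a store file in memory; nothing touches the filesystem here. */
  void addStoreFile(String family, String fileName) {
    familyFiles.computeIfAbsent(family, k -> new ArrayList<>()).add(fileName);
  }

  /**
   * Writes every recorded entry into a single manifest file.
   * The "family/file" line format is an assumption of this sketch.
   */
  Path write(Path snapshotRegionDir) throws IOException {
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : familyFiles.entrySet()) {
      for (String file : e.getValue()) {
        lines.add(e.getKey() + "/" + file);
      }
    }
    Path manifest = snapshotRegionDir.resolve("regionManifest");
    return Files.write(manifest, lines, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    RegionManifestSketch manifest = new RegionManifestSketch();
    manifest.addStoreFile("cf1", "hfile-a");
    manifest.addStoreFile("cf1", "hfile-b");
    manifest.addStoreFile("cf2", "hfile-c");

    Path dir = Files.createTempDirectory("snapshot-region");
    Path out = manifest.write(dir);
    for (String line : Files.readAllLines(out, StandardCharsets.UTF_8)) {
      System.out.println(line);
    }
  }
}
```

With this shape, the inner loop in addRegionToSnapshot() would only call something like addStoreFile(), and the filesystem would see one create per region rather than one per store file.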
                
> Snapshot Manifest file instead of multiple empty files
> ------------------------------------------------------
>
>                 Key: HBASE-7987
>                 URL: https://issues.apache.org/jira/browse/HBASE-7987
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Matteo Bertozzi
>
> Currently, taking a snapshot means creating one empty file for each file in 
> the source table directory, plus copying the .regioninfo file for each 
> region, the table descriptor file and a snapshotInfo file.
> During restore or snapshot verification we traverse the filesystem 
> (fs.listStatus()) to find the snapshot files, and we open the .regioninfo 
> files to get the information.
> To avoid hammering the NameNode and having lots of empty files, we can use a 
> manifest file that contains the list of files and the information that we need.
> To keep the RS parallelism that we have, each RS can write its own manifest.
> {code}
> message SnapshotDescriptor {
>   required string name;
>   optional string table;
>   optional int64 creationTime;
>   optional Type type;
>   optional int32 version;
> }
> message SnapshotRegionManifest {
>   optional int32 version;
>   required RegionInfo regionInfo;
>   repeated FamilyFiles familyFiles;
>   message StoreFile {
>     required string name;
>     optional Reference reference;
>   }
>   message FamilyFiles {
>     required bytes familyName;
>     repeated StoreFile storeFiles;
>   }
> }
> {code}
> {code}
> /hbase/.snapshot/<snapshotName>
> /hbase/.snapshot/<snapshotName>/snapshotInfo
> /hbase/.snapshot/<snapshotName>/<tableName>
> /hbase/.snapshot/<snapshotName>/<tableName>/tableInfo
> /hbase/.snapshot/<snapshotName>/<tableName>/regionManifest(.n)
> {code}
