[ 
https://issues.apache.org/jira/browse/HBASE-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593430#comment-13593430
 ] 

Matteo Bertozzi commented on HBASE-7987:
----------------------------------------

We already have versioning for snapshots in the SnapshotDescription. Jesse 
added it, I think, when we discussed this "single" manifest format.

Whether "it will give us a clearer view of the snapshot" depends on your 
tastes :)
Currently, you can just run ls -lR /hbase/.snapshot and see what is in 
the snapshot. With the manifests you can't do that; you have to use the 
SnapshotInfo tool. So it's a trade-off: the manifest may be better 
because it is more compact and you can extend it with more metadata, but 
without a tool you can't inspect it. And if you corrupt the manifest file you 
lose the whole snapshot, while with the current format you can still restore, 
losing just some data (as happens today when hbck removes one hfile).
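
To make the trade-off concrete, here is a rough sketch in Python. It is not HBase code: the file names are made up, and JSON stands in for whatever serialization the manifest would really use. It builds both layouts in a temp directory, damages each one, and reports what can still be listed.

```python
import json
import os
import tempfile


def list_from_empty_files(snapshot_dir):
    """Walk the directory tree; each empty marker file names one store file."""
    found = []
    for root, _dirs, files in os.walk(snapshot_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), snapshot_dir)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def list_from_manifest(manifest_path):
    """Read the single manifest; raises ValueError if the file is corrupt."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return sorted(manifest["files"])


def demo():
    # Hypothetical store file names, for illustration only.
    files = ["region1/cf/hfile-a", "region1/cf/hfile-b", "region2/cf/hfile-c"]
    with tempfile.TemporaryDirectory() as tmp:
        # Layout 1: one empty marker file per store file.
        markers = os.path.join(tmp, "markers")
        for rel in files:
            path = os.path.join(markers, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()

        # Layout 2: a single manifest listing every store file.
        manifest = os.path.join(tmp, "manifest.json")
        with open(manifest, "w") as f:
            json.dump({"files": files}, f)

        # Damage each layout: drop one marker, corrupt the manifest.
        os.remove(os.path.join(markers, files[0]))
        with open(manifest, "w") as f:
            f.write("{not valid")

        survivors = list_from_empty_files(markers)
        try:
            list_from_manifest(manifest)
            manifest_ok = True
        except ValueError:
            manifest_ok = False
        return survivors, manifest_ok
```

Calling demo() returns the two marker files that survived, plus a flag saying the corrupted manifest could not be read at all.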

For me, this is a "next gen" format, not an urgent fix for a current format 
that is broken.

{quote}There may be the empty files produced by current implementation of 
snapshots.{quote}
Not sure what you mean here...
                
> Snapshot Manifest file instead of multiple empty files
> ------------------------------------------------------
>
>                 Key: HBASE-7987
>                 URL: https://issues.apache.org/jira/browse/HBASE-7987
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Matteo Bertozzi
>
> Currently, taking a snapshot means creating one empty file for each file in 
> the source table directory, plus copying the .regioninfo file for each 
> region, the table descriptor file and a snapshotInfo file.
> During restore or snapshot verification we traverse the filesystem 
> (fs.listStatus()) to find the snapshot files, and we open the .regioninfo 
> files to get the region information.
> To avoid hammering the NameNode and creating lots of empty files, we can use a 
> manifest file that contains the list of files and the information that we need.
> To keep the RS parallelism that we have, each RS can write its own manifest.
> {code}
> message SnapshotDescriptor {
>   required string name;
>   optional string table;
>   optional int64 creationTime;
>   optional Type type;
>   optional int32 version;
> }
> message SnapshotRegionManifest {
>   optional int32 version;
>   required RegionInfo regionInfo;
>   repeated FamilyFiles familyFiles;
>   message StoreFile {
>     required string name;
>     optional Reference reference;
>   }
>   message FamilyFiles {
>     required bytes familyName;
>     repeated StoreFile storeFiles;
>   }
> }
> {code}
> {code}
> /hbase/.snapshot/<snapshotName>
> /hbase/.snapshot/<snapshotName>/snapshotInfo
> /hbase/.snapshot/<snapshotName>/<tableName>
> /hbase/.snapshot/<snapshotName>/<tableName>/tableInfo
> /hbase/.snapshot/<snapshotName>/<tableName>/regionManifest(.n)
> {code}
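
The "each RS writes its own manifest" idea and the regionManifest(.n) layout can be sketched roughly as follows. This is a Python simulation under loud assumptions: threads stand in for region servers, JSON stands in for the protobuf encoding, family names are plain strings rather than bytes, and the region and file names are hypothetical. Only the field names follow the SnapshotRegionManifest message in the description.

```python
import glob
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_region_manifest(table_dir, n, region_name, family_files):
    """Write one regionManifest.<n> file; no other writer touches it."""
    manifest = {
        "version": 1,  # assumed value, not taken from HBase
        "regionInfo": {"name": region_name},
        "familyFiles": [
            {"familyName": fam, "storeFiles": [{"name": s} for s in stores]}
            for fam, stores in sorted(family_files.items())
        ],
    }
    path = os.path.join(table_dir, f"regionManifest.{n}")
    with open(path, "w") as f:
        json.dump(manifest, f)
    return path


def read_store_files(table_dir):
    """Merge all manifests with one directory listing, no per-file lookups."""
    result = []
    pattern = os.path.join(table_dir, "regionManifest.*")
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            manifest = json.load(f)
        region = manifest["regionInfo"]["name"]
        for fam in manifest["familyFiles"]:
            for store_file in fam["storeFiles"]:
                result.append((region, fam["familyName"], store_file["name"]))
    return sorted(result)


def demo():
    # Hypothetical regions: manifest index -> (region name, family -> files).
    regions = {
        0: ("region-a", {"cf1": ["hfile-1", "hfile-2"]}),
        1: ("region-b", {"cf1": ["hfile-3"], "cf2": []}),
    }
    with tempfile.TemporaryDirectory() as table_dir:
        # Each simulated writer produces its own manifest in parallel.
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [
                pool.submit(write_region_manifest, table_dir, n, name, fams)
                for n, (name, fams) in regions.items()
            ]
            for fut in futures:
                fut.result()  # re-raise any writer error
        return read_store_files(table_dir)
```

Because every writer owns a distinct file, the writers need no coordination, and the reader recovers the full file list from one directory listing plus a handful of reads.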

