[ https://issues.apache.org/jira/browse/HBASE-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614671#comment-13614671 ]

Matteo Bertozzi commented on HBASE-7987:
----------------------------------------

The snapshot viewer, cleaners and export should be fine, since they use
SnapshotReferenceUtil, which lets you iterate through the snapshot files.
So if you change the format, it's only a matter of how SnapshotReferenceUtil
parses it. But there's a lot of code that relies on the path layout (e.g. tests),
and I guess converting all of it to SnapshotReferenceUtil is in the scope
of this jira.
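
A minimal sketch of why the format change stays contained, assuming a visitor-style API: callers such as the viewer, cleaners or export receive one callback per store file and never build paths themselves. `StoreFileVisitor`, `SnapshotSource`, `InMemorySnapshotSource` and `SnapshotFileCollector` are hypothetical names for illustration, not the actual SnapshotReferenceUtil API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical callback (not the real SnapshotReferenceUtil interface):
// invoked once per store file referenced by the snapshot.
interface StoreFileVisitor {
  void storeFile(String region, String family, String hfile);
}

// Hypothetical abstraction: only implementations know the on-disk format
// (empty reference files or manifest files).
interface SnapshotSource {
  void visitStoreFiles(StoreFileVisitor visitor);
}

// Toy source backed by memory: region -> family -> store file names.
final class InMemorySnapshotSource implements SnapshotSource {
  private final Map<String, Map<String, List<String>>> regions;

  InMemorySnapshotSource(Map<String, Map<String, List<String>>> regions) {
    this.regions = regions;
  }

  @Override
  public void visitStoreFiles(StoreFileVisitor visitor) {
    regions.forEach((region, families) ->
        families.forEach((family, files) -> {
          for (String hfile : files) {
            visitor.storeFile(region, family, hfile);
          }
        }));
  }
}

// A caller in the style of a cleaner or exporter: it depends only on the
// visitor callbacks, so a format change does not touch it.
final class SnapshotFileCollector {
  static List<String> listStoreFiles(SnapshotSource source) {
    List<String> out = new ArrayList<>();
    source.visitStoreFiles((region, family, hfile) ->
        out.add(region + "/" + family + "/" + hfile));
    return out;
  }
}
```

Supporting a new format then means writing another `SnapshotSource` implementation; `listStoreFiles` and similar callers stay unchanged.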

Anyway, once everything is converted, I guess we can add compatibility with
the old format. At that point it should just be a matter of adding an if on
snapshotInfo.getVersion() and iterating over the fs instead of the manifest
files.
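
The compatibility check could look roughly like this. It is a sketch under two assumptions: version 1 (or lower) means the current multi-file layout and anything higher means manifest files, and the two suppliers stand in for the real filesystem walk and manifest parsing. `SnapshotStoreFiles` and `MULTI_FILE_VERSION` are hypothetical names.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical reader-side dispatch on the version stored in the snapshot info.
final class SnapshotStoreFiles {
  // Assumption: version 1 (and any older, unversioned snapshot) uses the
  // multi-file layout; higher versions use per-region manifest files.
  static final int MULTI_FILE_VERSION = 1;

  private SnapshotStoreFiles() {}

  /**
   * @param version       version read from the snapshot info
   * @param listFs        stand-in for walking the snapshot dir with fs.listStatus()
   * @param readManifests stand-in for parsing the regionManifest(.n) files
   */
  static List<String> storeFiles(int version,
                                 Supplier<List<String>> listFs,
                                 Supplier<List<String>> readManifests) {
    if (version <= MULTI_FILE_VERSION) {
      // Old format: the empty reference files are the index, so list the fs.
      return listFs.get();
    }
    // New format: the manifests already hold the file list; no directory walk.
    return readManifests.get();
  }
}
```

Only the branch for the snapshot's own version is evaluated, so an old snapshot never triggers manifest parsing and vice versa.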

For rolling upgrades (aside from the fact that moving from 94 to 95 is not
supported), we can just add the same if on snapshotInfo.getVersion() to
HRegion.addRegionToSnapshot() and use the multi-file format as the default
(the master produces the SnapshotInfo with version 1).
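
On the region side the same switch decides what gets written. A minimal sketch, assuming version 1 keeps today's layout (a copied .regioninfo plus one empty file per store file) and any other version writes a single manifest; `RegionSnapshotWriter`, `filesToCreate` and the `manifestPath` argument are hypothetical, and the method only returns the paths instead of touching a FileSystem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the version check inside a region's snapshot step.
final class RegionSnapshotWriter {
  // Assumption: the master stamps version 1, so the multi-file layout stays the default.
  static final int MULTI_FILE_VERSION = 1;

  private RegionSnapshotWriter() {}

  /**
   * Returns the paths a region would create for the snapshot.
   *
   * @param familyFiles  family name -> store file names in that family
   * @param manifestPath where the manifest would go (hypothetical naming)
   */
  static List<String> filesToCreate(int version, String regionDir,
                                    Map<String, List<String>> familyFiles,
                                    String manifestPath) {
    List<String> created = new ArrayList<>();
    if (version <= MULTI_FILE_VERSION) {
      created.add(regionDir + "/.regioninfo"); // copy of the region info
      familyFiles.forEach((family, files) -> {
        for (String file : files) {
          created.add(regionDir + "/" + family + "/" + file); // empty reference file
        }
      });
    } else {
      // One manifest carries the region info and every family's file list.
      created.add(manifestPath);
    }
    return created;
  }
}
```

Because the master decides the version, a mixed cluster keeps producing the old layout until the default is switched.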

Also, I'd like to have a benchmark to see how much faster this one is (in
theory you have fewer round trips to the NN).
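
Before a real benchmark, a back-of-the-envelope count of file creations shows where the savings would come from. This is only arithmetic under simplifying assumptions: it counts file creates on the write path, ignores mkdirs, listStatus() and opens, and uses made-up table sizes; `SnapshotFileCount` is a hypothetical helper.

```java
// Hypothetical back-of-the-envelope count of files a snapshot creates.
// Counts only file creations; directory creation, listings and opens are ignored.
final class SnapshotFileCount {
  private SnapshotFileCount() {}

  /** Current layout: snapshotInfo + tableInfo, then per region a .regioninfo
   *  copy plus one empty file per store file. */
  static long multiFile(int regions, int familiesPerRegion, int storeFilesPerFamily) {
    long perRegion = 1L + (long) familiesPerRegion * storeFilesPerFamily;
    return 2L + regions * perRegion;
  }

  /** Manifest layout: snapshotInfo + tableInfo + one manifest per writer
   *  (per region, or per region server). */
  static long manifest(int manifestWriters) {
    return 2L + manifestWriters;
  }
}
```

For an illustrative table of 1,000 regions with 2 families and 5 store files each, `multiFile(1000, 2, 5)` gives 11,002 creates, while `manifest(1000)` gives 1,002 with one manifest per region, or `manifest(20)` gives 22 with one per region server on a hypothetical 20-server cluster. The count says nothing about latency, which is why the benchmark is still needed.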

But what do you think about the manifest format?
Do you think it is the right way to do it?
Is there something else that we should consider putting in, or removing?
Any other thoughts?
                
> Snapshot Manifest file instead of multiple empty files
> ------------------------------------------------------
>
>                 Key: HBASE-7987
>                 URL: https://issues.apache.org/jira/browse/HBASE-7987
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Matteo Bertozzi
>         Attachments: HBASE-7987.sketch, HBASE-7987-v2.sketch
>
>
> Currently, taking a snapshot means creating one empty file for each file in
> the source table directory, plus copying the .regioninfo file for each
> region, the table descriptor file and a snapshotInfo file.
> During a restore or snapshot verification we traverse the filesystem
> (fs.listStatus()) to find the snapshot files, and we open the .regioninfo
> files to get the region information.
> To avoid hammering the NameNode and creating lots of empty files, we can use a
> manifest file that contains the list of files and the information that we need.
> To keep the RS parallelism that we have, each RS can write its own manifest.
> {code}
> // Type, RegionInfo and Reference are assumed to be defined elsewhere.
> message SnapshotDescriptor {
>   required string name = 1;
>   optional string table = 2;
>   optional int64 creationTime = 3;
>   optional Type type = 4;
>   optional int32 version = 5;
> }
> message SnapshotRegionManifest {
>   optional int32 version = 1;
>   required RegionInfo regionInfo = 2;
>   repeated FamilyFiles familyFiles = 3;
>   message StoreFile {
>     required string name = 1;
>     optional Reference reference = 2;
>   }
>   message FamilyFiles {
>     required bytes familyName = 1;
>     repeated StoreFile storeFiles = 2;
>   }
> }
> {code}
> {code}
> /hbase/.snapshot/<snapshotName>
> /hbase/.snapshot/<snapshotName>/snapshotInfo
> /hbase/.snapshot/<snapshotName>/<tableName>
> /hbase/.snapshot/<snapshotName>/<tableName>/tableInfo
> /hbase/.snapshot/<snapshotName>/<tableName>/regionManifest(.n)
> {code}
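
The proposed layout can be spelled out as a small path helper. This is a sketch: `SnapshotLayout` is hypothetical, and reading `regionManifest(.n)` as "no suffix for the first manifest, `.n` for the others" is an assumption about the notation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper spelling out the proposed directory layout; names mirror
// the sketch in the issue description, not released HBase code.
final class SnapshotLayout {
  private final String snapshotDir;
  private final String tableDir;

  SnapshotLayout(String rootDir, String snapshotName, String tableName) {
    this.snapshotDir = rootDir + "/.snapshot/" + snapshotName;
    this.tableDir = snapshotDir + "/" + tableName;
  }

  String snapshotInfo() { return snapshotDir + "/snapshotInfo"; }

  String tableInfo() { return tableDir + "/tableInfo"; }

  /** Assumption: manifest 0 has no suffix, later ones get ".n". */
  String regionManifest(int n) {
    return tableDir + "/regionManifest" + (n == 0 ? "" : "." + n);
  }

  /** Every file a snapshot written by the given number of RSs would contain. */
  List<String> allFiles(int writers) {
    List<String> files = new ArrayList<>();
    files.add(snapshotInfo());
    files.add(tableInfo());
    for (int i = 0; i < writers; i++) {
      files.add(regionManifest(i)); // one manifest per RS keeps writes parallel
    }
    return files;
  }
}
```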

