Swapnil Gaikwad <[email protected]> wrote:

>If we give new inodes to each file during a metadata snapshot, do any
>conflicts arise?
>What techniques help with this?
>Does anyone have source code for it?
>

Swapnil,

There are at least three different solutions for filesystem snapshots in the 
Linux kernel:

- device mapper snapshots
- btrfs snapshots
- next3 / next4 snapshots

The first two are in the vanilla kernel; the last is available as a patch that 
builds a module.  You can get the source at the link I posted before.

Device mapper is the simplest to understand and test.  If you don't understand 
its copy-on-write (COW) technique, study that first.  It is the least 
efficient of the three, because a write to a virgin data block causes the old 
data to be read from the primary volume and written to the snapshot volume, 
then the snapshot pointers are updated, and only then is the new data written 
to the primary volume.

Device mapper snapshots have no filesystem knowledge, so inode blocks are 
handled exactly like any other volume block.  COW is a standard technique that 
is implemented in lots of external storage solutions as well; NAS/SAN devices 
often offer snapshots, and most of those use copy-on-write.
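That write sequence can be sketched in a few lines of Python.  This is a toy model, not dm-snapshot's actual code: block devices are plain Python containers, and all names (primary, snapshot_store, cow_write, ...) are invented for illustration.  Real dm-snapshot also works in multi-block "chunks", not single blocks.

```python
BLOCK_COUNT = 8

primary = [f"old{i}" for i in range(BLOCK_COUNT)]  # origin (primary) volume
snapshot_store = {}  # COW store: origin block number -> preserved old data

def cow_write(block_no, new_data):
    """Write to the origin while a snapshot is active."""
    if block_no not in snapshot_store:       # first write to a "virgin" block?
        old = primary[block_no]              # 1. read old data from primary
        snapshot_store[block_no] = old       # 2. write it to snapshot volume
                                             # 3. pointer update (dict insert)
    primary[block_no] = new_data             # 4. write new data to primary

def snapshot_read(block_no):
    """Reading the snapshot: preserved copy if COWed, else read through."""
    return snapshot_store.get(block_no, primary[block_no])

cow_write(2, "new2")
print(snapshot_read(2))   # old2 -- snapshot still sees pre-write data
print(primary[2])         # new2 -- origin sees the new data
```

Note the cost: every first write to a block turns into a read plus two writes, which is why this is the least efficient of the three approaches.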

My first exposure to snapshots maintained by the filesystem itself, without 
the use of COW, was with Windows Server 2003.  It came with "shadow copy" 
technology.  Since the filesystem knows the details of what's really happening 
with the data, it can be more efficient.  MS allocates a new $MFT record (like 
an inode) when a file is updated.  Any replaced $MFT records, pointer blocks, 
and data blocks are left in place physically, but logically moved to a large 
file that holds the "shadow copy".

So think about a database file that has 1% of its data replaced by overwriting. 
 The filesystem allocates a new $MFT record and new data blocks for the 1% of 
new data.  The old (original) $MFT record and data blocks are reallocated to 
the single large shadow copy file.  (Note: if there are 5 simultaneous shadow 
copies, then there are 5 of these shadow copy files, but only the most recent 
is active, and all newly replaced data blocks are logically moved to just that 
one large shadow copy file.)

So if you read an old version of a file, like a database that is spread across 
the snapshots, you get the $MFT record from the oldest snapshot file.  It will 
have references to the physical blocks where the data is.  Since the data 
blocks were never moved, those references are all still valid.  The blocks 
pointed at will be spread across the various shadow copy files and the 
live/active data blocks.

When the oldest snapshot is deleted, it is a simple matter of deleting that 
single shadow copy file.
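The whole scheme above (overwrite, old-version read, snapshot deletion) can be sketched as a toy model.  Everything here is invented for illustration (Volume, the dict-based MFT, etc.) and glosses over how NTFS really stores records; the point is only the bookkeeping: data blocks are never moved, and an overwrite moves just the replaced record into the active shadow file.

```python
class Volume:
    """Toy shadow-copy model: blocks never move; an overwrite logically
    moves the replaced record into the active shadow-copy file."""

    def __init__(self):
        self.next_block = 0
        self.blocks = {}        # physical block number -> data (never moved)
        self.mft = {}           # live MFT-like records: name -> block numbers
        self.shadow_files = []  # one per snapshot; the last one is "active"

    def _alloc(self, data):
        n, self.next_block = self.next_block, self.next_block + 1
        self.blocks[n] = data
        return n

    def create(self, name, chunks):
        self.mft[name] = [self._alloc(c) for c in chunks]

    def take_snapshot(self):
        self.shadow_files.append({})   # new, now-active shadow-copy file

    def overwrite(self, name, index, data):
        old_record = self.mft[name]
        new_record = list(old_record)
        new_record[index] = self._alloc(data)  # new blocks only for new data
        if self.shadow_files:
            # replaced record is logically moved to the active shadow file
            self.shadow_files[-1][name] = old_record
        self.mft[name] = new_record

    def read_live(self, name):
        return [self.blocks[b] for b in self.mft[name]]

    def read_old(self, name, snap):
        # take the record from the oldest shadow file (snap onward) holding
        # it; its unchanged block pointers are still valid
        for shadow in self.shadow_files[snap:]:
            if name in shadow:
                return [self.blocks[b] for b in shadow[name]]
        return self.read_live(name)

    def delete_oldest_snapshot(self):
        self.shadow_files.pop(0)       # just delete that one file

v = Volume()
v.create("db", ["A", "B", "C"])
v.take_snapshot()
v.overwrite("db", 2, "C2")       # replace 1/3 of the data
print(v.read_old("db", 0))       # ['A', 'B', 'C']  -- old record, old blocks
print(v.read_live("db"))         # ['A', 'B', 'C2']
v.delete_oldest_snapshot()       # dropping the snapshot = deleting one file
```

Notice that the overwrite wrote only the one new block plus a record; nothing was copied, which is where the efficiency over COW comes from.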

Note how efficient the above is.  There is very little extra disk write 
activity involved, and maintaining 20 shadow copies is no more overhead than 
maintaining one (assuming you have plenty of disk space).

I don't know if next3/next4 and btrfs use similar solutions; I haven't read 
their design docs.

Greg



-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

_______________________________________________
Kernelnewbies mailing list
[email protected]
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
