On Tue, Aug 7, 2012 at 3:36 AM, Wessel Dankers
<wsl-backuppc-de...@fruit.je> wrote:
>
>>
>> Personally I think the hardlink scheme works pretty well up to about
>> the scale that I'd want on a single machine and you get a badly needed
>> atomic operation with links more or less for free.
>
> Adding a chunk can be done atomically using a simple rename(). Removing a
> chunk can be done atomically using unlink().

That gets you a chunk of data with some random name on disk.  You
also have to maintain a list of such chunks that are relevant to a
particular file, plus a reference count, and you have to do that
atomically even though no common tool provides the needed atomic
operations.  With a filesystem hardlink you get the reference and the
count for free.
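
For concreteness, here is a minimal Python sketch of the two mechanisms
being compared: publishing a chunk atomically with rename(), and using a
hardlink whose st_nlink doubles as the reference count.  The paths and
helper names are hypothetical, not anything BackupPC actually ships.

    import os

    POOL = "/var/lib/backuppc/pool"      # hypothetical pool directory
    TMP = "/var/lib/backuppc/pool/tmp"   # staging dir on the same filesystem

    def add_chunk(digest, data):
        """Write to a temp file, then publish atomically with rename()."""
        tmp_path = os.path.join(TMP, digest + ".tmp")
        final_path = os.path.join(POOL, digest)
        with open(tmp_path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp_path, final_path)  # readers see old or new, never partial
        return final_path

    def reference_chunk(pool_path, backup_path):
        """Hardlink the pool chunk into a backup tree; the kernel bumps
        st_nlink atomically, so the count is maintained for free."""
        os.link(pool_path, backup_path)

    def chunk_refcount(pool_path):
        """The filesystem link count doubles as the reference count."""
        return os.stat(pool_path).st_nlink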

> The only danger lies in
> removing a chunk that is still being used (because there's a backup still
> in progress whose chunks aren't being counted yet by the gc procedure). The
> simplest way to prevent that is to grant exclusive access to the gc
> process. Note that hardlinks do not prevent this race either. It's a
> problem we need to solve anyway.

Hardlinks maintain the reference count atomically.  The equivalent
problem remaining there is that there is no single atomic operation to
check that the link count is one and then remove the file.
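
A sketch of that gap, again in hypothetical Python: stat() followed by
unlink() leaves a window in which a concurrent link() from an in-progress
backup is lost.  One way to close it, along the lines Wessel suggests, is
to give the gc process exclusive access via a lock; this is only an
illustration under that assumption, not BackupPC's actual scheme.

    import fcntl
    import os

    def unlink_if_unreferenced(pool_path):
        """Non-atomic check-then-remove: a concurrent link() can land
        between the stat() and the unlink() and be silently lost."""
        if os.stat(pool_path).st_nlink == 1:  # race window opens here...
            os.unlink(pool_path)              # ...and closes too late, here

    def run_gc(lock_path, candidates):
        """Sidestep the race by holding an exclusive lock for the whole
        gc pass, assuming backup processes take the same lock before
        they call link() to add new references."""
        with open(lock_path, "w") as lock:
            fcntl.flock(lock, fcntl.LOCK_EX)
            for path in candidates:
                unlink_if_unreferenced(path)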

> Craig lists a couple of reasons to abandon the hardlink scheme in
> http://sourceforge.net/mailarchive/message.php?msg_id=27140176

And yet, he hasn't released something better.  It will be at least
non-trivial...   And you'll need a tool equivalent to fsck to check
for corruption of your references and counts.

>> If you are going to do things differently, wouldn't it make sense to use
>> one of the naturally distributed scalable databases (bigcouch, riak,
>> etc.) for storage from the start, since anything you do is going to
>> involve re-inventing the atomic operation of updating or replacing a
>> link, and the big win would be making this permit concurrent writes from
>> multiple servers?
>
> Using something like ceph/rados for storing the chunks could be interesting
> at some point but I'd like to keep things simple for now.

For me, 'simple' means building something on top of the best work
someone else has already done.  For a single machine/filesystem you
might as well just use a compressing, de-duplicating filesystem and
ignore the whole chunking/linking mess.  In fact, I wonder how it would
work to just remove those operations from backuppc and leave the rest
as is.  But I still think the next level in scaling would be easiest on
top of a naturally distributed and replicated database.

-- 
   Les Mikesell
     lesmikes...@gmail.com
