On Tue, Aug 7, 2012 at 3:36 AM, Wessel Dankers <wsl-backuppc-de...@fruit.je> wrote:
>
>> Personally I think the hardlink scheme works pretty well up to about
>> the scale that I'd want on a single machine and you get a badly needed
>> atomic operation with links more or less for free.
>
> Adding a chunk can be done atomically using a simple rename(). Removing a
> chunk can be done atomically using unlink().
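
For concreteness, I'm picturing the chunk store you describe as something like
the sketch below (just an illustration in C -- the pool layout and the function
names are made up for the example, not anything from BackupPC):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Add a chunk: write it under a temporary name, fsync, then rename()
 * it onto its digest-derived name.  rename() is atomic within a
 * filesystem, so other processes see either the whole chunk or none. */
static int store_chunk(const char *pooldir, const char *digest,
                       const void *data, size_t len)
{
    char tmp[4096], final[4096];
    snprintf(tmp, sizeof tmp, "%s/tmp.%ld", pooldir, (long)getpid());
    snprintf(final, sizeof final, "%s/%s", pooldir, digest);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    close(fd);

    return rename(tmp, final);   /* the atomic "publish" step */
}

/* Remove a chunk: unlink() is atomic, but it knows nothing about
 * whether some backup still needs this chunk -- that bookkeeping is
 * the part left to us. */
static int drop_chunk(const char *pooldir, const char *digest)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/%s", pooldir, digest);
    return unlink(path);
}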
That gets you a chunk of data with some random name on disk. You also have to
maintain the list of such chunks that make up each particular file, plus a
reference count per chunk, and you have to update those atomically without any
common tool that provides atomic operations. With a filesystem hardlink you get
the reference and the count for free.

> The only danger lies in
> removing a chunk that is still being used (because there's a backup still
> in progress whose chunks aren't being counted yet by the gc procedure). The
> simplest way to prevent that is to grant exclusive access to the gc
> process. Note that hardlinks do not prevent this race either. It's a
> problem we need to solve anyway.

Hardlinks maintain the reference count atomically. The equivalent problem that
remains is that there is no single atomic operation to check that the link
count is one and then remove the file (see the sketch at the end of this
message).

> Craig lists a couple of reasons to abandon the hardlink scheme in
> http://sourceforge.net/mailarchive/message.php?msg_id=27140176

And yet, he hasn't released something better. It will be non-trivial at the
very least, and you'll need a tool equivalent to fsck to check your references
and counts for corruption.

>> If you are going to do things differently, wouldn't it make sense to use
>> one of the naturally distributed scalable databases (bigcouch, riak,
>> etc.) for storage from the start, since anything you do is going to
>> involve re-inventing the atomic operation of updating a link or replacing
>> it, and the big win would be making this permit concurrent writes from
>> multiple servers?
>
> Using something like ceph/rados for storing the chunks could be interesting
> at some point but I'd like to keep things simple for now.

For me, 'simple' means building on top of the best work someone else has
already done. For a single machine/filesystem you might as well just use a
compressing, de-duplicating filesystem and skip the whole chunking/linking
mess; in fact, I wonder how it would work to simply remove those operations
from BackupPC and leave the rest as-is. But I still think the next level of
scaling would be easiest on top of a naturally distributed and replicated
database.

--
  Les Mikesell
    lesmikes...@gmail.com
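
The sketch I mentioned above -- again with invented paths and helper names in
C, not BackupPC's actual pool code -- showing where the hardlink count helps
and where the missing check-and-remove primitive bites:

#include <sys/stat.h>
#include <unistd.h>

/* Referencing a pool file from a backup tree is a single atomic link():
 * the kernel bumps the file's link count for us, so the reference and
 * the count really do come for free. */
static int reference_chunk(const char *poolfile, const char *backupfile)
{
    return link(poolfile, backupfile);
}

/* Cleanup is where the gap shows up: stat() and unlink() are each
 * atomic on their own, but there is no single operation that checks
 * "link count == 1" and removes the file in one step. */
static int expire_if_unreferenced(const char *poolfile)
{
    struct stat st;

    if (stat(poolfile, &st) != 0)
        return -1;
    if (st.st_nlink > 1)
        return 0;               /* still referenced by some backup */

    /* Race window: a backup in progress could link() to the pool file
     * right here; the data would survive through that new link, but
     * the pool name is gone and later pool lookups would miss it. */
    return unlink(poolfile);
}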