Hi,

a few observations of my own, not to be taken too seriously:

1) Block level deduplication

There are already plenty of filesystems and FUSE-based filesystem layers
(such as ZFS, lessfs, ...) which do this. That is usually more efficient
than rolling your own solution, and it is well abstracted away.
In my opinion it does not make sense to do block level deduplication in
the application layer, except if you do it on the client side to save
bandwidth.
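
A minimal sketch of what I mean by client-side deduplication (the chunk
size and the have_chunk()/put_chunk() server calls are made up for
illustration, in Python):

    import hashlib

    CHUNK_SIZE = 1 << 20  # 1 MiB; arbitrary for this sketch

    def backup_file(path, server):
        # Hash each chunk locally and only upload chunks the server does
        # not already have, so unchanged data never crosses the wire.
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                digest = hashlib.sha256(chunk).hexdigest()
                if not server.have_chunk(digest):    # hypothetical API
                    server.put_chunk(digest, chunk)  # hypothetical API
                yield digest  # the file becomes a list of chunk digests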

2) Database

I would suggest not abusing the file system as a database, but using
something like SQLite instead. That gives you transactions, atomic
operations, etc., and also improves speed.
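
A rough sketch of the kind of thing SQLite buys you (the schema and the
refcounting scheme here are just my own illustration, using Python's
sqlite3 module):

    import sqlite3

    db = sqlite3.connect('pool.db')
    db.execute('CREATE TABLE IF NOT EXISTS chunks ('
               ' digest TEXT PRIMARY KEY,'
               ' refcount INTEGER NOT NULL DEFAULT 0)')

    def add_reference(digest):
        # "with db" wraps both statements in one transaction: either the
        # chunk row exists and its refcount was bumped, or nothing changed.
        with db:
            db.execute('INSERT OR IGNORE INTO chunks (digest) VALUES (?)',
                       (digest,))
            db.execute('UPDATE chunks SET refcount = refcount + 1 '
                       'WHERE digest = ?', (digest,))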

3) v4

Is v4 published anywhere? If there are huge changes in v4 and you are
working on v3, what you are doing looks more like a fork.

Regards

On 07.08.2012 10:36, Wessel Dankers wrote:
> Hi Les,
>
> On 2012-08-06 13:05:53-0500, Les Mikesell wrote:
>> On Mon, Aug 6, 2012 at 9:46 AM, Wessel Dankers
>> <[email protected]> wrote:
>>> The ideas overlap to a limited extent with the ideas[0] that Craig posted
>>> to this list. For instance, no more hardlinks, and garbage collection is
>>> done using flat-file databases. Some things are quite different. I'll try
>>> to explain my ideas here.
>> Personally I think the hardlink scheme works pretty well up to about
>> the scale that I'd want on a single machine and you get a badly needed
>> atomic operation with links more or less for free.
> Adding a chunk can be done atomically using a simple rename(). Removing a
> chunk can be done atomically using unlink(). The only danger lies in
> removing a chunk that is still being used (because there's a backup still
> in progress whose chunks aren't being counted yet by the gc procedure). The
> simplest way to prevent that is to grant exclusive access to the gc
> process. Note that hardlinks do not prevent this race either. It's a
> problem we need to solve anyway.
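
As an aside on the rename() approach quoted above, this is roughly how I
picture it; the flat one-file-per-digest pool layout and the fsync call
are my own assumptions, in Python:

    import os, tempfile

    def add_chunk(pool_dir, digest, data):
        # Write to a temporary file in the same directory, then rename()
        # it into place. rename() is atomic on POSIX, so other processes
        # see either the complete chunk or no chunk at all.
        fd, tmp = tempfile.mkstemp(dir=pool_dir)
        try:
            with os.fdopen(fd, 'wb') as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.rename(tmp, os.path.join(pool_dir, digest))
        except BaseException:
            os.unlink(tmp)
            raise
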
>
> Craig lists a couple of reasons to abandon the hardlink scheme in
> http://sourceforge.net/mailarchive/message.php?msg_id=27140176
>
>> If you are going to do things differently, wouldn't it make sense to use
>> one of the naturally distributed scalable databases (bigcouch, riak,
>> etc.) for storage from the start since anything you do is going to
>> involve re-inventing the atomic operation of updating a link or replacing
>> it and the big win would be making this permit concurrent writes from
>> multiple servers?
> Using something like ceph/rados for storing the chunks could be interesting
> at some point but I'd like to keep things simple for now.
>
> cheers,
>
>
>

