On 16/03/2010 23:45, Fabio wrote:
> Some years ago I was searching for that kind of functionality and found
> an experimental ext3 patch to allow the so-called COW-links:
> http://lwn.net/Articles/76616/


I'd read about the COW patches for ext3 before. While there is certainly some similarity here, there are a fair number of differences. One is that those patches were aimed only at copying - there was no way to merge files later. Another is that it was (as far as I can see) just an experimental hack to try out the concept. Since it didn't take off, I think it is worth learning from, but not building on.

> There was a discussion later on LWN (http://lwn.net/Articles/77972/)
> about how an approach like COW-links would break POSIX standards.


I think a lot of the problems here were concerning inode numbers. As far as I understand it, when you made an ext3-cow copy, the copy and the original had different inode numbers. That meant userspace programs saw them as different files, and you could have different owners, attributes, etc., while keeping the data linked. But that broke a common optimisation when doing large diffs - diff can skip comparing two files that share a device and inode number, whereas cow copies with distinct inodes must be compared byte by byte. Thus some people wanted the copy to keep the same inode number as the original, and that /definitely/ broke POSIX (two files with the same inode number are supposed to be the same file).

With btrfs, the file copies would each have their own inode - it would, I think, be POSIX compliant, as it is transparent to user programs. The diff optimisation discussed in the articles you cited would not work - but if btrfs becomes the standard Linux filesystem, then user applications like diff can be extended with btrfs-specific optimisations if necessary.
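To make that concrete, here is a minimal sketch (my own illustration, assuming a btrfs filesystem mounted at /mnt/btrfs - the paths are made up) of how a userspace program can make a COW copy via the Linux clone ioctl and confirm that the two files still have distinct inode numbers:

    import fcntl, os

    FICLONE = 0x40049409  # _IOW(0x94, 9, int), the Linux clone-file ioctl

    src_path = "/mnt/btrfs/original"   # hypothetical paths on a btrfs mount
    dst_path = "/mnt/btrfs/cow-copy"

    # Create a source file with some data.
    with open(src_path, "wb") as f:
        f.write(b"hello btrfs\n" * 1024)

    # Make a COW copy: the destination shares the source's extents on disk.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())

    # The copy is a separate file with its own inode (and so its own owner,
    # mode, timestamps, ...), which is what keeps this POSIX compliant.
    print(os.stat(src_path).st_ino, os.stat(dst_path).st_ino)

The two inode numbers printed at the end should differ, even though no data was duplicated on disk.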

> I am not very technical and don't know if it's feasible in btrfs.

Nor am I very knowledgeable in this area (most of my programming is on 8-bit processors), but I believe btrfs is already designed to support larger checksums (a 32-bit CRC is not enough to say that two files' data is identical), and "cp --reflink" shows how the underlying link is made.
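As a rough illustration of why a 32-bit checksum can't be trusted on its own: by the birthday bound you expect a CRC32 collision after only about 77,000 random inputs (roughly the square root of 2^32) - tiny compared to the number of files on a large server. A quick sketch (my own example, nothing btrfs-specific) that hunts for two different strings with the same CRC32:

    import os, zlib

    # Generate random 16-byte strings until two of them share a CRC32.
    # By the birthday paradox this takes on the order of 2**16 tries,
    # so it finishes almost instantly - far too easy a bar for a dedupe
    # tool to rely on the checksum alone.
    seen = {}
    tries = 0
    while True:
        tries += 1
        data = os.urandom(16)
        crc = zlib.crc32(data)
        if crc in seen and seen[crc] != data:
            print(f"collision after {tries} tries:")
            print(seen[crc].hex(), "and", data.hex(), "->", hex(crc))
            break
        seen[crc] = data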

> I think most likely you'll have to run a userspace tool to find and
> merge identical files based on checksums (which already sounds good to me).

This sounds right to me. In fact, it would be possible to do today, entirely from user space - but files would need to be compared long-hand (byte by byte) before merging. With larger checksums, the userspace daemon would be much more efficient.
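A sketch of what such a tool might look like (again my own illustration - the merge-via-clone-ioctl step and all the names are assumptions, and a production tool would need to worry about races with concurrent writers and about preserving the duplicate's metadata):

    import fcntl, hashlib, os, sys
    from collections import defaultdict

    FICLONE = 0x40049409  # Linux clone-file ioctl

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.digest()

    def identical(a, b):
        # The "long-hand" comparison: never trust the hash alone.
        with open(a, "rb") as fa, open(b, "rb") as fb:
            while True:
                ca, cb = fa.read(1 << 20), fb.read(1 << 20)
                if ca != cb:
                    return False
                if not ca:
                    return True

    def dedupe_tree(root):
        # Group files by (size, sha256); only same-size files can match.
        groups = defaultdict(list)
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    key = (os.path.getsize(path), sha256_of(path))
                    groups[key].append(path)

        for paths in groups.values():
            keep, *rest = paths
            for other in rest:
                if identical(keep, other):
                    # Re-point the duplicate at the original's extents.
                    # NB: "wb" truncates before the clone, so this step is
                    # not safe against crashes or concurrent writers.
                    with open(keep, "rb") as src, open(other, "wb") as dst:
                        fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
                    print("merged", other, "->", keep)

    if __name__ == "__main__":
        dedupe_tree(sys.argv[1])

If the filesystem exposed its own (larger) per-file checksums, the sha256_of() pass over every byte of every file could be skipped, which is where most of the time goes.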

> The only thing we can ask the developers at the moment is if something
> like that would be possible without changes to the on-disk format.


I guess that's partly why I made these posts!


> PS. Another great scenario is shared hosting web/file servers: tens of
> thousands of websites with mostly the same tiny PHP Joomla files.
> If you can get the benefits of compression + "content based"/cowlinks +
> FS cache... that would really make Btrfs FLY on hard disks and make SSD
> devices viable for storage (because of the space efficiency).


That's a good point.

People often think that hard disk space is cheap these days - but being space efficient means you can use an SSD instead of a hard disk. And for on-disk backups, it means you can use a small number of disks even when the users think "I've got a huge hard disk, I can make lots of copies of these files"!

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html