The problem with fixing shortcuts in file systems, and with links, names, and 
files in general, is that the intention behind a link, or even behind a 
filename, is not clear.

For a simple example, compare a jpeg of your cat to a configuration file, e.g. 
/etc/passwd. 

It is silly to keep multiple copies of the jpeg of your cat in multiple 
locations on your hard drive just because you use it as your desktop wallpaper, 
have it in a slideshow, have referenced it in a newsletter, etc. But if you 
move or rename the file, you would prefer that all these references to it don't 
suddenly break. In this case you might say that your intention is to refer to 
-this- file, regardless of name or location. Even an updated picture is not the 
same picture. Links to this file are perhaps intended to be links to this 
specific image.

On the other hand, your /etc/passwd file tracks the users of your computer. 
When you update the file to change your password you don't want the system to 
keep referring to the old file. When you copy the file elsewhere for backups 
you don't want the system to go looking in the new location instead of the old 
location. In this case the intention is to refer to -any- file that is found in 
-this- location. Links to this file are intended to be links to this location.
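
To make the two intentions concrete, here is a rough Python sketch using the 
two kinds of link a POSIX file system already offers. A hard link names the 
file itself, a symbolic link names a path. The file names and the function 
demo_link_intent are made up for illustration, and it assumes a POSIX system 
where both link types are available.

```python
import os
import tempfile

def demo_link_intent():
    """Return (text read via hard link, text read via symlink)."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "cat.jpg")
        moved = os.path.join(d, "cat-2014.jpg")
        by_identity = os.path.join(d, "wallpaper")  # hard link: names the file
        by_location = os.path.join(d, "slideshow")  # symlink: names the path

        with open(original, "w") as f:
            f.write("old picture")
        os.link(original, by_identity)
        os.symlink(original, by_location)

        # Move the original away, then put a different file at the old path.
        os.rename(original, moved)
        with open(original, "w") as f:
            f.write("new picture")

        with open(by_identity) as f:
            via_hard = f.read()
        with open(by_location) as f:
            via_sym = f.read()
        return via_hard, via_sym
```

The hard link still reads the old picture after the move, while the symlink 
reads whatever now sits at the old path. Neither is a perfect match for the 
cat picture case: a hard link also follows in-place edits to the file, and it 
cannot cross file systems.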

An interesting approach is to separate the naming concern from the content 
concern. One project is doing exactly this: http://camlistore.org 
(Content-Addressable Multi-Layer Indexed Storage). The contents of files are 
stored as blobs and are addressed by the hash of their contents. Those hashes 
are referred to by claims, which are similar to the filenames and directories 
and links of a conventional file system. A natural consequence of this design 
is that de-duplication is built in and replication is simple. It was not 
designed to serve as a high-performance file system, though.
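
As a toy illustration of the idea (not Camlistore's actual API or schema), 
here is a minimal in-memory sketch in Python. The class BlobStore, its method 
names, and the choice of SHA-256 are all my own assumptions. The names table 
is only a crude stand-in for claims.

```python
import hashlib

class BlobStore:
    """Toy content-addressed store: blobs keyed by the hash of their bytes."""

    def __init__(self):
        self.blobs = {}  # hex digest -> bytes
        self.names = {}  # name -> hex digest (stand-in for claims)

    def put(self, data: bytes) -> str:
        ref = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(ref, data)  # identical content is stored once
        return ref

    def bind(self, name: str, data: bytes) -> str:
        """Store the content and point the name at its hash."""
        ref = self.put(data)
        self.names[name] = ref
        return ref

    def read(self, name: str) -> bytes:
        return self.blobs[self.names[name]]
```

Binding two names to the same bytes stores one blob, which is where the 
de-duplication comes from. Rebinding a name to new content leaves the old 
blob, and every other name that refers to it, untouched.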

Chuck


On Oct 7, 2014, at 7:42 AM, Fernando Cacciola <fernando.cacci...@gmail.com> 
wrote:

> On Tue, Oct 7, 2014 at 11:29 AM, Fernando Cacciola 
> <fernando.cacci...@gmail.com> wrote:
> 
> I do realise of course that maintaining a gigantic master index of all files 
> is not applicable to the real world, but nonetheless I think the general form 
> of the solution (split identity from location) is worth considering.
> 
> 
> OTOH, it just occurred to me that this can be made to scale to the real world 
> in the following way:
> 
> A file could be formally classified as standalone or multiply-referenced.
> A standalone file would be one which doesn't have an entry in the master 
> index.
> A multiply-referenced file would be listed in the master index, given a 
> unique id and mapped to its current location.
> 
> When you create a new link, the file is given a unique id, added to the 
> master index and the link is associated with the target's id.
> The master index can even have a reference count allowing the file to 
> transition to standalone state when the last link is removed.
> 
> This way the master index size is kept bounded by the number of effective 
> links in the file system.
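
The scheme quoted above could be sketched roughly as follows. This is only 
one reading of the proposal, in memory and in Python. The class MasterIndex 
and all of its method names are hypothetical, and a real file system would 
have to hook move and delete operations itself.

```python
import itertools

class MasterIndex:
    """Tracks only files that have at least one link pointing at them."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.id_of = {}    # current path -> file id
        self.entries = {}  # file id -> [current path, reference count]

    def add_link(self, target_path):
        """Register a new link; the target gets an id on its first link."""
        fid = self.id_of.get(target_path)
        if fid is None:
            fid = next(self._next_id)
            self.id_of[target_path] = fid
            self.entries[fid] = [target_path, 0]
        self.entries[fid][1] += 1
        return fid

    def move(self, old_path, new_path):
        """Update the location of an indexed file; standalone files are ignored."""
        fid = self.id_of.pop(old_path, None)
        if fid is not None:
            self.id_of[new_path] = fid
            self.entries[fid][0] = new_path

    def resolve(self, fid):
        return self.entries[fid][0]

    def remove_link(self, fid):
        """Drop one link; the file becomes standalone when the count hits zero."""
        entry = self.entries[fid]
        entry[1] -= 1
        if entry[1] == 0:
            del self.id_of[entry[0]]
            del self.entries[fid]
```

A link stores the id returned by add_link rather than a path, so it survives 
a move. The index never holds more entries than there are linked files.
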
> 
> 
> Best
> 
> -- 
> Fernando Cacciola
> SciSoft Consulting, Founder
> http://www.scisoft-consulting.com
> _______________________________________________
> fonc mailing list
> fonc@vpri.org
> http://vpri.org/mailman/listinfo/fonc

