Re: Hierarchical tag structure in SetFS

Tuomo Valkonen Thu, 01 Mar 2007 11:33:41 -0800

To avoid further confusion, a clarifying example is perhaps
in order:

Suppose we're dealing with two files, e.g. the README files for 
Ion, and, say, gcc. On a traditional hierarchical FS, we might 
store these as


        ion/README
        gcc/README

and these paths would be the unique identifiers of these files. 
But we could just as well have chosen

        README/ion
        README/gcc

You can of course have both by symlinking, but it gets cumbersome.
The solution: setfs, removing the order from identities. A file
is not identified by an (ordered) sequence of strings/tags, but by
an (unordered) set of them.
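
As a rough sketch of that idea (Python; the helper name `identity` is
made up for illustration and is not part of any setfs code), a path
can be reduced to an unordered set of tags, so that both orderings
name the same file:

```python
def identity(path):
    """Return the unordered set of tags named by a slash-separated path."""
    # Empty components (leading, trailing or doubled slashes) are ignored.
    return frozenset(tag for tag in path.split("/") if tag)


# Both orderings identify the same file.
same = identity("ion/README") == identity("README/ion")

# A different tag set identifies a different file.
different = identity("ion/README") != identity("gcc/README")
```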

Now, both of these files also have additional data associated with
them: the author of Ion and ion/README is me, Tuomo Valkonen, the 
author of gcc/README is someone else. Both files have some creation
dates, and so on. Some of this data could be considered to be "tags"
(say, tuomov.author), and some of it not (the creation time). But
this information is not really essential to the "identity" of the
file. It's merely meta-data. That is not something that a basic
version of setfs would deal with, and it is not something to be had
in the identity of the file.

Now, setfs could of course support meta-data, but it has to be done
in a peculiar manner, so that the unique identifier can be easily
picked out from the path. I have thought to reserve path elements
beginning with a '#' symbol for that (so that identifying tags can
not begin with the symbol). For example: the path
'#author:tuomov/ion/README' clearly indicates that 'ion' and
'README' are the _persistent_ identifying tags, and '#author:tuomov'
is merely a meta-data filter used for searching the file, and __can
be ignored by non-search functions__. Likewise, you could support
'or'ing meta-data for searches: for example, the result from the
search '#author:tuomov|project:gcc/README' could list the tags 'ion'
and 'gcc' -- and still we could get the persistent identifier for
both results by dropping the '#author:tuomov|project:gcc' term.
(Meta-data filters with an 'and' in them can simply use '/#' instead
of '|', since the terms in the path are 'anded'.)
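
To illustrate how the persistent identifier could be picked out of
such a path, here is a minimal sketch in Python. The function name
`split_path` and the returned data layout are assumptions made for
this example only; the syntax it parses is the '#', '|' and ':'
convention described above.

```python
def split_path(path):
    """Split a path into (identifying tags, meta-data filters).

    Components starting with '#' are treated as filters. Each filter
    is a list of 'key:value' alternatives that were joined by '|'.
    """
    tags = set()
    filters = []
    for part in path.split("/"):
        if not part:
            continue
        if part.startswith("#"):
            alternatives = []
            for alt in part[1:].split("|"):
                # Split on the first ':' only; the value may be empty.
                key, _, value = alt.partition(":")
                alternatives.append((key, value))
            # Alternatives inside one component are 'or'ed;
            # separate components are 'and'ed.
            filters.append(alternatives)
        else:
            tags.add(part)
    return frozenset(tags), filters


# The filter is dropped, leaving the persistent identifier {ion, README}.
ident, flt = split_path("#author:tuomov/ion/README")
```

Non-search functions would only look at the first element of the
returned pair and ignore the second.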

Hard links are also possible: you just give two different tag sets
to a file. However, you can't just 'or' the search for such 
identifier tag sets directly: you must still use the special 
syntax to be able to pick out the persistent parts of the result.
So suppose the same file had two identifying sets of tags, 'foo/bar'
and 'baz/quk'. Then you might make the search '#tag:foo/#tag:bar',
and this might return both 'foo/bar' and 'baz/quk'. Yes, either
'foo' or 'bar' is in a sense "twice" for both results, but the
other instance is encoded in a manner that indicates a search 
filter that is not part of the identity of the result.
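
A small sketch of the hard link case, again in Python. The in-memory
`files` table, the file keys and the function `search_by_tags` are
hypothetical, and the matching rule (a file matches if one of its
identities contains all the searched tags, and then all of its
identities are returned) is only one possible reading of the search
described above.

```python
# Hypothetical store: each file maps to all of its identifying tag sets.
files = {
    "file-1": [frozenset({"foo", "bar"}), frozenset({"baz", "quk"})],
    "file-2": [frozenset({"foo", "other"})],
}


def search_by_tags(store, wanted):
    """Return every identifying tag set of each file that has at least
    one identity containing all the wanted tags."""
    wanted = frozenset(wanted)
    results = []
    for identities in store.values():
        if any(wanted <= ident for ident in identities):
            results.extend(identities)
    return results


# Corresponds to the search '#tag:foo/#tag:bar'.
hits = search_by_tags(files, {"foo", "bar"})
```

Here `hits` holds both 'foo/bar' and 'baz/quk' as tag sets, while the
searched tags themselves stay outside the returned identities.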

So, you see, basic setfs is about identity, not meta-data.
It is about providing persistent human-readable identifiers
to files (unlike a basic meta-data-based database file system),
without a constraining hierarchy (unlike traditional file 
systems). Meta-data/filters/informational tags are something
else that can be had on top of that, but are not the fundamental
principle behind setfs: partial order instead of total order is.

-- 
Tuomo
