Hi,

On Thu, Oct 26, 2000 at 02:37:06PM +0200, Andreas Gruenbacher wrote:

> > So how do we deal with an existing filesystem which allows user
> > attributes to be prefixed with "." or "$"?  Or which allows arbitrary
> > other prefixes?  Or which allows "." in the middle of attribute names?
> 
> If all EAs always have prefixes, that's no real problem, though it gets a
> bit ugly.

Exactly --- the use of attribute families removes that particular part
of the ugliness.  I've certainly got no objection to trying to keep
specific attribute names human-readable, but as Hans has pointed out
there may be places where the application has specific requests for
system attributes, or for atomic update (if the attribute is to be
used to encode an ACL), and that sort of information is less naturally
represented as an ASCII name.

> > Also, your API doesn't let us set multiple EAs at once, or to do an
> > atomic set-and-get.
> 
> True. Multiple-updates was something I explicitly tried not to get into
> the API. It makes things much more complicated (and prone to bugs). I
> really don't see the need for that. If you think this is an essential
> feature, a good example might convince me.

Multiple update was really a side effect of two things.  First,
presenting ACE lists requires multiple updates if you consider each
ACE to be a separate attribute (and some, but not all, ACL
implementations have this property).  Second, you really would like to
be able to query multiple attributes at once, both so that you can
list the available attributes (necessary for attribute-aware "cp",
"tar" etc.) and so that "ls -l" doesn't have to do a dozen syscalls to
get all of the necessary information out of the inode.
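
To make the batching point concrete, here is a rough user-space sketch
in C.  It is not a proposed interface: ea_get_many(), struct ea_req and
the attribute names "owner", "mode" and "acl" are all made up for
illustration, and the small in-memory table merely stands in for an
inode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the attributes of one inode. */
struct ea {
    const char *name;
    const char *value;
};

/* One slot of a batched request: caller fills name, callee fills the rest. */
struct ea_req {
    const char *name;
    const char *value;
    int found;
};

/*
 * Hypothetical batched lookup: resolves all n requests in a single call,
 * where a real interface would pay for one syscall instead of n.
 * Returns the number of attributes that were found.
 */
static int ea_get_many(const struct ea *attrs, size_t nattrs,
                       struct ea_req *reqs, size_t n)
{
    int hits = 0;

    for (size_t i = 0; i < n; i++) {
        reqs[i].value = NULL;
        reqs[i].found = 0;
        for (size_t j = 0; j < nattrs; j++) {
            if (strcmp(attrs[j].name, reqs[i].name) == 0) {
                reqs[i].value = attrs[j].value;
                reqs[i].found = 1;
                hits++;
                break;
            }
        }
    }
    return hits;
}

/* "ls -l"-style query: three attributes requested in one call. */
static int demo_batch(void)
{
    const struct ea attrs[] = { { "owner", "sct" }, { "mode", "0644" } };
    struct ea_req reqs[] = {
        { "owner", NULL, 0 }, { "mode", NULL, 0 }, { "acl", NULL, 0 },
    };
    int hits = ea_get_many(attrs, 2, reqs, 3);

    assert(reqs[2].found == 0);   /* "acl" is not set on this inode */
    return hits;
}
```

Here demo_batch() resolves two of the three requested names in one
call; the missing one is reported per slot rather than failing the
whole request, which is one possible design choice among several.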

> #  process 1            process 2             comment
> -  ---------            ---------             -------
> 1  get -> v                                   process 1 reads an EA
>                                               and then gets interrupted.
> 
> 2                       get -> v              process 2 reads an EA,
> 3                       get_set(w) -> v       and succeeds in updating it.
> 
> 4  get_set(x) -> w                            process 1 then tries to
>                                               update the EA, and realizes
>                                               the EA has changed.
> 
> 5  get_set(y) -> w                            process 1 retries (based on
>                                               the new value w), and now
>                                               succeeds.
> 
> Commentary
> ----------
> (1)   Process 1 reads an EA and then gets interrupted.
> (2+3) Process 2 reads an EA, and succeeds in updating it.
> (4)   Process 1 then tries to update the EA, and realizes the EA
>       value has changed.
> (5)   Process 1 retries (based on the new value w), and now succeeds.
> 
> A crash of process 1 in between 4 and 5 leaves the system in an
> inconsistent state. What's more, the inconsistent value is also exposed to
> other processes that read the EA between steps 4 and 5.

No --- between 4 and 5 we have a consistent state (process 1's update
has been applied, process 2's has not).

Think about it in terms of ACLs, if both processes are trying to add
ACEs (call them A and B respectively) to an existing ACL XYZ.

Step 1: process 1 reads XYZ.
Step 2: process 2 reads XYZ.
Step 3: process 2 sets XYZ+B.
Step 4: process 1 sets XYZ+A, gets XYZ+B back, and recalculates, so
Step 5: process 1 sets XYZ+A+B.

In no case is there a bad value for the ACL.  One process is adding A,
one process is adding B, and the only possible values seen are XYZ+A,
XYZ+B and XYZ+A+B.  All of those are consistent values for overlaps of
(add A) and (add B).
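
The five steps above can be replayed in a few lines of C.  This is only
a sketch under some loud assumptions: the ACL is modelled as a bitmask,
ea_get_set() is a hypothetical get-and-set built on C11
atomic_exchange(), and both "processes" are replayed sequentially in
one thread so that the interleaving is exactly the one listed.

```c
#include <assert.h>
#include <stdatomic.h>

/* ACL modelled as a bitmask of ACEs; the bit values are made up. */
enum { ACL_XYZ = 0x07, ACE_A = 0x08, ACE_B = 0x10 };

/* Hypothetical get-and-set: store new_val unconditionally, return old value. */
static unsigned ea_get_set(atomic_uint *ea, unsigned new_val)
{
    return atomic_exchange(ea, new_val);
}

/* Replays steps 1-5 in a single thread and returns the final ACL. */
static unsigned replay_steps(void)
{
    atomic_uint acl = ACL_XYZ;

    unsigned p1_seen = atomic_load(&acl);          /* step 1: reads XYZ   */
    unsigned p2_seen = atomic_load(&acl);          /* step 2: reads XYZ   */
    ea_get_set(&acl, p2_seen | ACE_B);             /* step 3: sets XYZ+B  */

    unsigned p1_new = p1_seen | ACE_A;
    unsigned prev = ea_get_set(&acl, p1_new);      /* step 4: sets XYZ+A  */
    assert(prev == (ACL_XYZ | ACE_B));             /* ...and gets XYZ+B   */

    /* Retry until the value we displaced is the one we expected to find. */
    while (prev != p1_seen) {
        p1_seen = p1_new;             /* what we expect to displace next  */
        p1_new = prev | p1_new;       /* fold the other update back in    */
        prev = ea_get_set(&acl, p1_new);           /* step 5: XYZ+A+B     */
    }
    return atomic_load(&acl);
}
```

The attribute only ever holds XYZ, XYZ+B, XYZ+A or XYZ+A+B in this
replay, and replay_steps() ends with both ACEs present.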

> A different operation, set_if_equal(old_value, new_value) would probably
> work.

That's an alternative mechanism, yes.
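
For comparison, a test-and-set variant might look roughly like this in
C.  Again this is a sketch, not anybody's actual interface:
ea_set_if_equal() and ea_add_ace() are hypothetical names, the ACL is
modelled as a bitmask, and C11 atomic_compare_exchange_strong() stands
in for whatever the filesystem would do under its own locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical set_if_equal(): store new_val only if the EA still holds old_val. */
static bool ea_set_if_equal(atomic_uint *ea, unsigned old_val, unsigned new_val)
{
    return atomic_compare_exchange_strong(ea, &old_val, new_val);
}

/* Add one ACE, re-reading and retrying whenever somebody else got in first. */
static unsigned ea_add_ace(atomic_uint *ea, unsigned ace)
{
    unsigned seen;

    do {
        seen = atomic_load(ea);
    } while (!ea_set_if_equal(ea, seen, seen | ace));
    return seen | ace;
}

/* Sequential replay: a stale writer is refused, then succeeds on retry. */
static unsigned demo_set_if_equal(void)
{
    atomic_uint acl = 0x07;                      /* existing ACL "XYZ"     */
    unsigned stale = atomic_load(&acl);          /* process 1 reads XYZ    */

    ea_add_ace(&acl, 0x10);                      /* process 2 adds B       */

    bool ok = ea_set_if_equal(&acl, stale, stale | 0x08);
    assert(!ok);                                 /* stale update refused   */

    return ea_add_ace(&acl, 0x08);               /* retry gives XYZ+A+B    */
}
```

The difference from get-and-set is that a stale update is refused
instead of being applied and then repaired, so XYZ+A on its own is
never visible in this interleaving.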

> Another (perhaps simpler) approach might be to use versioning, using the
> operations get_with_version() -> (value, version) and
> set_if_current(new_value, old_version).

No, because that requires that you have persistent version information
for the attributes, and that the version information lasts
indefinitely.  If you think about somebody setting attributes on a
non-openable file (e.g. a /dev/* inode), it's clear that versioning
has to work even through the interface which operates by name rather
than by fd, so the version must persist on closed inodes.  I'd much
prefer either the get-and-set or test-and-set API.

> That interface might be impossible to implement over existing network
> protocols. But then, set_if_equal()  might not be supported either, so
> there wouldn't be a way to make it work over those protocols anyway.
> 
> > Thirdly, it doesn't deal with extension --- what if I want to add a
> > new type of attribute? Say, MAC labels or file flags (eg. ext2
> > "chattr" flags)?
> 
> What's wrong with system.mac, inode.immutable, etc.? (Oh yes, here we have
> a case for per-inode EAs.) Another possibility would be to mirror multiple
> ext2 attributes in a single EA (say, inode.flags).

There's nothing wrong with them per se, but imagine the overhead if we
required every "stat" call to reference, by textual name, every
attribute of the file that it wanted to read!

That's my real concern there --- otherwise I'm not too bothered by the
thought of imposing textual attribute names.  I still think that
attribute families need a different encoding so that we can be really
unambiguous about ACL setting and about the application's expectations
of inheritance, atomicity etc.
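
A small C sketch of the ambiguity I have in mind.  Everything here is
hypothetical (the enum values, struct ea_key, family_from_prefix() and
the "user."/"system." prefixes are invented for the example); the point
is only that a family carried out of band never has to be guessed from
the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute families, carried separately from the name. */
enum ea_family { EA_FAMILY_USER, EA_FAMILY_SYSTEM, EA_FAMILY_UNKNOWN };

struct ea_key {
    enum ea_family family;
    const char *name;       /* arbitrary bytes chosen by the owner */
};

/* Prefix scheme: the family has to be recovered by parsing the name. */
static enum ea_family family_from_prefix(const char *full, const char **rest)
{
    static const struct { const char *prefix; enum ea_family fam; } map[] = {
        { "user.", EA_FAMILY_USER },
        { "system.", EA_FAMILY_SYSTEM },
    };

    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        size_t len = strlen(map[i].prefix);
        if (strncmp(full, map[i].prefix, len) == 0) {
            *rest = full + len;
            return map[i].fam;
        }
    }
    *rest = full;
    return EA_FAMILY_UNKNOWN;
}

/* A user attribute whose name happens to start with "system.". */
static int demo_family(void)
{
    struct ea_key typed = { EA_FAMILY_USER, "system.mac" };
    const char *rest;
    enum ea_family guessed = family_from_prefix(typed.name, &rest);

    assert(strcmp(rest, "mac") == 0);
    /* Typed key stays in the user family; the prefix parse says "system". */
    return typed.family == EA_FAMILY_USER && guessed == EA_FAMILY_SYSTEM;
}
```

A prefix scheme can of course avoid this by requiring every name to
carry a prefix, at the cost of rewriting names from existing
filesystems that allow "." anywhere.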

> > With the fsetattr() API, you can define new attribute families very
> > easily without losing the advantages of a properly typed API.
 
> The single-namespace interface isn't fundamentally different. What you
> keep in the attribute family parameter I keep in the prefix. I somewhat
> prefer the prefix approach as it seems slightly simpler to me from the
> point of users.

OK, but we _still_ need the namespace interface for authentication
tokens if we are to deal with things like NFSv4.  That particular
problem isn't going to go away, unfortunately.

> > Either way, the point stands -- building APIs on assumptions about
> > implementation details (in this case, that ACLs are built on top of
> > EAs) is a bad thing.
> 
> True enough.
> 
> The interface I proposed doesn't enforce the implementation though (I
> guess I was unclear about this).

Sure, but it makes an artificial distinction where there isn't one.
There isn't any difference, really, between ACLs and named attributes
--- they are just two examples from a whole continuum which includes
attributes as mundane as filesize through MAC labels, compression
state, DMAPI attributes and others.  Singling out ACLs for special
treatment seems bizarre --- my gut feeling is that the API needs to be
able to deal with other forms of structured attribute in the future
just as cleanly as it deals with ACLs, so giving ACLs their own special
syscall seems odd.

> One other thing my interface explicitly does not support is updating only
> part of an EA. That _could_ be implemented with the versioning interface
> above. I would still consider allowing that a very bad idea.

Right.  If a filesystem _does_ happen to implement a stream-based
attribute mechanism, then providing access to parts of an EA through a
separate stream-based API would be easy for such a filesystem to
offer.  Mandating access through a stream-based API would make the API
useless to filesystems with a simpler implementation.

Cheers,
 Stephen
