On Thu, 26 Oct 2000, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, Oct 26, 2000 at 02:37:06PM +0200, Andreas Gruenbacher wrote:
> [...]
>
> > #  process 1         process 2         comment
> > -  ---------         ---------         -------
> > 1  get -> v                            process 1 reads an EA
> >                                        and then gets interrupted.
> >
> > 2                    get -> v          process 2 reads an EA,
> > 3                    get_set(w) -> v   and succeeds in updating it.
> >
> > 4  get_set(x) -> w                     process 1 then tries to
> >                                        update the EA, and realizes
> >                                        the EA has changed.
> >
> > 5  get_set(y) -> w                     process 1 retries (based on
> >                                        the new value w), and now
> >                                        succeeds.
> >
> > Commentary
> > ----------
> > (1) Process 1 reads an EA and then gets interrupted.
> > (2+3) Process 2 reads an EA, and succeeds in updating it.
> > (4) Process 1 then tries to update the EA, and realizes the EA
> > value has changed.
> > (5) Process 1 retries (based on the new value w), and now succeeds.
> >
> > A crash of process 1 in between 4 and 5 leaves the system in an
> > inconsistent state. What's more, the inconsistent value is also exposed to
> > other processes that read the EA between steps 4 and 5.
>
> No --- between 4 and 5 we have a consistent state (process 1's update
> has been applied, process 2's has not).
>
> Think about it in terms of ACLs, if both processes are trying to add
> ACEs (call them A and B respectively) to an existing ACL XYZ.
>
> Step 1: process 1 reads XYZ.
> Step 2: process 2 reads XYZ.
> Step 3: process 2 sets XYZ+B.
> Step 4: process 1 sets XYZ+A, gets XYZ+B back, and recalculates, so
> Step 5: process 1 sets XYZ+A+B.
>
> In no case is there a bad value for the ACL. One process is adding A,
> one process is adding B, and the only possible values seen are XYZ+A,
> XYZ+B and XYZ+A+B. All of those are consistent values for overlaps of
> (add A) and (add B).

The filesystem state is consistent, that's true. But that isn't enough to
ensure consistency overall. Process 2 thinks it succeeded. Between steps
(4) and (5) there is no sign of process 2's success (neither on the
filesystem nor in memory), so a failure of process 1 between (4) and (5)
leaves process 2 believing it succeeded, while all other processes can
only conclude it failed. That may well be a synchronization criterion.

For ACLs that may not be catastrophic. For other applications, it makes a
big difference.
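
To make the window between (4) and (5) concrete, here is a minimal
sketch of the schedule above. Everything in it is an assumption for
illustration: EAStore and run_schedule are hypothetical names, and
get_set() is modelled as an unconditional exchange (store the new value,
return the old one), which is how the steps are described in this thread.
It is not an implementation of any real interface.

```python
class EAStore:
    """Hypothetical single extended attribute with exchange semantics."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value

    def get_set(self, new_value):
        # Unconditionally store new_value and hand back the previous value.
        old = self.value
        self.value = new_value
        return old


def run_schedule(crash_before_retry):
    """Replay steps 1-5; optionally let process 1 die between 4 and 5."""
    ea = EAStore("v")
    p1_seen = ea.get()                      # step 1: process 1 reads v
    p2_seen = ea.get()                      # step 2: process 2 reads v
    p2_ok = ea.get_set("w") == p2_seen      # step 3: process 2 stores w, gets v back
    old = ea.get_set("x")                   # step 4: process 1 stores x, gets w back
    p1_ok = old == p1_seen                  # False: the EA changed underneath process 1
    if not p1_ok and not crash_before_retry:
        ea.get_set("y")                     # step 5: retry with y derived from w
    return p2_ok, ea.get()
```

With run_schedule(True), process 2 has been told it succeeded, yet the
stored value is x, which carries no trace of w. With run_schedule(False)
the retry stores y and the update of process 2 is reflected again.
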
> > A different operation, set_if_equal(old_value, new_value) would probably
> > work.
>
> That's an alternative mechanism, yes.
>
> > Another (perhaps simpler) approach might be to use versioning, using the
> > operations get_with_version() -> (value, version) and
> > set_if_current(new_value, old_version).
>
> No, because that requires that you have persistent version information
> for the attributes, and that the version information lasts
> indefinitely. If you think about somebody setting attributes on a
> non-openable file (eg. a /dev/* inode), it's clear that the versioning
> API needs to work even on the API which operates by name rather than
> by fd, so it must persist on closed inodes. I'd much prefer either
> the get-and-set or test-and-set API.

I'm aware of that problem. That's why I think test-and-set is less
painful, although it involves more overhead. Think of changing a 100K EA.
A get followed by a test-and-set involves three copies of the value, while
versioning requires only two.
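
The copy counts can be checked with a small model. This is only a sketch
under stated assumptions: CountingStore is a hypothetical in-memory
stand-in that counts the value bytes passed into or out of each call, the
method names follow the operations named in this thread, and the version
counter lives in memory, so it does not address the persistence objection
raised above.

```python
class CountingStore:
    """Hypothetical EA store that counts value bytes crossing its interface."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.bytes_copied = 0

    def get(self):
        self.bytes_copied += len(self.value)            # one copy out
        return self.value

    def set_if_equal(self, old_value, new_value):
        # Both the expected and the new value have to be passed in.
        self.bytes_copied += len(old_value) + len(new_value)
        if self.value != old_value:
            return False
        self.value = new_value
        self.version += 1
        return True

    def get_with_version(self):
        self.bytes_copied += len(self.value)            # one copy out
        return self.value, self.version

    def set_if_current(self, new_value, old_version):
        self.bytes_copied += len(new_value)             # only the new value goes in
        if self.version != old_version:
            return False
        self.value = new_value
        self.version += 1
        return True


SIZE = 100 * 1024  # the 100K EA from the example

tas = CountingStore(b"x" * SIZE)
old = tas.get()
ok_tas = tas.set_if_equal(old, b"y" * SIZE)

ver = CountingStore(b"x" * SIZE)
old, version = ver.get_with_version()
ok_ver = ver.set_if_current(b"y" * SIZE, version)
stale_ok = ver.set_if_current(b"z" * SIZE, version)     # version is now outdated

copies_tas = tas.bytes_copied // SIZE
copies_ver = (ver.bytes_copied - SIZE) // SIZE          # ignore the stale attempt
```

In this model the test-and-set path moves the value three times and the
versioned path twice, and the second set_if_current() with the outdated
version is rejected.
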
> > That interface might be impossible to implement over existing network
> > protocols. But then, set_if_equal() might not be supported either, so
> > there wouldn't be a way to make it work over those protocols anyway.
> >
> > > Thirdly, it doesn't deal with extension --- what if I want to add a
> > > new type of attribute? Say, MAC labels or file flags (eg. ext2
> > > "chattr" flags)?
> >
> > What's wrong with system.mac, inode.immutable, etc.? (Oh yes, here we have
> > a case for per-inode EAs.) Another possibility would be to mirror multiple
> > ext2 attributes in a single EA (say, inode.flags).
>
> There's nothing wrong with them per se, but imagine the overhead if we
> required every "stat" call to reference, by textual name, every
> attribute of the file that it wanted to read!
>
> That's my real concern there --- otherwise I'm not too bothered by the
> thought of imposing textual attribute names.

I see the problem, and agree the ability to manipulate multiple EAs at
once is a useful feature.

> I still think that attribute families need a different encoding so
> that we can be really unambiguous about ACL setting and about the
> application's expectations of inheritance, atomicity etc.

I don't agree with you here. For ACLs that really require editing /
inserting / removing ACL entries, the extended attributes interface just
doesn't fit. A different interface to the kernel is needed for those cases
(different system calls, whatever).

I really don't like the idea of cluttering the EA interface to support
buffer editing operations such as overwrite, insert, replace, etc. That's
about the same as supporting ordering and insert / remove / replace
operations on distinct EAs. Simple getting and replacing of EA values
should do.

I also _really_ don't like different EA families to have different
semantics (some with atomicity some not, some with ordering some not,
etc.) That's plain ugly with no guarantees that this more powerful
interface will provide what's needed. The most extreme case (I think I
begin to see where you were heading in the first place) would be to have a
separate family for each single EA. That would be separate system calls
disguised as one.

This doesn't mean more complex operations on EAs can't be implemented in
the kernel. If you need to edit the entries of an ACL represented as one
single EA, make it a separate system call, and do the manipulations in the
implementation of that system call. Inside the kernel it's pretty easy to
provide the additional synchronization that's needed.
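
As a rough user-space analogy of that idea (not kernel code), the sketch
below keeps the whole read-modify-write of the ACL inside one call that
holds a lock, so callers never need a retry loop. The Inode class,
add_ace() and demo() are hypothetical names chosen for illustration, and
the threading.Lock stands in for whatever per-inode locking the kernel
would use.

```python
import threading


class Inode:
    """Hypothetical inode whose ACL is stored as a single value."""

    def __init__(self, acl):
        self._lock = threading.Lock()
        self._acl = tuple(acl)

    def get_acl(self):
        with self._lock:
            return self._acl

    def add_ace(self, ace):
        # Stand-in for a dedicated call: the read, the edit and the write
        # all happen while the lock is held, so no update can be lost.
        with self._lock:
            if ace not in self._acl:
                self._acl = self._acl + (ace,)


def demo():
    inode = Inode(["X", "Y", "Z"])
    workers = [threading.Thread(target=inode.add_ace, args=(ace,))
               for ace in ("A", "B")]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return inode.get_acl()
```

Whatever order the two workers run in, the final ACL contains X, Y, Z, A
and B; only the relative order of A and B depends on scheduling.
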
> > > With the fsetattr() API, you can define new attribute families very
> > > easily without losing the advantages of a properly typed API.
>
> > The single-namespace interface isn't fundamentally different. What you
> > keep in the attribute family parameter I keep in the prefix. I somewhat
> > prefer the prefix approach as it seems slightly simpler to me from the
> > point of users.
>
> OK, but we _still_ need the namespace interface for authentication
> tokens if we are to deal with things like NFSv4. That particular
> problem isn't going to go away, unfortunately.

Could you please explain that to me some more?

> > > Either way, the point stands -- building APIs on assumptions about
> > > implementation details (in this case, that ACLs are built on top of
> > > EAs) is a bad thing.
> >
> > True enough.
> >
> > The interface I proposed doesn't enforce the implementation though (I
> > guess I was unclear about this).
>
> Sure, but it makes an artificial distinction where there isn't one.
> There isn't any difference, really, between ACLs and named attributes
> --- they are just two examples from a whole continuum which includes
> attributes as mundane as filesize through MAC labels, compression
> state, DMAPI attributes and others. Singling out ACLs for special
> treatment seems bizarre --- my gut feeling is that the API needs to be
> able to deal with other forms of structured attribute in the future
> just as cleanly as it deals with ACLs, so giving ACLs its own special
> syscall seems odd.

Yes, and my position is just the opposite. I believe in keeping the EA
interface simple and using it for those things that it supports directly.
Other more complex applications are still possible in the kernel (as
described above).

Thanks,
Andreas.

------------------------------------------------------------------------
Andreas Gruenbacher, [EMAIL PROTECTED]
Contact information: http://www.bestbits.at/~ag/