Hi,
On Sun, Oct 29, 2000 at 03:15:57PM +0100, Andreas Gruenbacher wrote:
>
> The interface described here also doesn't include Stephen's idea to allow
> an ordered list of EAs under the same name. In addition to the append and
> prepend operations Stephen suggested, a whole range of other operations
> (get/delete/... by index, etc.) might make sense, and stuff like that
> could well be added. However, it would complicate the semantics even
> further. I'd really like to learn more about the requirements for that.
>
> Stephen, do you have any good pointers?
The main motivation for having that was simply to deal with ACLs as
sets of named ACEs, in just the same way that a file can contain a set
of named EAs. Being able to define the namespace separately for each
ACE in the ACL was an important property of the API I proposed, since
it would allow you to mix both local uids and remote credentials in an
ACL.
> We have also been discussing how to support different EA namespaces.
> Stephen's approach was to use an integer namespace id to specify the
> namespace, while my approach was to use a textual prefix to the EA name.
> While those approaches are semantically equivalent, I have been convinced
> that an integer specifier is easier to handle in the kernel.
>
> Still, I believe in textual names at the user interface. I think the ids
> should be translated from/into textual names in a userspace library before
> presenting them to users.
No problem, we already do that for things like port numbers.
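For illustration, a userspace table in the style of the services database could map textual prefixes to kernel ids and back. This is a minimal sketch; the numeric ids, the "user"/"system"/"acl" names, the "." separating prefix from attribute name, and every function name are hypothetical, since the thread fixes none of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical namespace ids: the thread does not assign any numbers. */
enum {
    EA_NS_USER = 1,
    EA_NS_SYSTEM = 2,
    EA_NS_ACL = 3
};

/* Hypothetical name <-> id table, kept in userspace only. */
static const struct {
    const char *name;
    int id;
} ea_ns_table[] = {
    { "user", EA_NS_USER },
    { "system", EA_NS_SYSTEM },
    { "acl", EA_NS_ACL },
};

#define EA_NS_COUNT (sizeof(ea_ns_table) / sizeof(ea_ns_table[0]))

/* Textual prefix -> integer id, or -1 if the prefix is unknown. */
int ea_ns_by_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < EA_NS_COUNT; i++)
        if (strcmp(ea_ns_table[i].name, name) == 0)
            return ea_ns_table[i].id;
    return -1;
}

/* Integer id -> textual prefix, or NULL if the id is unknown. */
const char *ea_ns_by_id(int id)
{
    for (size_t i = 0; i < EA_NS_COUNT; i++)
        if (ea_ns_table[i].id == id)
            return ea_ns_table[i].name;
    return NULL;
}

/* Split "user.mime_type" into a namespace id (the return value, -1 on
 * error) and the bare attribute name, stored in *bare. The "." separator
 * is an assumption. */
int ea_split_name(const char *full, const char **bare)
{
    const char *dot = strchr(full, '.');
    char prefix[32];
    size_t len;

    if (dot == NULL)
        return -1;
    len = (size_t)(dot - full);
    if (len == 0 || len >= sizeof(prefix))
        return -1;
    memcpy(prefix, full, len);
    prefix[len] = '\0';
    *bare = dot + 1;
    return ea_ns_by_name(prefix);
}
```

With this table, ea_split_name("user.mime_type", &bare) returns EA_NS_USER and leaves bare pointing at "mime_type"; the kernel would only ever see the integer.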
> One of the issues raised was that it's important to be able to manipulate
> multiple EAs at once. The reason for this was to reduce system call
> overhead.
>
> Another idea was to allow manipulation of multiple EAs in an atomic way.
> If I recall correctly, an even stronger semantic requirement of
> manipulating multiple EAs in a transactional way was also suggested.
On-disk transactional semantics _must_ remain an implementation
option.
> all of the above in a clean, simple and extensible way. NFSv4 supports
> compound operations, in which multiple requests are packed into a single
> RPC. A similar approach might also make sense for the EA interface.
Umm, maybe, as long as we avoid over-engineering it...
> Note that the interface proposed here is comparable to Tru64's property
> lists interface (although it goes beyond that). The Tru64 proplist(4)
> manual page is here:
> <http://www.tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V50_HTML/MAN/MAN4/0200____.HTM>
Good --- the DU "PropertyList" uses exactly the sort of abstract
naming I think we need.
> I could imagine the system call(s) to be implemented like this:
>
> int sys_ext_attr_file(char *path, int namespace, int flags,
>                       struct ea_request *request, size_t request_len,
>                       int *results, size_t result_size);
>
> int sys_ext_attr_fd(int fd, int namespace, int flags,
>                     struct ea_request *request, size_t request_len,
>                     int *results, size_t result_size);
>
> (This doesn't actually work as system calls as is because there are too
> many parameters.)
That's easy to hide in libc's stubs --- we already have to do that for
some system calls like mmap64.
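A sketch of the libc side, assuming the stub packs all seven arguments into one hypothetical struct ext_attr_args so the kernel entry point takes a single pointer. ext_attr_file(), ext_attr_fd() and sys_ext_attr_stub() are made-up names; the stub only records its argument block, because no such system call exists.

```c
#include <assert.h>
#include <stddef.h>

struct ea_request;  /* record layout is defined elsewhere */

/* Hypothetical argument block: everything the kernel needs in one struct,
 * so the system call itself takes a single pointer. */
struct ext_attr_args {
    const char *path;                  /* NULL when operating on fd */
    int fd;                            /* -1 when operating on path */
    int namespace_id;
    int flags;
    const struct ea_request *request;
    size_t request_len;
    int *results;
    size_t result_size;
};

/* Copy of the last argument block seen, for demonstration only. */
static struct ext_attr_args last_args;

/* Stand-in for the kernel entry point (no such system call exists):
 * it records its argument block and reports zero requests processed. */
static long sys_ext_attr_stub(const struct ext_attr_args *args)
{
    last_args = *args;
    return 0;
}

/* libc-level wrappers keeping the full parameter lists of the proposal. */
int ext_attr_file(const char *path, int ns, int flags,
                  const struct ea_request *request, size_t request_len,
                  int *results, size_t result_size)
{
    struct ext_attr_args args = {
        .path = path, .fd = -1, .namespace_id = ns, .flags = flags,
        .request = request, .request_len = request_len,
        .results = results, .result_size = result_size,
    };

    assert(path != NULL);
    return (int)sys_ext_attr_stub(&args);
}

int ext_attr_fd(int fd, int ns, int flags,
                const struct ea_request *request, size_t request_len,
                int *results, size_t result_size)
{
    struct ext_attr_args args = {
        .path = NULL, .fd = fd, .namespace_id = ns, .flags = flags,
        .request = request, .request_len = request_len,
        .results = results, .result_size = result_size,
    };

    return (int)sys_ext_attr_stub(&args);
}
```

Folding the path and fd variants into one argument block (path NULL or fd -1) means a single system call number could serve both wrappers.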
> Multiple EA operations are marshalled into the request buffer; after the
> system call the results buffer contains the results. Operations are
> encoded in the request buffer as variable-size records with this
> structure:
>
> struct ea_request {
>     int operation;
>     /* additional operation specific fields */
> };
>
> Results just consist of one integer status code per operation.
>
> Operation could be one of:
>
> EA_REQ_LIST
>     List the names of all EAs defined for this inode.
> EA_REQ_GET
>     Get the value of an EA.
> EA_REQ_GETSIZE
>     Get the buffer size required for storing the value of an EA.
> EA_REQ_SET
>     Set the value of an EA to a new value.
> EA_REQ_REMOVE
>     Remove an EA.
OK.
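One possible way to marshal these variable-size records, as a sketch only: the proposal leaves the operation-specific fields open, so the header below (op_flags, name_len, value_len, rec_len, then name and value bytes, padded to sizeof(long)) and the numeric EA_REQ_* values are assumptions, as are ea_pack() and ea_next().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>   /* malloc() for suitably aligned request buffers */
#include <string.h>

/* Hypothetical operation codes; the proposal names them but gives no values. */
enum {
    EA_REQ_LIST = 1,
    EA_REQ_GET,
    EA_REQ_GETSIZE,
    EA_REQ_SET,
    EA_REQ_REMOVE
};

/* Assumed record layout: fixed header, then name bytes, then value bytes. */
struct ea_request {
    int operation;           /* EA_REQ_* */
    int op_flags;            /* per-operation flags */
    unsigned int name_len;   /* name bytes, no terminating NUL */
    unsigned int value_len;  /* value bytes (EA_REQ_SET only) */
    unsigned int rec_len;    /* whole record, including padding */
    char data[];             /* name followed by value */
};

#define EA_ALIGN (sizeof(long))
#define EA_PAD(n) (((n) + EA_ALIGN - 1) & ~(EA_ALIGN - 1))

/* Append one request at offset `used`; returns the new length, or 0 if
 * the record does not fit in buf_size bytes. */
size_t ea_pack(void *buf, size_t buf_size, size_t used, int op, int op_flags,
               const char *name, const void *value, unsigned int value_len)
{
    size_t name_len = strlen(name);
    size_t rec = EA_PAD(sizeof(struct ea_request) + name_len + value_len);
    struct ea_request *r;

    assert(used % EA_ALIGN == 0);
    if (used + rec > buf_size)
        return 0;
    r = (struct ea_request *)((char *)buf + used);
    memset(r, 0, rec);                        /* zero the padding too */
    r->operation = op;
    r->op_flags = op_flags;
    r->name_len = (unsigned int)name_len;
    r->value_len = value_len;
    r->rec_len = (unsigned int)rec;
    memcpy(r->data, name, name_len);
    if (value_len > 0)
        memcpy(r->data + name_len, value, value_len);
    return used + rec;
}

/* Return the record at *off and advance *off, or NULL at the end of the
 * buffer or on a malformed record. */
const struct ea_request *ea_next(const void *buf, size_t len, size_t *off)
{
    const struct ea_request *r;

    if (*off + sizeof(struct ea_request) > len)
        return NULL;
    r = (const struct ea_request *)((const char *)buf + *off);
    if (r->rec_len < sizeof(struct ea_request) || *off + r->rec_len > len)
        return NULL;
    *off += r->rec_len;
    return r;
}
```

The caller allocates the buffer with malloc() so every record header is suitably aligned, appends requests with ea_pack(), and the consumer walks them with ea_next(); storing rec_len in each record lets a consumer skip operations it does not understand.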
> The EA_REQ_LIST operation can pass attribute names as variable length
> records. With an integer namespace identifier the previous
> "name1\0name2\0name3\0\0" format isn't suitable anymore, so this format
> can be used instead:
>
> struct ea_entry {
>     int namespace;
>     unsigned short name_len;
>     char name[];  /* size padded to machine word size */
> };
That's "namespace" used twice. Can you be specific about this? It
looks as if the "namespace" in your syscall corresponds more or less
exactly with my concept of an "attribute family", and the "namespace"
in an ea_entry with my "name family". You might want to rename these
fields to be a little less ambiguous.
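As a sketch of consuming an EA_REQ_LIST result in this format, assuming the caller knows the total byte length of the list (the proposal does not say how its end is marked) and that "machine word size" means sizeof(long). The field is called namespace_id here only to keep it apart from the syscall argument; ea_entry_put(), ea_entry_count() and ea_entry_find() are hypothetical helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>   /* malloc() for word-aligned list buffers */
#include <string.h>

/* One entry of an EA_REQ_LIST result; "namespace" renamed namespace_id. */
struct ea_entry {
    int namespace_id;
    unsigned short name_len;
    char name[];              /* not NUL-terminated */
};

#define EA_WORD (sizeof(long))  /* assumed meaning of "machine word size" */
#define EA_ENTRY_SIZE(len) \
    ((offsetof(struct ea_entry, name) + (len) + EA_WORD - 1) & ~(EA_WORD - 1))

/* Append an entry at offset off; returns the new length, 0 if it won't fit. */
size_t ea_entry_put(void *buf, size_t buf_size, size_t off, int ns,
                    const char *name)
{
    size_t len = strlen(name);
    size_t rec = EA_ENTRY_SIZE(len);
    struct ea_entry *e;

    if (len > USHRT_MAX || off + rec > buf_size)
        return 0;
    e = (struct ea_entry *)((char *)buf + off);
    memset(e, 0, rec);                        /* zero the padding too */
    e->namespace_id = ns;
    e->name_len = (unsigned short)len;
    memcpy(e->name, name, len);
    return off + rec;
}

/* Entry at off (setting *next past it), or NULL if it overruns len. */
static const struct ea_entry *ea_entry_at(const void *buf, size_t len,
                                          size_t off, size_t *next)
{
    const struct ea_entry *e;
    size_t rec;

    if (off + offsetof(struct ea_entry, name) > len)
        return NULL;
    e = (const struct ea_entry *)((const char *)buf + off);
    rec = EA_ENTRY_SIZE(e->name_len);
    if (off + rec > len)
        return NULL;
    *next = off + rec;
    return e;
}

/* Number of entries in the list, or -1 if the buffer is malformed. */
int ea_entry_count(const void *buf, size_t len)
{
    size_t off = 0, next;
    int n = 0;

    while (off < len) {
        if (ea_entry_at(buf, len, off, &next) == NULL)
            return -1;
        off = next;
        n++;
    }
    return n;
}

/* Index of the entry matching (ns, name), or -1 if there is none. */
int ea_entry_find(const void *buf, size_t len, int ns, const char *name)
{
    size_t off = 0, next, nlen = strlen(name);
    const struct ea_entry *e;
    int i = 0;

    while (off < len && (e = ea_entry_at(buf, len, off, &next)) != NULL) {
        if (e->namespace_id == ns && e->name_len == nlen &&
            memcmp(e->name, name, nlen) == 0)
            return i;
        off = next;
        i++;
    }
    return -1;
}
```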
> The default semantics would be to process the requests in sequence,
> aborting at the first request that fails. The system call itself could
> return the number of requests processed successfully.
>
> EA_FLAG_ISOLATED
> EA_FLAG_ATOMIC
> EA_FLAG_SYNC
For normal named attributes, many implementations simply will not be
able to guarantee these ISOLATED or ATOMIC properties.
For ACLs, non-isolated or non-atomic requests are completely illegal.
Does it really make sense for the application to be able to specify
these flags in so much detail in that case? I guess I agree it is
better to specify these as options rather than to have to use a
different attribute family to change the requested flags.
EA_FLAG_SYNC is a good point, though.
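The default semantics quoted above (process requests in order, stop at the first failure, return the number that succeeded) can be sketched against a toy in-memory store. The store, struct ea_req, the chosen error codes and the EA_REQ_* values are all assumptions. Note that requests before the failing one stay applied, which is exactly the gap EA_FLAG_ATOMIC would have to close.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation codes and limits for a toy in-memory EA store. */
enum { EA_REQ_GET = 1, EA_REQ_SET, EA_REQ_REMOVE };

#define EA_MAX 8
#define EA_NAME_MAX 32
#define EA_VALUE_MAX 64

struct ea_store {
    int used[EA_MAX];
    char name[EA_MAX][EA_NAME_MAX];
    char value[EA_MAX][EA_VALUE_MAX];
};

/* Simplified, already-unmarshalled request. */
struct ea_req {
    int operation;
    const char *name;
    const char *value;   /* input for EA_REQ_SET */
    char *out;           /* EA_VALUE_MAX-byte output for EA_REQ_GET */
};

static int ea_find(const struct ea_store *s, const char *name)
{
    for (int i = 0; i < EA_MAX; i++)
        if (s->used[i] && strcmp(s->name[i], name) == 0)
            return i;
    return -1;
}

/* Execute one request; 0 on success, a negative errno value on failure. */
static int ea_do_one(struct ea_store *s, const struct ea_req *r)
{
    int i = ea_find(s, r->name);

    switch (r->operation) {
    case EA_REQ_GET:
        if (i < 0)
            return -ENOENT;
        assert(r->out != NULL);
        strcpy(r->out, s->value[i]);
        return 0;
    case EA_REQ_SET:
        if (r->value == NULL)
            return -EINVAL;
        if (strlen(r->name) >= EA_NAME_MAX || strlen(r->value) >= EA_VALUE_MAX)
            return -ERANGE;
        if (i < 0) {
            for (i = 0; i < EA_MAX && s->used[i]; i++)
                continue;            /* look for a free slot */
            if (i == EA_MAX)
                return -ENOSPC;
            s->used[i] = 1;
            strcpy(s->name[i], r->name);
        }
        strcpy(s->value[i], r->value);
        return 0;
    case EA_REQ_REMOVE:
        if (i < 0)
            return -ENOENT;
        s->used[i] = 0;
        return 0;
    default:
        return -EINVAL;
    }
}

/* Process requests in order, stopping at the first failure.  results[k]
 * receives the status of request k; the return value is the number of
 * requests that succeeded.  Earlier successes are not rolled back. */
int ea_process(struct ea_store *s, const struct ea_req *reqs, size_t n,
               int *results)
{
    size_t done;

    for (done = 0; done < n; done++) {
        results[done] = ea_do_one(s, &reqs[done]);
        if (results[done] != 0)
            break;
    }
    return (int)done;
}
```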
> The op_flags member of individual operations could include:
> EA_OP_FLAG_CREATE
>     The operation only succeeds if the EA doesn't exist already.
> EA_OP_FLAG_EXISTS
>     The operation only succeeds if the EA exists already.
I think things like NFSv4 don't let you do this reliably.
The advantage of having a single "ATR_USER" attribute family is that
you can't specify such options, so you get an API which really is a
lowest-common-denominator which can be used by an application without too much
worry about the type of fs underneath. I just worry that making the
API too flexible here is going to mean that applications start to
develop bad habits and rely on things which will break over NFSv4.
At least, we need the implementation limitations of all of the common
filesystems to be documented clearly in the attribute API man pages.
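One way to avoid the lowest-common-denominator trap is to refuse conditional requests outright on filesystems that cannot honor them, rather than silently ignoring EA_OP_FLAG_CREATE or EA_OP_FLAG_EXISTS. A minimal sketch; the flag values, the fs_can_check capability bit and the error codes are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for the per-operation op_flags field. */
#define EA_OP_FLAG_CREATE 0x1   /* succeed only if the EA does not exist */
#define EA_OP_FLAG_EXISTS 0x2   /* succeed only if the EA already exists */

/* Decide whether a set may proceed.  fs_can_check says whether the
 * underlying filesystem (or protocol) can test existence reliably; if it
 * cannot, a conditional request is refused rather than silently downgraded. */
int ea_check_set_flags(int op_flags, bool fs_can_check, bool exists)
{
    int cond = op_flags & (EA_OP_FLAG_CREATE | EA_OP_FLAG_EXISTS);

    if (op_flags & ~(EA_OP_FLAG_CREATE | EA_OP_FLAG_EXISTS))
        return -EINVAL;                     /* unknown flag */
    if (cond == (EA_OP_FLAG_CREATE | EA_OP_FLAG_EXISTS))
        return -EINVAL;                     /* contradictory */
    if (cond != 0 && !fs_can_check)
        return -EOPNOTSUPP;
    if ((cond & EA_OP_FLAG_CREATE) && exists)
        return -EEXIST;
    if ((cond & EA_OP_FLAG_EXISTS) && !exists)
        return -ENOENT;
    return 0;
}
```

An application that gets EOPNOTSUPP at least learns that it is relying on something the filesystem underneath cannot provide.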
> attribute, it must only be guaranteed that the cookie changes when that EA
> changes. Operation sequences: [EA_REQ_GET_COOKIE, EA_REQ_GET] (no flags
> required), and at some later point in time: [EA_REQ_VERIFY_COOKIE,
> EA_REQ_SET] (EA_FLAG_ISOLATED).
>
> I don't know if any protocols support the value comparison approach, but
> don't support the cookie approach. AFAIK NFSv4 supports neither, but a
> verify operation can be followed by a set operation in a single RPC
> request, so at least the time window for inconsistencies gets minimized.
Right.
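The cookie sequence quoted above is optimistic concurrency control. A minimal sketch, assuming the cookie is a per-EA counter bumped on every change and that a stale cookie makes the VERIFY_COOKIE/SET pair fail with EAGAIN (both assumptions; ea_get_cookie() and ea_verify_and_set() are hypothetical). A real implementation would hold the inode lock across the verify and the set to get EA_FLAG_ISOLATED behavior; here the pair simply runs inside one function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define EA_VALUE_MAX 64

/* A single EA plus its change cookie (hypothetical representation). */
struct ea_attr {
    char value[EA_VALUE_MAX];
    unsigned long cookie;     /* incremented on every change */
};

/* EA_REQ_GET_COOKIE: no flags needed, any later change alters the cookie. */
unsigned long ea_get_cookie(const struct ea_attr *a)
{
    return a->cookie;
}

/* EA_REQ_GET: copy the value out; -ERANGE if out is too small. */
int ea_get(const struct ea_attr *a, char *out, size_t out_size)
{
    size_t len = strlen(a->value);

    assert(out != NULL);
    if (len >= out_size)
        return -ERANGE;
    memcpy(out, a->value, len + 1);
    return 0;
}

static int ea_set(struct ea_attr *a, const char *value)
{
    if (strlen(value) >= EA_VALUE_MAX)
        return -ERANGE;
    strcpy(a->value, value);
    a->cookie++;
    return 0;
}

/* [EA_REQ_VERIFY_COOKIE, EA_REQ_SET] with EA_FLAG_ISOLATED.  A real
 * implementation would hold the inode lock across both steps; this toy
 * version is single-threaded, so one function call stands in for that. */
int ea_verify_and_set(struct ea_attr *a, unsigned long cookie,
                      const char *value)
{
    if (a->cookie != cookie)
        return -EAGAIN;       /* the EA changed since the cookie was read */
    return ea_set(a, value);
}
```

If two clients both read cookie c, the first ea_verify_and_set() succeeds and the second gets -EAGAIN, so it must re-read the EA and retry instead of overwriting the other update.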
For the record, just how do you see ACLs being mapped onto this API?
That was a point of contention in the past, but if you now see ACLs as
being a different syscall namespace, I think we've pretty much closed
the gaps between the proposed APIs.
Cheers,
Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]