On Mon, 30 Oct 2000, Curtis Anderson wrote:
> No, I believe that we should have a common API. Each filesystem should
> be able to implement the storage/retrieval of EAs however it wants to,
> but portability of applications demands that the semantics of the API be
> fixed across filesystems.
So I'm going to stick my head in here again, in the name of application
portability: there are several places in which common APIs may be useful,
based on the abstractions present in most UNIX-like operating systems.
Here are the ones that pop to my mind, but this is based on the FreeBSD
architecture using a UFS-based file system, so there may be different ones
in Linux:
+--------------------+
| Library Interface  |
+--------------------+
          |
+--------------------+
| Kernel ABI         |
+--------------------+
          |
+--------------------+
| Kernel Service     |
+--------------------+
          |
+--------------------+
| VFS Interface      |
+--------------------+
          |
+--------------------+   +-----------------+
| FS Interface (FFS) |   | FS Family (UFS) |
+--------------------+   +-----------------+
In FreeBSD, system calls are often exposed directly to the application via
dynamically generated ASM wrappers built into libc based on the system
call definition file in the kernel. Other times, they are wrapped by thin
library layers that perform minor semantic changes, or larger layers based
on interfaces that differ substantially. This is the case for the FreeBSD
ABI, but for other ABIs, we rely on the native library implementation for
that ABI to do most of the work.
Ideally, the EA interface across various platforms (*BSD, Linux,
IRIX/TRIX, others) visible to the application would be essentially
identical, allowing applications that might benefit from direct EA
functionality to be portable across all of them in terms of source code
portability (you recompile KDE on FreeBSD, and its file manager gets icons
from the appropriate EA in the same way it would in Linux).
Ideally, the ACL interfaces across various platforms (*) visible to the
application would also be identical -- this is much more likely given
POSIX.1e (+ minor modifications) as a starting point. Similarly, the
example holds: the same code to manage ACLs in Samba and KDE could be used
on all platforms.
I see this as more likely with the ACLs because we have an existing
standard (syntax + semantics) to work with. With EAs, we have a common
starting point (the EA interface hashed out between Andreas and myself
over many months, based on input from Casey at SGI, and inspection of
interfaces on other platforms), so I think we can probably at least come
to consensus on a minimal common API for this, even if there are added
features not supported by all platforms (the transaction semantic, while
it sounds very useful, is something you won't find on many platforms due
to atomicity requirements for compound operations). I might argue against
less portable features in the API on the basis that applications that want
to be portable must inevitably fall back to the lowest common
denominator: for example, if one platform supports transactional
combining of EAs (not test/set, which is easy to provide on all platforms,
rather the more complicated transaction semantic), but others don't, then
applications will need to be programmed to not rely on the transactional
guarantees so as to be safe on other platforms. I won't say this is
absolutely the case, but I think it's a reasonable argument for being
careful what you put in interfaces given our currently largely portable
application set.
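To make the lowest-common-denominator point concrete, here is a rough sketch in Python. Everything in it (BasicStore, TransactionalStore, save_attrs, the supports_transactions flag) is a made-up name for illustration, not an interface from any of the platforms discussed. It only shows that a portable caller ends up carrying the non-transactional path regardless.

```python
class BasicStore:
    """Models a platform that only offers single-EA set operations."""
    supports_transactions = False

    def __init__(self):
        self.attrs = {}

    def set_attr(self, name, value):
        self.attrs[name] = value


class TransactionalStore(BasicStore):
    """Models a platform that can also apply several EAs as one unit."""
    supports_transactions = True

    def set_many(self, updates):
        # In this model a single dict update stands in for an atomic commit.
        self.attrs.update(updates)


def save_attrs(store, updates):
    """Write `updates`, reporting which guarantee the caller actually got."""
    if store.supports_transactions:
        store.set_many(updates)
        return "atomic"
    # Portable fallback: other processes may observe a partial update here.
    for name, value in updates.items():
        store.set_attr(name, value)
    return "per-attribute"
```

An application that must run against both kinds of store cannot assume the "atomic" result, so its correctness has to rest on the "per-attribute" path.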
So at the application level, it's clearly desirable to have compatible
(ideally, identical) interfaces. I would argue it's also helpful to have
semantically similar ABIs, if not identical in the syntactic sense. The
reason for this is that many platforms rely on ABI "emulation" to provide
access to non-native applications. For example, both Linux and FreeBSD
attempt to provide emulation of other operating systems that run on the
same hardware platforms (System V apps, etc). Using a similar ABI means
that the ABI wrapping code doesn't have to become substantially bloated to
perform syntactic conversions, and preferably doesn't have to do semantic
conversions at all. This is an area where failure to come to consensus is
acceptable: people are used to having to perform miracles in ABI wrappers;
it's just not all that desirable. :-)
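As an illustration of a wrapper that only has to do syntactic conversion, consider the following Python sketch. The two calling conventions, the namespace identifiers, and all function names are assumptions invented for the example; they are not the FreeBSD or Linux ABIs.

```python
# Hypothetical namespace identifiers used by the "native" service layer.
NAMESPACE_IDS = {"user": 1, "system": 2}

_store = {}   # (path, namespace_id, name) -> bytes; stands in for the kernel


def native_set_attr(path, namespace_id, name, value):
    """Native-style entry point: the namespace is a separate integer argument."""
    if namespace_id not in NAMESPACE_IDS.values():
        raise ValueError(f"unknown namespace id: {namespace_id}")
    _store[(path, namespace_id, name)] = bytes(value)


def native_get_attr(path, namespace_id, name):
    return _store[(path, namespace_id, name)]


def foreign_set_attr(path, qualified_name, value):
    """Foreign-style entry point: the namespace is a 'ns.name' string prefix.

    The wrapper only re-shapes arguments; it adds no semantics of its own.
    """
    namespace, sep, name = qualified_name.partition(".")
    if not sep or namespace not in NAMESPACE_IDS:
        raise ValueError(f"bad attribute name: {qualified_name!r}")
    native_set_attr(path, NAMESPACE_IDS[namespace], name, value)
```

Because both sides agree on what a namespace and an atomic set mean, the wrapper stays a few lines long. If the semantics differed, the same wrapper would have to emulate them.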
Below this in FreeBSD is the kernel services layer: these are kernel
abstractions that may be used by code in various ABI layers, both native
FreeBSD code (which sometimes directs system calls directly to the service
layer, and other times to wrappers), and for foreign ABIs (the SysV and
Linux emulators generally back functionality onto the services layer, or
onto an identical FreeBSD system call). At this level, syntax is probably
not an issue at all in a precise sense, but maintaining common semantics
is helpful for all the reasons that apply at the other layers.
Below this is our VFS layer: like it or not, file system support is
something that is really nice to do in a portable way. There are common
file systems across many platforms: Coda and Arla have demonstrated that
this can be done in a relatively scalable manner, but you could imagine
other file systems being available across platforms using the same code
base, including XFS from SGI, etc. Having similar VFS layers makes this
possible, although I think it would be foolish to expect immediate
portability given differences in vnode (inode) handling across platforms
(the inode number uniqueness guarantee, for example, caused headaches for
cross-platform distributed file system implementation on Linux, whereas
the broken VFS locking in FreeBSD has caused similar pains for other
projects, such as the FiST work, I believe). Similar semantics at the VFS
layer, however, means that file systems on multiple platforms are more
likely to offer features consistently: FreeBSD has an ext2fs
implementation (based largely on the Linux implementation, I believe). If
EA interfaces and semantics are the same across both platforms in the
native file systems, it makes it far easier to support the native file
system of the other platform.
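A small Python sketch of that idea follows: one EA interface at the VFS level, with two file systems storing the data differently underneath. The class names and both storage layouts are invented for illustration and do not describe any real VFS or file system.

```python
from abc import ABC, abstractmethod


class EAOps(ABC):
    """Hypothetical per-file-system EA operations behind a VFS-like layer."""

    @abstractmethod
    def get_attr(self, inode, name): ...

    @abstractmethod
    def set_attr(self, inode, name, value): ...


class SingleBlockFS(EAOps):
    """Keeps all EAs of an inode together, like one block per file."""
    def __init__(self):
        self.blocks = {}            # inode -> {name: value}

    def get_attr(self, inode, name):
        return self.blocks[inode][name]

    def set_attr(self, inode, name, value):
        self.blocks.setdefault(inode, {})[name] = value


class PerAttributeFS(EAOps):
    """Keeps each attribute name in its own table, indexed by inode."""
    def __init__(self):
        self.tables = {}            # name -> {inode: value}

    def get_attr(self, inode, name):
        return self.tables[name][inode]

    def set_attr(self, inode, name, value):
        self.tables.setdefault(name, {})[inode] = value


def copy_attr(src_fs, src_inode, dst_fs, dst_inode, name):
    """Caller code needs no knowledge of how either file system stores EAs."""
    dst_fs.set_attr(dst_inode, name, src_fs.get_attr(src_inode, name))
```

With the same operations and semantics on both sides, copy_attr works across file systems without any per-file-system special cases.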
And at the FS layer, the argument holds also. I feel like I'm repeating
myself: probably because this seems to be a consistent argument throughout
the design of the system. The same arguments that recommend portable API
design across operating systems also apply across file systems in the same
OS: offering different APIs and substantially different semantics for EAs
at the FS level makes it difficult for the VFS/OS to synthesize a common
API, and if that fails, application suites have to do the synthesis
instead (leading to more errors, poor understanding of the semantics, and
lowest-common-denominator use instead of design).
Portable API design doesn't necessarily mean lowest-common-denominator
design, but it does mean starting with the same basic assumptions.
Right now, the assumptions in Andreas' and my EA implementations, and the
older Plan G EAs on TRIX (I can't claim familiarity with EA semantics on
XFS, but would love a pointer to documentation so that I could) provide the
following:
o One or more namespaces with protection properties based on the namespace
o Get, set/replace operations that are atomic for the particular
file+attribute
o List operations that are atomic for the file
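The three properties above can be modelled in a few lines of Python. This is an in-memory sketch only: the namespace names, the privilege flag, and the method names are assumptions made for the example, and a threading lock stands in for whatever per-file locking a real kernel would use.

```python
import threading


class ExtendedAttributes:
    """In-memory model of the EAs attached to one file (illustrative only)."""

    # Hypothetical namespaces; the value says whether privilege is required.
    NAMESPACES = {"user": False, "system": True}

    def __init__(self):
        self._lock = threading.Lock()   # stands in for the per-file lock
        self._attrs = {}                # (namespace, name) -> bytes

    def _check(self, namespace, privileged):
        if namespace not in self.NAMESPACES:
            raise KeyError(f"unknown namespace: {namespace}")
        if self.NAMESPACES[namespace] and not privileged:
            raise PermissionError(f"namespace {namespace!r} requires privilege")

    def get_attr(self, namespace, name, privileged=False):
        self._check(namespace, privileged)
        with self._lock:                # atomic for this file+attribute
            return self._attrs[(namespace, name)]

    def set_attr(self, namespace, name, value, privileged=False):
        self._check(namespace, privileged)
        with self._lock:                # set/replace is all-or-nothing
            self._attrs[(namespace, name)] = bytes(value)

    def list_attrs(self, namespace, privileged=False):
        self._check(namespace, privileged)
        with self._lock:                # a consistent listing for the file
            return sorted(n for (ns, n) in self._attrs if ns == namespace)
```

Protection is decided by the namespace alone, and every operation sees either the old or the new value of an attribute, never a mix.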
We've seen strong arguments that we need an atomic modify operation to
prevent races on ACLs (btw, these races exist already in permission
setting using stat()/chmod(), as I'm sure you're aware, and are only
inherited by the POSIX.1e ACL interface). This argues for at least one of
a test/set or testcookie/set operation. In reality, I'd prefer just one
because it means applications will use the same interface and there are
less likely to be portability problems. Of the two, which is NFSv4 most
likely to support? Either supports ACL modification fine.
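For comparison, here is a minimal Python sketch of both variants on a single attribute. The class and method names are hypothetical, and the cookie is assumed to be a simple counter bumped on every successful write; a real interface could define it differently.

```python
import threading


class AttrCell:
    """One EA value plus a change cookie (illustrative model only)."""

    def __init__(self, value=b""):
        self._lock = threading.Lock()
        self._value = value
        self._cookie = 0   # bumped on every successful write

    def read(self):
        with self._lock:
            return self._value, self._cookie

    def test_and_set(self, expected, new):
        """Replace the value only if it still equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            self._cookie += 1
            return True

    def cookie_and_set(self, cookie, new):
        """Replace the value only if nobody has written since `cookie`."""
        with self._lock:
            if self._cookie != cookie:
                return False
            self._value = new
            self._cookie += 1
            return True
```

A caller modifying an ACL would read, compute the new value, and retry the whole read-modify-write if the set call returns False. One difference worth noting: the value comparison accepts a value that was changed and then changed back, whereas the cookie comparison rejects it.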
We've also seen reasonable arguments for bulk EA operations, for two
reasons: (1) Reduce overhead involved in large numbers of EA operations on
a single file (such as in backup/restore activities), and (2) provide
stronger atomicity guarantees (such as ACID properties) over a set of EA
manipulations. It has been argued that (2) would provide incentive to
application designers to use different EAs for different things, rather
than concentrating a variety of EA-worthy material into a single EA.
I'm not sure how I feel about bulk EA calls: they introduce additional
complexity at various levels in the file system stack, depending on the
strength of the semantics. If the semantics don't require to-disk
atomicity for bulk EA meta-data updates, then the combining can occur at
the service level in the stack I described: applications perceive a single
call, get the advantage of few transitions, allow the kernel to optimize
certain types of overlapping or redundant calls, and get inter-process
atomicity for the file. If the semantics do require to-disk atomicity,
this complexity is pushed further down the stack, complicating the VFS
interface, and requiring file systems to provide services that are not in
the file system model, or are properties of limitations in the EA service
of the file system.
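The weaker of the two semantics, combining at the service level without to-disk atomicity, might look roughly like this Python sketch. The operation tuples and the names FileEAs and apply_batch are assumptions for illustration; the lock models inter-process atomicity for the file and says nothing about what reaches the disk.

```python
import threading


class FileEAs:
    """EAs of one file, with a batched update applied under a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._attrs = {}

    def apply_batch(self, ops):
        """ops: list of ("set", name, value) or ("delete", name) tuples."""
        with self._lock:
            staged = dict(self._attrs)       # work on a copy
            for op in ops:
                if op[0] == "set":
                    _, name, value = op
                    staged[name] = value     # a later set simply wins
                elif op[0] == "delete":
                    _, name = op
                    if name not in staged:
                        raise KeyError(name) # abort; nothing was published
                    del staged[name]
                else:
                    raise ValueError(f"unknown operation: {op[0]!r}")
            self._attrs = staged             # publish all changes at once

    def snapshot(self):
        with self._lock:
            return dict(self._attrs)
```

Other callers see either none or all of a batch, and redundant operations within a batch collapse before anything is handed to the file system. A crash midway through writing the result out is exactly the case this level cannot cover.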
For example, in Andreas's current ext2fs single-block EA implementation,
it is possible to provide transaction-like atomicity on EAs for a single
file: all changes are written out in a single block write, subject of
course to disk unpleasantness. This is similar to the ability to get
atomic meta-data updates on inode write-outs, since they're in the same
disk block. However, as soon as you want to have EAs that are larger in
sum than a single disk block (not hard to imagine), then you lose the
ability to offer transactions over bulk EA updates in ext2fs. My current
FreeBSD implementation does offer atomicity for a single EA on a file, but
not over multiple EAs, as they're stored in different disk blocks. The
same goes for the older Plan G implementation on IRIX, and on HPFS.
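The single-block limit reduces to simple arithmetic, sketched below in Python. The block size and per-entry overhead are assumed numbers chosen for the example, not the actual ext2fs EA layout.

```python
BLOCK_SIZE = 4096      # assumed disk block size in bytes
ENTRY_OVERHEAD = 16    # assumed fixed header per EA entry in bytes


def fits_in_one_block(attrs, block_size=BLOCK_SIZE):
    """Return True if all EAs of a file could be written in one block.

    `attrs` maps attribute names (str) to values (bytes). Only when this
    holds can a single block write cover every EA of the file at once.
    """
    used = sum(
        ENTRY_OVERHEAD + len(name.encode()) + len(value)
        for name, value in attrs.items()
    )
    return used <= block_size
```

Once the check fails, the EAs span several blocks and a multi-EA update needs more than one write, which is where the transaction-like guarantee is lost.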
The same *also* holds if you allow existing inode metadata retrieved using
stat() and set using chmod/chown/etc to be updated using EAs. In this
case, unless those attributes are migrated to the EA block on ext2fs,
there are no guarantees if the transaction covers those features. The
reason this is raised, btw, is that the current vop_getattr/vop_setattr on
BSD have very poor semantics, as they essentially offer atomic retrieval
and setting of all inode attributes. However, failure mode given a
failure in one element is poorly defined, especially in light of features
such as immutable flags. If you continue to store some attributes close
to the inode on inode-based file systems, but allow larger attributes to
be stored elsewhere, this transaction guarantee is hard to provide. With
journalling, this becomes easier to handle, and although journalling may
be a safe assumption in the future, there are plenty of UNIX-like
operating systems where journalling is not available (or doesn't journal
EAs) with which portability is desired. You also have the issue of legacy
systems which can easily support EAs, just not with transactional
semantics over multiple EAs.
I admit I come at this with the bias of someone working on an OS with
structural meta-data consistency guarantees (soft updates) but without
guarantees on all meta-data (certain inode elements, and EAs). :-)
Robert N M Watson FreeBSD Core Team, TrustedBSD Project
[EMAIL PROTECTED] NAI Labs, Safeport Network Services