Re: [PATCH 00/37] Permit filesystem local caching
Daniel Phillips [EMAIL PROTECTED] wrote: I need to respond to this in pieces... first the bit that is bugging me: * two new page flags I need to keep track of two bits of per-cached-page information: (1) This page is known by the cache, and that the cache must be informed if the page is going to go away. I still do not understand the life cycle of this bit. What does the cache do when it learns the page has gone away? That's up to the cache. CacheFS, for example, unpins some resources when all the pages managed by a pointer block are taken away from it. The cache may also reserve a block on disk to back this page, and that reservation may then be discarded by the netfs uncaching the page. The cache may also speculatively take copies of the page if the machine is idle. Documentation/filesystems/caching/netfs-api.txt describes the caching API as a process, including the presentation of netfs pages to the cache and their uncaching. How is it informed? [Documentation/filesystems/caching/netfs-api.txt] == PAGE UNCACHING == To uncache a page, this function should be called: void fscache_uncache_page(struct fscache_cookie *cookie, struct page *page); This function permits the cache to release any in-memory representation it might be holding for this netfs page. This function must be called once for each page on which the read or write page functions above have been called to make sure the cache's in-memory tracking information gets torn down. Note that pages can't be explicitly deleted from the data file. The whole data file must be retired (see the relinquish cookie function below). Furthermore, note that this does not cancel the asynchronous read or write operation started by the read/alloc and write functions. [/] Who owns the page cache in which such a page lives, the nfs client? Filesystem that hosts the page? A third page cache owned by the cache itself? (See my basic confusion about how many page cache levels you have, below.) 
[Documentation/filesystems/caching/fscache.txt]
(7) Data I/O is done direct to and from the netfs's pages. The netfs indicates that page A is at index B of the data-file represented by cookie C, and that it should be read or written. The cache backend may or may not start I/O on that page, but if it does, a netfs callback will be invoked to indicate completion. The I/O may be either synchronous or asynchronous.
[/]

I should perhaps make the documentation more explicit: the pages passed to the routines defined in include/linux/fscache.h are netfs pages, normally belonging to the pagecache of the appropriate netfs inode. This is, however, mentioned in the function banner comments in fscache.h.

Suppose one were to take a mundane approach to the persistent cache problem instead of layering filesystems. What you would do then is change NFS's ->write_page and variants to fiddle the persistent cache

It is a requirement laid down by the Linux NFS fs maintainers that the writes to the cache be asynchronous, even if the writes to NFS aren't. Note further that NFS's write_page() != writing to the cache. Writing to the cache is typically done by NFS's readpages(). Besides, at the moment, caching is suppressed for any NFS file opened for writing due to coherency issues. This is something to be revisited later.

as well as the network, instead of just the network as now.

Not as now. See above.

This fiddling could even consist of ->write calls to another filesystem, though working directly with the bio interface would yield the fastest, and therefore to my mind, best result.

You can't necessarily access the BIO interface, and even if you can, the cache is still a filesystem. Essentially, what cachefiles does is to do what you say: to perform ->write calls on another filesystem.
FS-Cache also protects the netfs against (a) there being no cache, (b) the cache suffering a fatal I/O error and (c) the cache being removed; and protects the cache against (d) the netfs uncaching pages that the cache is using and (e) conflicting operations from the netfs, some of which may be queued for asynchronous processing. FS-Cache also groups asynchronous netfs store requests together, which hopefully, one day, I'll be able to pass on to the backing fs. In any case, you find out how to write the page to backing store by asking the filesystem, which in the naive approach would be nfs augmented with caching library calls. NFS and AFS and CIFS and ISOFS, but yes, that's what fscache is, if you like, a caching library. The filesystem keeps its own metadata around to know how to map the page to disk. So again naively, this metadata could tell the nfs client that the page is not mapped to disk at all. The netfs should _not_ know about the metadata of a backing fs. Firstly, there are many different potential backing filesystems, and secondly if
Re: [PATCH 00/37] Permit filesystem local caching
Daniel Phillips [EMAIL PROTECTED] wrote:

This factor of four (even worse on XFS, not quite as bad on Ext3) is worth ruminating upon. Is all of the difference explained by avoiding seeks on the server, which has the files in memory?

Here are some more stats for you to consider:

(1) Copy the data across the network to a fresh Ext3 fs on the same partition I was using for the cache:

	[EMAIL PROTECTED] ~]# time cp -a /warthog/aaa /var/fscache
	real    0m39.052s
	user    0m0.368s
	sys     0m15.229s

(2) Reboot and read back the files just written into Ext3 on the local disk:

	[EMAIL PROTECTED] ~]# time tar cf - /var/fscache/aaa >/dev/zero
	real    0m40.574s
	user    0m0.164s
	sys     0m3.512s

(3) Run through the cache population process, and then run a tar directly on cachefiles's cache directly after a reboot:

	[EMAIL PROTECTED] ~]# time tar cf - /var/fscache/cache >/dev/zero
	real    4m53.104s
	user    0m0.192s
	sys     0m4.240s

So I guess there's a problem in cachefiles's efficiency - possibly due to the fact that it tries to be fully asynchronous.

In case (1) this is very similar to the time for a read through a completely cold cache (37.497s).

In case (2) this is comparable to cachefiles with a cache warmed prior to a reboot (1m54.350s); in this case, however, cachefiles is doing some extra work:

(a) It's doing a lookup on the server for each file, in addition to the lookups on the disk. However, just doing a tar from plain NFS, the command completes in 22.330s.

(b) It's reading an xattr per object for cache coherency management.

(c) As the cache knows nothing of directories, files, etc., it lays its directory subtree out in a way that suits it. File lookup keys are turned into filenames. This may result in a less efficient arrangement in the cache than the original data, especially as directories may become very large, so Ext3 may be doing some extra work.

In case (3), this perhaps suggests that cachefiles's directory layout may be part of the problem. Running the following:

	ls -ldSr `find . -type d`

in /var/fscache/cache shows that the directories are either 4096 bytes in size (158 instances) or 12288 bytes in size (105 instances), for a total of 263 directories. There are 19255 files.

Running that ls command in /warthog/aaa shows 1185 directories, all but three of them 4096 bytes in size; two are 12288 bytes and one is 20480 bytes in size (include/linux/ unsurprisingly). There are 19258 files, three of which are hardlinks to other files in the tree.

This could be easily tested by running a test against a server that is the same as the client, and does not have the files in memory. If local access is still slower than network then there is a real issue with cache efficiency.

My server is also my desktop machine. The only way to guarantee that the memory is scrubbed is to reboot it:-(

I'll look at setting up one of my other machines as an NFS server.

David
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/37] Permit filesystem local caching
Daniel Phillips [EMAIL PROTECTED] wrote:

On Monday 25 February 2008 15:19, David Howells wrote:

So I guess there's a problem in cachefiles's efficiency - possibly due to the fact that it tries to be fully asynchronous.

OK, not just my imagination, and it makes me feel better about the patch set because efficiency bugs are fixable while fundamental limitations are not.

One can hope:-)

How much of a hurry are you in to merge this feature? You have bits like this:

I'd like to get it upstream sooner rather than later. As it's not upstream, but its prerequisite patches touch a lot of code, I have to spend time regularly making my patches work again. Merge windows are completely not fun.

Add a function to install a monitor on the page lock waitqueue for a particular page, thus allowing the page being unlocked to be detected. This is used by CacheFiles to detect read completion on a page in the backing filesystem so that it can then copy the data to the waiting netfs page.

We already have that hook, it is called bio_endio.

Except that isn't accessible. CacheFiles currently has no access to the notification from the blockdev to the backing fs, if indeed there is one. All we can do is trap the backing fs page becoming available.

My strong intuition is that your whole mechanism should sit directly on the block device, no matter how attractive it seems to be able to piggyback on the namespace and layout management code of existing filesystems.

There's a place for both. Consider a laptop with a small disk, possibly subdivided between Linux and Windows. Linux then subdivides its bit further to get a swap space. What you then propose is to break off yet another chunk to provide the cache. You can't then use this other chunk for anything else, even if it's, say, 1% used by the cache.

The way CacheFiles works is that you tell it that it can use up to a certain percentage of the otherwise free disk space on an otherwise existing filesystem.
In the laptop case, you may just have a single big partition. The cache will fill up as much of it as it can, and as the other contents of the partition consume space, the cache will be culled to make room.

On the other hand, on a system like my desktop, where I can slap in extra disks with mounds of extra disk space, it might very well make sense to commit block devices to caching, as this can be used to gain performance.

I have another cache backend (CacheFS) which takes the form of a filesystem, thus allowing you to mount a blockdev as a cache. It's much faster than Ext3 at storing and retrieving files... at first. The problem is that I've mucked up the free space retrieval such that performance degrades by 20x over time for files of any size.

Basically any cache on a raw blockdev _is_ a filesystem, just one in which you're randomly allowed to discard data to make life easier.

I see your current effort as the moral equivalent of FUSE: you are able to demonstrate certain desirable behavioral properties, but you are unable to reach full theoretical efficiency because there are layers and layers of interface gunk interposed between the netfs user and the cache device.

The interface gunk is meant to be as thin as possible, but there are constraints (see the documentation in the main FS-Cache patch for more details):

(1) It's a requirement that it not be tied only to, say, AFS. We might have several netfs's that want caching: AFS, CIFS, ISOFS (okay, that last isn't really a netfs, but it might still want caching).

(2) I want to be able to change the backing cache. Under some circumstances I might want to use an existing filesystem, under others I might want to commit a blockdev. I've even been asked about using battery-backed RAM - which has different design constraints.

(3) The constraint has been imposed by the NFS team that the cache be completely asynchronous.
I haven't quite met this: readpages() will wait until the cache knows whether or not the pages are available on the principle that read operations done through the cache can be considered synchronous. This is an attempt to reduce the context switchage involved. Unfortunately, the asynchronicity requirement has caused the middle layer to bloat. Fortunately, the backing cache needn't bloat as it can use the middle layer's bloat. That said, I also see you have put a huge amount of work into this over the years, it is nicely broken out, you are responsive and easy to work with, all arguments for an early merge. Against that, you invade core kernel for reasons that are not necessarily justified: * two new page flags I need to keep track of two bits of per-cached-page information: (1) This page is known by the cache, and that the cache must be informed if the page is going to go away. (2) This page is being written to disk by the cache, and that it cannot be released until completion. Ideally
Re: [PATCH 09/37] Security: Allow kernel services to override LSM settings for task actions
Casey Schaufler [EMAIL PROTECTED] wrote:

+static int smack_task_kernel_act_as(struct task_struct *p,
+				    struct task_security *sec, u32 secid)
+{
+	return -ENOTSUPP;
+}
...
+static int smack_task_create_files_as(struct task_struct *p,
+				      struct task_security *sec,
+				      struct inode *inode)
+{
+	return -ENOTSUPP;
+}

Hum. ENOTSUPP is not very satisfying, is it? I will have to think on this a bit.

Sorry, I meant to ping you on this directly. I'm not sure how to effect these two functions for Smack.

Except for the fact that the hooks don't do anything this looks fine. I'm not sure that I would want these hooks to do anything, it requires additional thought to determine if there is a good behavior for them.

Note that you won't be able to use CacheFiles with Smack if either of these just returns an error. This may also affect NFSd in the future too.

smack_task_create_files_as() is passed the label that new files created by CacheFiles should be created with.

For smack_task_kernel_act_as(), it may be sufficient to set CAP_MAC_OVERRIDE in the task_security struct and leave it at that. It also may not be sufficient, as NFSd may end up using this to set the subjective security label supplied by the NFS client. I don't know, though, whether Smack is going to be involved in passing labels over NFS.

David
Re: [PATCH 00/37] Permit filesystem local caching
Daniel Phillips [EMAIL PROTECTED] wrote:

The way the client works is like this:

Thanks for the excellent ascii art, that cleared up the confusion right away.

You know what they say about pictures... :-)

What are you trying to do exactly? Are you actually playing with it, or just looking at the numbers I've produced?

Trying to see if you are offering enough of a win to justify testing it, and if that works out, then going shopping for a bin of rotten vegetables to throw at your design, which I hope you will perceive as useful.

One thing that you have to remember: my test setup is pretty much the worst case for showing that caching is needed to improve performance. There's a single client and a single server, they've got GigE networking between them that has very little other load, and the server has sufficient memory to hold the entire test data set.

From the numbers you have posted I think you are missing some basic efficiencies that could take this design from the sorta-ok zone to wow!

Not really, it's just that this lashup could be considered designed to show local caching in the worst light.

But looking up the object in the cache should be nearly free - much less than a microsecond per block.

The problem is that you have to do a database lookup of some sort, possibly involving several synchronous disk operations.

CacheFiles does a disk lookup by taking the key given to it by NFS, turning it into a set of file or directory names, and doing a short pathwalk to the target cache file. Throwing in extra indices won't necessarily help. What matters is how quick the backing filesystem is at doing lookups. As it turns out, Ext3 is a fair bit better than BTRFS when the disk cache is cold.

The metadata problem is quite a tricky one since it increases with the number of files you're dealing with.
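To make the key-to-pathname step above a little more concrete, here is a minimal sketch of how a netfs-supplied key might be turned into a short path in the cache. This is not the CacheFiles on-disk layout: the function name `key_to_path`, the hex encoding, the `@xx` bucket naming and the bucket count are all assumptions made up for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def key_to_path(cache_root: str, index_key: bytes, object_key: bytes,
                buckets: int = 256) -> PurePosixPath:
    """Map netfs keys to a path under cache_root (hypothetical layout)."""
    # Hex-encode the binary keys so they are always usable as filenames.
    # (A real cache would also have to cope with keys too long for one name.)
    index_name = index_key.hex()
    object_name = object_key.hex()
    # Pick a hash bucket directory so that no single directory has to hold
    # every object belonging to the index.
    digest = hashlib.sha1(object_key).digest()
    bucket = "@%02x" % (digest[0] % buckets)
    # The resulting pathwalk is short: index dir, bucket dir, object file.
    return PurePosixPath(cache_root) / index_name / bucket / object_name


path = key_to_path("/var/fscache/cache", b"nfs", b"\x01\x02")
print(path)
```

Each component of such a path costs a lookup on the backing filesystem, which is why the speed of the backing filesystem's lookups dominates when the disk cache is cold.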
As things stand in my patches, when NFS, for example, wants to access a new inode, it first has to go to the server to lookup the NFS file handle, and only then can it go to the cache to find out if there's a matching object in the cache.

So without the persistent cache it can omit the LOOKUP and just send the filehandle as part of the READ?

What 'it'? Note that to get the filehandle, you have to do a LOOKUP op. With the cache, we could actually cache the results of lookups that we've done, however, we don't know that the results are still valid without going to the server:-/

AFS has a way around that - it versions its vnode (inode) IDs.

The reason my client going to my server is so quick is that the server has the dcache and the pagecache preloaded, so that across-network lookup operations are really, really quick, as compared to the synchronous slogging of the local disk to find the cache object.

Doesn't that just mean you have to preload the lookup table for the persistent cache so you can determine whether you are caching the data for a filehandle without going to disk?

Where lookup table == dcache. That would be good yes. cachefilesd prescans all the files in the cache, which ought to do just that, but it doesn't seem to be very effective. I'm not sure why.

I can probably improve this a little by pre-loading the subindex directories (hash tables) that I use to reduce the directory size in the cache, but I don't know by how much.

Ah I should have read ahead. I think the correct answer is a lot.

Quite possibly. It'll allow me to dispense with at least one fs lookup call per cache object request call.

Your big can-t-get-there-from-here is the round trip to the server to determine whether you should read from the local cache. Got any ideas?

I'm not sure what you mean. Your statement should probably read ... to determine _what_ you should read from the local cache.

And where is the Trond-meister in all of this?

Keeping quiet as far as I can tell.
David
Re: [PATCH 00/37] Permit filesystem local caching
Chris Mason [EMAIL PROTECTED] wrote:

The interesting case is where the disk cache is warm, but the pagecache is cold (ie: just after a reboot after filling the caches). Here, for the two big files case, BTRFS appears quite a bit better than Ext3, showing a 21% reduction in time for the smaller case and a 13% reduction for the larger case.

I'm afraid I don't have a good handle on the filesystem operations that result from this workload. Are we reading from the FS to fill the NFS page cache?

I'm not sure what you're asking.

When the cache is cold, we determine very quickly that we can't read from the cache. We then read data from the server and, in the background, create the metadata in the cache and store the data to it (by copying netfs pages to backingfs pages).

When the cache is warm, we read the data from the cache, copying the data from the backingfs pages to the netfs pages. We use bmap() to ascertain that there is data to be read, otherwise we detect a hole and fall back to reading from the server.

Looking up a cache object involves a sequence of lookup() ops and getxattr() ops on the backingfs. Should an object not exist, we defer creation of that object to a background thread and do lookups(), mkdirs() and setxattrs() and a create() to manufacture the object.

We read data from an object by calling readpages() on the backingfs to bring the data into the pagecache. We monitor the PG_lock bits to find out when each page is read or has completed with an error.

Writing pages to the cache is done completely in the background. PG_fscache_write is set on a page when it is handed to fscache for storage, then at some point a background thread wakes up and calls write_one_page() in the backingfs to write that page to the cache file. At the moment, this copies the data into a backingfs page which is then marked PG_dirty, and the VM writes it out in the usual way.
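The cold and warm read paths described above can be modelled in a few lines. This is only a toy, userspace sketch of the control flow, not kernel code: the classes `CacheObject` and `Netfs`, the `has_data()` check standing in for bmap(), and the explicit `run_background_stores()` call standing in for the background thread are all invented for illustration.

```python
from typing import Dict, List, Tuple


class CacheObject:
    """Hypothetical stand-in for one cache file; missing pages are holes."""

    def __init__(self) -> None:
        self.pages: Dict[int, bytes] = {}

    def has_data(self, index: int) -> bool:
        # Plays the role of the bmap() check: is there data behind this page?
        return index in self.pages


class Netfs:
    """Toy netfs that reads through an optional cache."""

    def __init__(self, server: Dict[Tuple[str, int], bytes]) -> None:
        self.server = server
        self.cache: Dict[str, CacheObject] = {}
        self.pending_stores: List[Tuple[str, int, bytes]] = []
        self.server_reads = 0

    def read_page(self, key: str, index: int) -> bytes:
        obj = self.cache.get(key)
        if obj is not None and obj.has_data(index):
            # Warm cache: copy the data from the backing page.
            return obj.pages[index]
        # Cold cache or hole: read from the server and queue a store.
        self.server_reads += 1
        data = self.server[(key, index)]
        self.pending_stores.append((key, index, data))
        return data

    def run_background_stores(self) -> None:
        # Stands in for the background thread that creates the cache object
        # if necessary and copies the netfs pages into it.
        while self.pending_stores:
            key, index, data = self.pending_stores.pop(0)
            self.cache.setdefault(key, CacheObject()).pages[index] = data


netfs = Netfs({("file", 0): b"page0", ("file", 1): b"page1"})
netfs.read_page("file", 0)        # cold: goes to the server
netfs.run_background_stores()     # the page is now in the cache
netfs.read_page("file", 0)        # warm: served from the cache
print(netfs.server_reads)
```

The point of the model is that the store never sits on the read path: the reader gets its data from the server straight away, and the cache is only populated afterwards.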
More surprising is that BTRFS performed significantly worse (15% increase in time) in the case where the cache on disk was fully populated and then the machine had been rebooted to clear the pagecaches.

Which FS operations are included here? Finding all the files or just an unmount? Btrfs defrags metadata in the background, and unmount has to wait for that defrag to finish.

BTRFS might not be doing any writing at all here - apart from local atimes (used by cache culling), that is.

What it does have to do is lots of lookups, reads and getxattrs, all of which are synchronous.

David
Re: [PATCH 00/37] Permit filesystem local caching
David Howells [EMAIL PROTECTED] wrote:

Have you got before/after benchmark results?

See attached.

Attached here are results using BTRFS (patched so that it'll work at all) rather than Ext3 on the client on the partition backing the cache.

And here are XFS results.

Tuning XFS makes a *really* big difference for the lots of small/medium files being tarred case. However, in general BTRFS is much better.

David
---

= FEW BIG FILES TEST ON XFS =

Completely cold caches:

	[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
	real    0m2.286s
	user    0m0.000s
	sys     0m1.828s
	[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
	real    0m4.228s
	user    0m0.000s
	sys     0m1.360s

Warm NFS pagecache:

	[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
	real    0m0.058s
	user    0m0.000s
	sys     0m0.060s
	[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
	real    0m0.122s
	user    0m0.000s
	sys     0m0.120s

Warm XFS pagecache, cold NFS pagecache:

	[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
	real    0m0.181s
	user    0m0.000s
	sys     0m0.180s
	[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
	real    0m1.034s
	user    0m0.000s
	sys     0m0.404s

Warm on-disk cache, cold pagecaches:

	[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
	real    0m1.540s
	user    0m0.000s
	sys     0m0.256s
	[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
	real    0m3.003s
	user    0m0.000s
	sys     0m0.532s

== MANY SMALL/MEDIUM FILE READING TEST ON XFS ==

Completely cold caches:

	[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
	real    4m56.827s
	user    0m0.180s
	sys     0m6.668s

Warm NFS pagecache:

	[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
	real    0m15.084s
	user    0m0.212s
	sys     0m5.008s

Warm XFS pagecache, cold NFS pagecache:

	[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
	real    0m13.547s
	user    0m0.220s
	sys     0m5.652s

Warm on-disk cache, cold pagecaches:

	[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
	real    4m36.316s
	user    0m0.148s
	sys     0m4.440s

=== MANY SMALL/MEDIUM FILE READING TEST ON AN OPTIMISED XFS ===

	mkfs.xfs -d agcount=4 -l size=128m,version=2 /dev/sda6

Completely cold caches:

	[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
	real    3m44.033s
	user    0m0.248s
	sys     0m6.632s

Warm on-disk cache, cold pagecaches:

	[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
	real    3m8.582s
	user    0m0.108s
	sys     0m3.420s
Re: [PATCH 00/37] Permit filesystem local caching
Chris Mason [EMAIL PROTECTED] wrote:

Thanks for trying this, of course I'll ask you to try again with the latest v0.13 code, it has a number of optimizations especially for CPU usage.

Here you go. The numbers are very similar.

David

= FEW BIG FILES TEST ON BTRFS v0.13 =

Completely cold caches:

	[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
	real    0m2.202s
	user    0m0.000s
	sys     0m1.716s
	[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
	real    0m4.212s
	user    0m0.000s
	sys     0m0.896s

Warm BTRFS pagecache, cold NFS pagecache:

	[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
	real    0m0.197s
	user    0m0.000s
	sys     0m0.192s
	[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
	real    0m0.376s
	user    0m0.000s
	sys     0m0.372s

Warm on-disk cache, cold pagecaches:

	[EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
	real    0m1.543s
	user    0m0.004s
	sys     0m1.448s
	[EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
	real    0m3.111s
	user    0m0.000s
	sys     0m2.856s

== MANY SMALL/MEDIUM FILE READING TEST ON BTRFS v0.13 ==

Completely cold caches:

	[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
	real    0m31.575s
	user    0m0.176s
	sys     0m6.316s

Warm BTRFS pagecache, cold NFS pagecache:

	[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
	real    0m16.081s
	user    0m0.164s
	sys     0m5.528s

Warm on-disk cache, cold pagecaches:

	[EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
	real    2m15.245s
	user    0m0.064s
	sys     0m2.808s
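For anyone wanting to compare the "warm on-disk cache, cold pagecaches" figures between backing filesystems, the arithmetic can be sketched as follows. The BTRFS v0.13 times are the ones in this message and the Ext3 times are the ones posted elsewhere in the thread; the helper names `parse_real` and `percent_change` are made up for this example. Percentages quoted elsewhere in the thread may have been worked out from an earlier BTRFS run, so small differences are to be expected.

```python
import re


def parse_real(text: str) -> float:
    """Convert a `time` duration such as '1m54.350s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_change(baseline: str, other: str) -> float:
    """Change of `other` relative to `baseline`, in percent (negative = faster)."""
    base = parse_real(baseline)
    return (parse_real(other) - base) / base * 100.0


# Wall-clock ("real") times, warm on-disk cache with cold pagecaches.
ext3 = {"bigfile": "0m1.955s", "biggerfile": "0m3.596s", "tar": "1m54.350s"}
btrfs = {"bigfile": "0m1.543s", "biggerfile": "0m3.111s", "tar": "2m15.245s"}

for name in ext3:
    print(f"{name}: {percent_change(ext3[name], btrfs[name]):+.1f}%")
```

With these inputs the two big-file reads come out roughly 21% and 13% faster on BTRFS, while the tar of many small files comes out roughly 18% slower.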
Re: [PATCH 00/37] Permit filesystem local caching
Daniel Phillips [EMAIL PROTECTED] wrote:

I am eventually going to suggest cutting the backing filesystem entirely out of the picture,

You still need a database to manage the cache. A filesystem such as Ext3 makes a very handy database for four reasons:

(1) It exists and works.

(2) It has a well defined interface within the kernel.

(3) I can place my cache on, say, my root partition on my laptop. I don't have to dedicate a partition to the cache.

(4) Userspace cache management tools (such as cachefilesd) have an already existing interface to use: rmdir, unlink, open, getdents, etc..

I do have a cache-on-blockdev thing, but it's basically a wandering tree filesystem inside. It is, or was, much faster than ext3 on a clean cache, but it degrades horribly over time because my free space reclamation sucks - it gradually randomises the block allocation sequence over time.

So, what would you suggest instead of a backing filesystem?

I really do not like the idea of force fitting this cache into a generic vfs model. Sun was collectively smoking some serious crack when they cooked that one up. But there is also the ageless principle isness is more important than niceness.

What do you mean? I'm not doing it like Sun. The cache is a side path from the netfs. It should be transparent to the user, the VFS and the server.

The only place it might not be transparent is that you might have to instruct the netfs mount to use the cache. I'd prefer to do it some other way than passing parameters to mount, though, as (1) this causes fun with NIS distributed automounter maps, and (2) people are asking for a finer grain of control than per-mountpoint. Unfortunately, I can't seem to find a way to do it that's acceptable to Al.

Which would require a change to NFS, not an option because you hope to work with standard servers? Of course with years to think about this, the required protocol changes were put into v4. Not.

I don't think there's much I can do about NFS.
It requires the filesystem from which the NFS server is dealing to have inode uniquifiers, which are then incorporated into the file handle. I don't think the NFS protocol itself needs to change to support this.

Have you completely exhausted optimization ideas for the file handle lookup?

No, but there aren't many. CacheFiles doesn't actually do very much, and it's hard to reduce that not very much. The most obvious thing is to prepopulate the dcache, but that's at the expense of memory usage.

Actually, if I cache the name => FH mapping I used last time, I can make a start on looking up in the cache whilst simultaneously accessing the server. If what's on the server has changed, I can ditch the speculative cache lookup I was making and start a new cache lookup.

However, storing directory entries has penalties of its own, though it'll be necessary if we want to do disconnected operation.

Where lookup table == dcache. That would be good yes. cachefilesd prescans all the files in the cache, which ought to do just that, but it doesn't seem to be very effective. I'm not sure why.

RCU? Anyway, it is something to be tracked down and put right.

cachefilesd runs in userspace. It's possible it isn't doing enough to preload all the metadata.

What I tried to say. So still... got any ideas? That extra synchronous network round trip is a killer. Can it be made streaming/async to keep throughput healthy?

That's a per-netfs thing. With the test rig I've got, it's going to the on-disk cache that's the killer. Going over the network is much faster.

See the results I posted. For the tarball load, and using Ext3 to back the cache:

	Cold NFS cache, no disk cache:          0m22.734s
	Warm on-disk cache, cold pagecaches:    1m54.350s

The problem is reading using tar is a worst case workload for this. Everything it does is pretty much completely synchronous.
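The speculative lookup idea above (start the cache lookup with the file handle remembered from last time while the server LOOKUP is in flight, and throw the result away if the server disagrees) might be sketched like this. It is a userspace illustration only: `open_with_speculation`, `server_lookup`, `cache_lookup` and the `remembered` dictionary are hypothetical names, and a thread pool stands in for whatever asynchronous mechanism a real netfs would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def open_with_speculation(name: str,
                          remembered: Dict[str, str],
                          server_lookup: Callable[[str], str],
                          cache_lookup: Callable[[str], str]) -> Tuple[str, str]:
    """Return (file handle, cache object), overlapping cache and server lookups."""
    guess = remembered.get(name)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the cache lookup on the remembered handle, if there is one.
        speculative = pool.submit(cache_lookup, guess) if guess is not None else None
        # The server remains authoritative; this runs while the cache lookup does.
        actual = server_lookup(name)
        if speculative is not None and guess == actual:
            cache_object = speculative.result()
        else:
            # The handle changed (or nothing was remembered): ignore the
            # speculative result and look up the right object instead.
            cache_object = cache_lookup(actual)
    remembered[name] = actual
    return actual, cache_object


remembered = {"Makefile": "fh-1"}
fh, obj = open_with_speculation("Makefile", remembered,
                                server_lookup=lambda n: "fh-1",
                                cache_lookup=lambda h: "object-for-" + h)
print(fh, obj)
```

In this sketch a wrong guess costs one wasted cache lookup, and the round trip to the server is still needed to validate the remembered handle; the gain is only that the disk lookup no longer has to wait for it.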
One thing that might help is if things like tar and find can be made to use fadvise() on directories to hint to the filesystem (NFS, AFS, whatever) that it's going to access every file in those directories. Certainly AFS could make use of that: the directory is read as a file, and the netfs then parses the file to get a list of vnode IDs that that directory points to. It could then do bulk status fetch operations to instantiate the inodes 50 at a time. I don't know whether NFS could use it. Someone like Trond or SteveD or Chuck would have to answer that. David
Re: [PATCH 00/37] Permit filesystem local caching
Daniel Phillips [EMAIL PROTECTED] wrote: These patches add local caching for network filesystems such as NFS. Have you got before/after benchmark results? I need to get a new hard drive for my test machine before I can go and get some more up to date benchmark results. It does seem, however, that the I/O error handling capabilities of FS-Cache work properly:-) David
Re: [PATCH 00/37] Permit filesystem local caching
Daniel Phillips [EMAIL PROTECTED] wrote:

Have you got before/after benchmark results?

See attached.

These show a couple of things:

(1) Dealing with lots of metadata slows things down a lot. Note the result of looking up and reading lots of small files with tar (the last result). The NFS client has to consult both the NFS server *and* the cache. Not only that, but any asynchronicity the cache may like to do is rendered ineffective by the fact tar wants to do a read on a file pretty much directly after opening it.

(2) Getting metadata from the local disk fs is slower than pulling it across an unshared gigabit ethernet from a server that already has it in memory.

These points don't mean that fscache is no use, just that you have to consider carefully whether it's of use to *you* given your particular situation, and that depends on various factors.

Note that currently FS-Caching is disabled for individual NFS files opened for writing as there's no way to handle the coherency problems thereby introduced.

David
---

=== FS-CACHE FOR NFS BENCHMARKS ===

(*) The NFS client has a 1.86GHz Core2 Duo CPU and 1GB of RAM.

(*) The NFS client has a Seagate ST380211AS 80GB 7200rpm SATA disk on an interface running in AHCI mode. The chipset is an Intel G965.

(*) A partition of approx 4.5GB is committed to caching, and is formatted as Ext3 with a blocksize of 4096 and directory indices.

(*) The NFS client is using SELinux.

(*) The NFS server is running an in-kernel NFSd, and has a 2.66GHz Core2 Duo CPU and 6GB of RAM. The chipset is an Intel P965.

(*) The NFS client is connected to the NFS server by Gigabit Ethernet.
 (*) The NFS mount is made with defaults for all options not relating to the cache:

    warthog:/warthog /warthog nfs rw,vers=3,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,retrans=2,sec=sys,fsc,addr=w.x.y.z 0 0

=== FEW BIG FILES TEST ===

Where:

 (*) The NFS server has two files:

    [EMAIL PROTECTED] ~]# ls -l /warthog/bigfile
    -rw-rw-r-- 1 4043 4043 104857600 2006-11-30 09:39 /warthog/bigfile
    [EMAIL PROTECTED] ~]# ls -l /warthog/biggerfile
    -rw-rw-r-- 1 4043 4041 209715200 2006-03-21 13:56 /warthog/biggerfile

     Both of which are in memory on the server in all cases.

No patches, cold NFS cache:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m1.909s
    user    0m0.000s
    sys     0m0.520s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m3.750s
    user    0m0.000s
    sys     0m0.904s

CONFIG_FSCACHE=n, cold NFS cache:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m2.003s
    user    0m0.000s
    sys     0m0.124s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m4.100s
    user    0m0.004s
    sys     0m0.488s

Cold NFS cache, no disk cache:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m2.084s
    user    0m0.000s
    sys     0m0.136s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m4.020s
    user    0m0.000s
    sys     0m0.720s

Completely cold caches:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m2.412s
    user    0m0.000s
    sys     0m0.892s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m4.449s
    user    0m0.000s
    sys     0m2.300s

Warm NFS pagecache:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m0.067s
    user    0m0.000s
    sys     0m0.064s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m0.133s
    user    0m0.000s
    sys     0m0.136s

Warm Ext3 pagecache, cold NFS pagecache:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m0.173s
    user    0m0.000s
    sys     0m0.172s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m0.316s
    user    0m0.000s
    sys     0m0.316s

Warm on-disk cache, cold pagecaches:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m1.955s
    user    0m0.000s
    sys     0m0.244s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m3.596s
    user    0m0.000s
    sys     0m0.460s

=== MANY SMALL/MEDIUM FILE READING TEST ===

Where:

 (*) The NFS server has an old kernel tree:

    [EMAIL PROTECTED] ~]# du -s /warthog/aaa
    347340  /warthog/aaa
    [EMAIL
Re: [PATCH 00/37] Permit filesystem local caching
David Howells [EMAIL PROTECTED] wrote:

> > Have you got before/after benchmark results?
>
> See attached.

Attached here are results using BTRFS (patched so that it'll work at all) rather than Ext3 on the client on the partition backing the cache.

Note that I didn't bother redoing the tests that didn't involve a cache as the choice of filesystem backing the cache should have no bearing on the result.

Generally, completely cold caches shouldn't show much variation as all the writing can be done completely asynchronously, provided the client doesn't fill its RAM.

The interesting case is where the disk cache is warm, but the pagecache is cold (ie: just after a reboot after filling the caches). Here, for the two big files case, BTRFS appears quite a bit better than Ext3, showing a 21% reduction in time for the smaller case and a 13% reduction for the larger case.

For the many small/medium files case, BTRFS performed significantly better (15% reduction in time) in the case where the caches were completely cold. I'm not sure why, though - perhaps because it doesn't execute a write_begin() stage during the write_one_page() call and thus doesn't go allocating disk blocks to back the data, but instead allocates them later.

More surprising is that BTRFS performed significantly worse (15% increase in time) in the case where the cache on disk was fully populated and then the machine had been rebooted to clear the pagecaches.

It's important to note that I've only run each test once apiece, so the numbers should be taken with a modicum of salt (bad statistics and all that).
David
---

=== FEW BIG FILES TEST ON BTRFS ===

Completely cold caches:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m2.124s
    user    0m0.000s
    sys     0m1.260s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m4.538s
    user    0m0.000s
    sys     0m2.624s

Warm NFS pagecache:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m0.061s
    user    0m0.000s
    sys     0m0.064s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m0.118s
    user    0m0.000s
    sys     0m0.116s

Warm BTRFS pagecache, cold NFS pagecache:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m0.189s
    user    0m0.000s
    sys     0m0.188s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m0.369s
    user    0m0.000s
    sys     0m0.368s

Warm on-disk cache, cold pagecaches:

    [EMAIL PROTECTED] ~]# time cat /warthog/bigfile >/dev/null
    real    0m1.540s
    user    0m0.000s
    sys     0m1.440s
    [EMAIL PROTECTED] ~]# time cat /warthog/biggerfile >/dev/null
    real    0m3.132s
    user    0m0.000s
    sys     0m1.724s

=== MANY SMALL/MEDIUM FILE READING TEST ON BTRFS ===

Completely cold caches:

    [EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
    real    0m31.838s
    user    0m0.192s
    sys     0m6.076s

Warm NFS pagecache:

    [EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
    real    0m14.841s
    user    0m0.148s
    sys     0m4.988s

Warm BTRFS pagecache, cold NFS pagecache:

    [EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
    real    0m16.773s
    user    0m0.148s
    sys     0m5.512s

Warm on-disk cache, cold pagecaches:

    [EMAIL PROTECTED] ~]# time tar cf - /warthog/aaa >/dev/zero
    real    2m12.527s
    user    0m0.080s
    sys     0m2.908s
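For anyone wanting to check the percentages quoted for the two big files, the arithmetic can be sketched in a few lines of C. This is only an illustration: pct_reduction() is a made-up helper, and the inputs are the "warm on-disk cache, cold pagecaches" wall-clock times from the Ext3 run (1.955s, 3.596s) and the BTRFS run (1.540s, 3.132s), each measured once.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of any patch): percentage reduction in
 * elapsed time when going from `before` seconds to `after` seconds.
 */
double pct_reduction(double before, double after)
{
	assert(before > 0.0);	/* a zero baseline has no meaningful percentage */
	return (before - after) / before * 100.0;
}
```

Feeding in (1.955, 1.540) gives a little over 21%, and (3.596, 3.132) gives a little under 13%, which agrees with the rounded figures in the message.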
Re: [PATCH 00/37] Permit filesystem local caching
Daniel Phillips [EMAIL PROTECTED] wrote:

> When you say Ext3 cache vs NFS cache is the first on the server and the second on the client?

The filesystem on the server is pretty much irrelevant as long as (a) it doesn't change, and (b) all the data is in memory on the server anyway.

The way the client works is like this:

    +---------+
    |         |
    |   NFS   |--+
    |         |  |
    +---------+  |   +----------+
                 |   |          |
    +---------+  +-->|          |
    |         |      |          |
    |   AFS   |----->| FS-Cache |
    |         |      |          |--+
    +---------+  +-->|          |  |
                 |   |          |  |   +--------------+   +--------------+
    +---------+  |   +----------+  |   |              |   |              |
    |         |  |                 +-->|  CacheFiles  |-->|  Ext3        |
    |  ISOFS  |--+                     |  /var/cache  |   |  /dev/sda6   |
    |         |                        +--------------+   +--------------+
    +---------+

 (1) NFS, say, asks FS-Cache to store/retrieve data for it;

 (2) FS-Cache asks the cache backend, in this case CacheFiles, to honour the operation;

 (3) CacheFiles 'opens' a file in a mounted filesystem, say Ext3, and does read and write operations of a sort on it;

 (4) Ext3 decides how the cache data is laid out on disk - CacheFiles just attempts to use one sparse file per netfs inode.

> I am trying to spot the numbers that show the sweet spot for this optimization, without much success so far.

What are you trying to do exactly? Are you actually playing with it, or just looking at the numbers I've produced?

> Who is supposed to win big? Is this mainly about reducing the load on the server, or is the client supposed to win even with a lightly loaded server?

These are difficult questions to answer. The obvious answer to both is "it depends", and the real answer to both is "it's a compromise".

Inserting a cache adds overhead: you have to look in the cache to see if your objects are mirrored there, and then you have to look in the cache to see if the data you want is stored there; and then you might have to go to the server anyway and then schedule a copy to be stored in the cache.

The characteristics of this type of cache depend on a number of things: the filesystem backing it being the most obvious variable, but also how fragmented it is and the properties of the disk drive or drives it is on.
Whether it's worth having a cache depends on the characteristics of the network versus the characteristics of the cache. Latency of the cache vs latency of the network, for example. Network loading is another: having a cache on each of several clients sharing a server can reduce network traffic by avoiding the read requests to the server. NFS has a characteristic that it keeps spamming the server with file status requests, so even if you take the read requests out of the load, an NFS client still generates quite a lot of network traffic to the server - but the reduction is still useful.

The metadata problem is quite a tricky one since it increases with the number of files you're dealing with. As things stand in my patches, when NFS, for example, wants to access a new inode, it first has to go to the server to look up the NFS file handle, and only then can it go to the cache to find out if there's a matching object in the cache. Worse, the cache must then perform several synchronous disk-bound metadata operations before it is possible to read from the cache. Worse still, this means that a read on the network file cannot proceed until (a) we've been to the server *plus* (b) we've been to the disk.

The reason my client going to my server is so quick is that the server has the dcache and the pagecache preloaded, so that across-network lookup operations are really, really quick, as compared to the synchronous slogging of the local disk to find the cache object.

I can probably improve this a little by pre-loading the subindex directories (hash tables) that I use to reduce the directory size in the cache, but I don't know by how much.

Anyway, to answer your questions:

 (1) It may help with heavily loaded networks with lots of read-only traffic.

 (2) It may help with slow connections (like doing NFS between the UK and Australia).

 (3) It could be used to do offline/disconnected operation.
David
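The overhead described in this reply (check whether the object is mirrored, check whether the data is there, otherwise go to the server and store a copy) can be pictured with a small userspace toy. Everything here is invented for illustration: toy_cache_object, toy_read_page() and the fixed page count are not FS-Cache structures or calls, and the store happens immediately rather than being scheduled asynchronously as the real cache may do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 8

/* Hypothetical stand-in for one cached netfs file; not a kernel structure. */
struct toy_cache_object {
	bool present;			/* is the object mirrored in the cache? */
	bool page_valid[TOY_PAGES];	/* which pages currently hold data */
};

enum toy_source { TOY_FROM_CACHE, TOY_FROM_SERVER };

/* Decide where a read of page `idx` would be satisfied from. */
enum toy_source toy_read_page(struct toy_cache_object *obj, size_t idx)
{
	assert(obj != NULL && idx < TOY_PAGES);

	/* Steps 1 and 2: is the object mirrored, and is this page stored? */
	if (obj->present && obj->page_valid[idx])
		return TOY_FROM_CACHE;

	/*
	 * Miss: the data has to come from the server anyway, and a copy is
	 * then stored so that a later read of the same page can hit.
	 */
	obj->present = true;
	obj->page_valid[idx] = true;
	return TOY_FROM_SERVER;
}
```

The first read of any page pays for both the cache lookup and the server round trip; only repeat reads benefit, which is the compromise being described.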
[PATCH 00/37] Permit filesystem local caching
These patches add local caching for network filesystems such as NFS.

The patches can roughly be broken down into a number of sets:

 (*) 01-keys-inc-payload.diff
 (*) 02-keys-search-keyring.diff
 (*) 03-keys-callout-blob.diff

     Three patches to the keyring code made to help the CIFS people. Included because of patches 05-08.

 (*) 04-keys-get-label.diff

     A patch to allow the security label of a key to be retrieved. Included because of patches 05-08.

 (*) 05-security-current-fsugid.diff
 (*) 06-security-separate-task-bits.diff
 (*) 07-security-subjective.diff
 (*) 08-security-kernel_service-class.diff
 (*) 09-security-kernel-service.diff
 (*) 10-security-nfsd.diff

     Patches to permit the subjective security of a task to be overridden. All the security details in task_struct are decanted into a new struct that task_struct then has two pointers to: one that defines the objective security of that task (how other tasks may affect it) and one that defines the subjective security (how it may affect other objects).

     Note that I have dropped the idea of struct cred for the moment. With the amount of stuff that was excluded from it, it wasn't actually any use to me. However, it can be added later.

     Required for cachefiles.

 (*) 11-release-page.diff
 (*) 12-fscache-page-flags.diff
 (*) 13-add_wait_queue_tail.diff
 (*) 14-fscache.diff

     Patches to provide a local caching facility for network filesystems.

 (*) 15-cachefiles-ia64.diff
 (*) 16-cachefiles-ext3-f_mapping.diff
 (*) 17-cachefiles-write.diff
 (*) 18-cachefiles-monitor.diff
 (*) 19-cachefiles-export.diff
 (*) 20-cachefiles.diff

     Patches to provide a local cache in a directory of an already mounted filesystem.

 (*) 21-nfs-comment.diff
 (*) 22-nfs-fscache-option.diff
 (*) 23-nfs-fscache-kconfig.diff
 (*) 24-nfs-fscache-top-index.diff
 (*) 25-nfs-fscache-server-obj.diff
 (*) 26-nfs-fscache-super-obj.diff
 (*) 27-nfs-fscache-inode-obj.diff
 (*) 28-nfs-fscache-use-inode.diff
 (*) 29-nfs-fscache-invalidate-pages.diff
 (*) 30-nfs-fscache-iostats.diff
 (*) 31-nfs-fscache-page-management.diff
 (*) 32-nfs-fscache-read-context.diff
 (*) 33-nfs-fscache-read-fallback.diff
 (*) 34-nfs-fscache-read-from-cache.diff
 (*) 35-nfs-fscache-store-to-cache.diff
 (*) 36-nfs-fscache-mount.diff
 (*) 37-nfs-fscache-display.diff

     Patches to provide NFS with local caching.

     A couple of questions on the NFS iostat changes: (1) Should I update the iostat version number; (2) is it permitted to have conditional iostats?

I've brought the patchset up to date with respect to the 2.6.25-rc1 merge window, in particular altering Smack to handle the split in objective and subjective security in the task_struct.

--
A tarball of the patches is available at:

    http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-30.tar.bz2

To use this version of CacheFiles, the cachefilesd-0.9 is also required. It is available as an SRPM:

    http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm

Or as individual bits:

    http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
    http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
    http://people.redhat.com/~dhowells/fscache/cachefilesd.if
    http://people.redhat.com/~dhowells/fscache/cachefilesd.te
    http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

The .fc, .if and .te files are for manipulating SELinux.

David
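The objective/subjective split described for patches 05-10 might be easier to picture with a userspace toy. This is a sketch under loud assumptions: the toy_ structures, the field name sec and all the function names are invented here; only the idea of two pointers, and the act_as name used elsewhere in this series, come from the patch descriptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down security record. */
struct toy_task_security {
	unsigned int fsuid;
	unsigned int fsgid;
};

/* Hypothetical task with the two pointers the cover letter describes. */
struct toy_task {
	struct toy_task_security *sec;		/* objective: how others may affect us */
	struct toy_task_security *act_as;	/* subjective: how we affect other objects */
};

/* Point the subjective context at an override; returns the old one. */
struct toy_task_security *toy_override(struct toy_task *t,
				       struct toy_task_security *override)
{
	struct toy_task_security *saved = t->act_as;

	assert(override != NULL);
	t->act_as = override;
	return saved;
}

/* Restore a previously saved subjective context. */
void toy_revert(struct toy_task *t, struct toy_task_security *saved)
{
	t->act_as = saved;
}

/* Accessor in the spirit of current_fsuid(): reads the subjective side. */
unsigned int toy_current_fsuid(const struct toy_task *t)
{
	return t->act_as->fsuid;
}
```

A service such as CacheFiles would, in this picture, override the subjective pointer while touching the cache and revert afterwards, leaving the objective record untouched throughout.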
[PATCH 03/37] KEYS: Allow the callout data to be passed as a blob rather than a string
Allow the callout data to be passed as a blob rather than a string for internal kernel services that call any request_key_*() interface other than request_key(). request_key() itself still takes a NUL-terminated string. The functions that change are: request_key_with_auxdata() request_key_async() request_key_async_with_auxdata() Signed-off-by: David Howells [EMAIL PROTECTED] --- Documentation/keys-request-key.txt | 11 +--- Documentation/keys.txt | 14 +++--- include/linux/key.h|9 --- security/keys/internal.h |9 --- security/keys/keyctl.c |7 - security/keys/request_key.c| 49 ++-- security/keys/request_key_auth.c | 12 + 7 files changed, 70 insertions(+), 41 deletions(-) diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt index 266955d..09b55e4 100644 --- a/Documentation/keys-request-key.txt +++ b/Documentation/keys-request-key.txt @@ -11,26 +11,29 @@ request_key*(): struct key *request_key(const struct key_type *type, const char *description, - const char *callout_string); + const char *callout_info); or: struct key *request_key_with_auxdata(const struct key_type *type, const char *description, -const char *callout_string, +const char *callout_info, +size_t callout_len, void *aux); or: struct key *request_key_async(const struct key_type *type, const char *description, - const char *callout_string); + const char *callout_info, + size_t callout_len); or: struct key *request_key_async_with_auxdata(const struct key_type *type, const char *description, - const char *callout_string, + const char *callout_info, + size_t callout_len, void *aux); Or by userspace invoking the request_key system call: diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 51652d3..b82d38d 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt @@ -771,7 +771,7 @@ payload contents for more information. 
struct key *request_key(const struct key_type *type, const char *description, - const char *callout_string); + const char *callout_info); This is used to request a key or keyring with a description that matches the description specified according to the key type's match function. This @@ -793,24 +793,28 @@ payload contents for more information. struct key *request_key_with_auxdata(const struct key_type *type, const char *description, -const char *callout_string, +const void *callout_info, +size_t callout_len, void *aux); This is identical to request_key(), except that the auxiliary data is -passed to the key_type-request_key() op if it exists. +passed to the key_type-request_key() op if it exists, and the callout_info +is a blob of length callout_len, if given (the length may be 0). (*) A key can be requested asynchronously by calling one of: struct key *request_key_async(const struct key_type *type, const char *description, - const char *callout_string); + const void *callout_info, + size_t callout_len); or: struct key *request_key_async_with_auxdata(const struct key_type *type, const char *description, - const char *callout_string, + const char *callout_info, + size_t callout_len, void *aux); which are asynchronous equivalents of request_key() and diff --git a/include/linux/key.h b/include/linux/key.h index a70b8a8..163f864 100644 --- a/include/linux/key.h +++ b/include/linux
[PATCH 13/37] FS-Cache: Provide an add_wait_queue_tail() function
Provide an add_wait_queue_tail() function to add a waiter to the back of a wait queue instead of the front.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/pagemap.h |    7 +--
 include/linux/wait.h    |    1 +
 kernel/wait.c           |   18 ++
 mm/filemap.c            |    2 +-
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c5df3ae..ad9484f 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -225,8 +225,11 @@ static inline void wait_on_page_writeback(struct page *page)
 
 extern void end_page_writeback(struct page *page);
 
-/*
- * Wait for a PG_owner_priv_2 to become clear
+/**
+ * wait_on_page_owner_priv_2 - Wait for PG_owner_priv_2 to become clear
+ * @page: The page to monitor
+ *
+ * Wait for a PG_owner_priv_2 to become clear on the specified page.
  */
 static inline void wait_on_page_owner_priv_2(struct page *page)
 {
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0081147..a6a6607 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -118,6 +118,7 @@ static inline int waitqueue_active(wait_queue_head_t *q)
 #define is_sync_wait(wait)	(!(wait) || ((wait)->private))
 
 extern void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait);
+extern void add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait);
 extern void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait);
 extern void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait);
 
diff --git a/kernel/wait.c b/kernel/wait.c
index c275c56..191df0d 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -29,6 +29,24 @@ void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
 }
 EXPORT_SYMBOL(add_wait_queue);
 
+/**
+ * add_wait_queue_tail - Add a waiter to the back of a waitqueue
+ * @q: the wait queue to append the waiter to
+ * @wait: the waiter to be queued
+ *
+ * Add a waiter to the back of a waitqueue so that it gets woken up last.
+ */
+void add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait)
+{
+	unsigned long flags;
+
+	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue_tail(q, wait);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_tail);
+
 void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
 {
 	unsigned long flags;
diff --git a/mm/filemap.c b/mm/filemap.c
index 8951d67..b72e112 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -587,7 +587,7 @@ void end_page_writeback(struct page *page)
 EXPORT_SYMBOL(end_page_writeback);
 
 /**
- * end_page_own - Clear PG_owner_priv_2 and wake up any waiters
+ * end_page_owner_priv_2 - Clear PG_owner_priv_2 and wake up any waiters
  * @page: the page
  *
  * Clear PG_owner_priv_2 and wake up any processes waiting for that event.
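To see why adding at the tail means "woken up last", here is a userspace toy of a circular list walked from the front. It is only a sketch: the toy_ types and functions are invented for illustration, there is no locking, and it stands in for the kernel's list and waitqueue code rather than reproducing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter; `id` just lets us observe the wake order. */
struct toy_waiter {
	int id;
	struct toy_waiter *prev, *next;
};

/* Circular list with a sentinel head, loosely like a kernel list_head. */
struct toy_wait_queue {
	struct toy_waiter head;
};

void toy_queue_init(struct toy_wait_queue *q)
{
	q->head.prev = q->head.next = &q->head;
}

/* Front insertion: the new waiter is the next one to be woken. */
void toy_add_wait_queue(struct toy_wait_queue *q, struct toy_waiter *w)
{
	w->next = q->head.next;
	w->prev = &q->head;
	q->head.next->prev = w;
	q->head.next = w;
}

/* Tail insertion: the new waiter is woken after everyone already queued. */
void toy_add_wait_queue_tail(struct toy_wait_queue *q, struct toy_waiter *w)
{
	w->prev = q->head.prev;
	w->next = &q->head;
	q->head.prev->next = w;
	q->head.prev = w;
}

/* Walk from the front, recording ids in the order a wake-up would visit them. */
size_t toy_wake_order(const struct toy_wait_queue *q, int *ids, size_t max)
{
	const struct toy_waiter *w;
	size_t n = 0;

	assert(ids != NULL);
	for (w = q->head.next; w != &q->head && n < max; w = w->next)
		ids[n++] = w->id;
	return n;
}
```

Queueing waiter 1 at the front, waiter 3 at the tail and then waiter 2 at the front leaves the walk order as 2, 1, 3: the tail waiter stays last no matter how many front insertions follow.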
[PATCH 24/37] NFS: Register NFS for caching and retrieve the top-level index
Register NFS for caching and retrieve the top-level cache index object cookie.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/Makefile        |    1 +
 fs/nfs/fscache-index.c |   53
 fs/nfs/fscache.h       |   35
 fs/nfs/inode.c         |    8 +++
 4 files changed, 97 insertions(+), 0 deletions(-)
 create mode 100644 fs/nfs/fscache-index.c
 create mode 100644 fs/nfs/fscache.h

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index df0f41e..6d7176d 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,3 +16,4 @@ nfs-$(CONFIG_NFS_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
 			   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache-index.o
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
new file mode 100644
index 0000000..225ed5d
--- /dev/null
+++ b/fs/nfs/fscache-index.c
@@ -0,0 +1,53 @@
+/* NFS FS-Cache index structure definition
+ *
+ * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_fs_sb.h>
+#include <linux/in6.h>
+
+#include "internal.h"
+#include "fscache.h"
+
+#define NFSDBG_FACILITY		NFSDBG_FSCACHE
+
+static const struct fscache_netfs_operations nfs_cache_ops = {
+};
+
+/*
+ * Define the NFS filesystem for FS-Cache. Upon registration FS-Cache sticks
+ * the cookie for the top-level index object for NFS into this structure. The
+ * top-level index can than have other cache objects inserted into it.
+ */
+struct fscache_netfs nfs_cache_netfs = {
+	.name		= "nfs",
+	.version	= 0,
+	.ops		= &nfs_cache_ops,
+};
+
+/*
+ * Register NFS for caching
+ */
+int nfs_fscache_register(void)
+{
+	return fscache_register_netfs(&nfs_cache_netfs);
+}
+
+/*
+ * Unregister NFS for caching
+ */
+void nfs_fscache_unregister(void)
+{
+	fscache_unregister_netfs(&nfs_cache_netfs);
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
new file mode 100644
index 0000000..75e5a03
--- /dev/null
+++ b/fs/nfs/fscache.h
@@ -0,0 +1,35 @@
+/* NFS filesystem cache interface definitions
+ *
+ * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#ifndef _NFS_FSCACHE_H
+#define _NFS_FSCACHE_H
+
+#include <linux/nfs_fs.h>
+#include <linux/nfs_mount.h>
+#include <linux/nfs4_mount.h>
+
+#ifdef CONFIG_NFS_FSCACHE
+#include <linux/fscache.h>
+
+/*
+ * fscache-index.c
+ */
+extern struct fscache_netfs nfs_cache_netfs;
+
+extern int nfs_fscache_register(void);
+extern void nfs_fscache_unregister(void);
+
+#else /* CONFIG_NFS_FSCACHE */
+static inline int nfs_fscache_register(void) { return 0; }
+static inline void nfs_fscache_unregister(void) {}
+
+#endif /* CONFIG_NFS_FSCACHE */
+#endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 966a885..7254d5c 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -46,6 +46,7 @@
 #include "delegation.h"
 #include "iostat.h"
 #include "internal.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_VFS
 
@@ -1222,6 +1223,10 @@ static int __init init_nfs_fs(void)
 {
 	int err;
 
+	err = nfs_fscache_register();
+	if (err < 0)
+		goto out6;
+
 	err = nfs_fs_proc_init();
 	if (err)
 		goto out5;
@@ -1268,6 +1273,8 @@ out3:
 out4:
 	nfs_fs_proc_exit();
 out5:
+	nfs_fscache_unregister();
+out6:
 	return err;
 }
 
@@ -1278,6 +1285,7 @@ static void __exit exit_nfs_fs(void)
 	nfs_destroy_readpagecache();
 	nfs_destroy_inodecache();
 	nfs_destroy_nfspagecache();
+	nfs_fscache_unregister();
 #ifdef CONFIG_PROC_FS
 	rpc_proc_unregister("nfs");
 #endif
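The init_nfs_fs() hunks rely on the usual goto-unwinding idiom: each setup step that succeeds gets a label that undoes it, and a failure jumps to the label for the last step that did succeed. A userspace sketch of that shape follows; the toy_ functions and the fail_proc switch are invented stand-ins, not the NFS or FS-Cache calls.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state standing in for "NFS is registered with the cache". */
int toy_cache_registered;

int toy_cache_register(void)
{
	toy_cache_registered = 1;
	return 0;
}

void toy_cache_unregister(void)
{
	toy_cache_registered = 0;
}

/* Hypothetical second setup step that can be told to fail. */
int toy_proc_init(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Two-step init with unwinding in reverse order of setup. */
int toy_init(int fail_proc)
{
	int err;

	err = toy_cache_register();
	if (err < 0)
		goto out_none;		/* nothing to undo yet */

	err = toy_proc_init(fail_proc);
	if (err)
		goto out_unregister;	/* undo the registration only */

	return 0;

out_unregister:
	toy_cache_unregister();
out_none:
	assert(err != 0);		/* only reached on a failure path */
	return err;
}
```

Registering first and unregistering on the very last unwind label, as the patch does with out5/out6, keeps the teardown order the exact reverse of the setup order.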
[PATCH 09/37] Security: Allow kernel services to override LSM settings for task actions
Allow kernel services to override LSM settings appropriate to the actions performed by a task by duplicating a security record, modifying it and then using task_struct::act_as to point to it when performing operations on behalf of a task. This is used, for example, by CacheFiles which has to transparently access the cache on behalf of a process that thinks it is doing, say, NFS accesses with a potentially inappropriate (with respect to accessing the cache) set of security data. This patch provides two LSM hooks for modifying a task security record: (*) security_kernel_act_as() which allows modification of the security datum with which a task acts on other objects (most notably files). (*) security_create_files_as() which allows modification of the security datum that is used to initialise the security data on a file that a task creates. Signed-off-by: David Howells [EMAIL PROTECTED] --- include/linux/capability.h | 12 ++-- include/linux/cred.h| 23 +++ include/linux/security.h| 43 + kernel/cred.c | 112 +++ security/dummy.c| 17 + security/security.c | 15 - security/selinux/hooks.c| 51 security/selinux/include/security.h |2 - security/selinux/ss/services.c |5 +- security/smack/smack_lsm.c | 32 ++ 10 files changed, 297 insertions(+), 15 deletions(-) create mode 100644 include/linux/cred.h diff --git a/include/linux/capability.h b/include/linux/capability.h index 7d50ff6..424de01 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -364,12 +364,12 @@ typedef struct kernel_cap_struct { # error Fix up hand-coded capability macro initializers #else /* HAND-CODED capability initializers */ -# define CAP_EMPTY_SET{{ 0, 0 }} -# define CAP_FULL_SET {{ ~0, ~0 }} -# define CAP_INIT_EFF_SET {{ ~CAP_TO_MASK(CAP_SETPCAP), ~0 }} -# define CAP_FS_SET {{ CAP_FS_MASK_B0, CAP_FS_MASK_B1 } } -# define CAP_NFSD_SET {{ CAP_FS_MASK_B0|CAP_TO_MASK(CAP_SYS_RESOURCE), \ -CAP_FS_MASK_B1 } } +# define CAP_EMPTY_SET((kernel_cap_t){{ 0, 0 }}) +# define CAP_FULL_SET 
((kernel_cap_t){{ ~0, ~0 }}) +# define CAP_INIT_EFF_SET ((kernel_cap_t){{ ~CAP_TO_MASK(CAP_SETPCAP), ~0 }}) +# define CAP_FS_SET ((kernel_cap_t){{ CAP_FS_MASK_B0, CAP_FS_MASK_B1 } }) +# define CAP_NFSD_SET ((kernel_cap_t){{ CAP_FS_MASK_B0|CAP_TO_MASK(CAP_SYS_RESOURCE), \ + CAP_FS_MASK_B1 } }) #endif /* _LINUX_CAPABILITY_U32S != 2 */ diff --git a/include/linux/cred.h b/include/linux/cred.h new file mode 100644 index 000..497af5b --- /dev/null +++ b/include/linux/cred.h @@ -0,0 +1,23 @@ +/* Credential management + * + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. + * Written by David Howells ([EMAIL PROTECTED]) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ + +#ifndef _LINUX_CRED_H +#define _LINUX_CRED_H + +struct task_security; +struct inode; + +extern struct task_security *get_kernel_security(struct task_struct *); +extern int set_security_override(struct task_security *, u32); +extern int set_security_override_from_ctx(struct task_security *, const char *); +extern int change_create_files_as(struct task_security *, struct inode *); + +#endif /* _LINUX_CRED_H */ diff --git a/include/linux/security.h b/include/linux/security.h index 9bf93c7..1c17b91 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -568,6 +568,19 @@ struct request_sock; * Duplicate and attach the security structure currently attached to the * p-security field. * Return 0 if operation was successful. + * @task_kernel_act_as: + * Set the credentials for a kernel service to act as (subjective context). + * @p points to the task that nominated @secid. + * @sec points to the task security record to be modified. + * @secid specifies the security ID to be set + * Return 0 if successful. 
+ * @task_create_files_as: + * Set the file creation context in a task security record to be the same + * as the objective context of the specified inode. + * @p points to the task that nominated @inode. + * @sec points to the task security record to be modified. + * @inode points to the inode to use as a reference. + * Return 0 if successful. * @task_setuid: * Check permission before setting one or more of the user identity * attributes of the current process. The @flags parameter indicates @@ -1342,6 +1355,11 @@ struct security_operations { int (*task_alloc_security) (struct task_struct *p); void
[PATCH 05/37] Security: Change current->fs[ug]id to current_fs[ug]id()
Change current-fs[ug]id to current_fs[ug]id() so that fsgid and fsuid can be separated from the task_struct. Signed-off-by: David Howells [EMAIL PROTECTED] --- arch/ia64/kernel/perfmon.c|4 ++-- arch/powerpc/platforms/cell/spufs/inode.c |4 ++-- drivers/isdn/capi/capifs.c|4 ++-- drivers/usb/core/inode.c |4 ++-- fs/9p/fid.c |2 +- fs/9p/vfs_inode.c |4 ++-- fs/9p/vfs_super.c |4 ++-- fs/affs/inode.c |4 ++-- fs/anon_inodes.c |4 ++-- fs/attr.c |4 ++-- fs/bfs/dir.c |4 ++-- fs/cifs/cifsproto.h |2 +- fs/cifs/dir.c | 12 ++-- fs/cifs/inode.c |8 fs/cifs/misc.c|4 ++-- fs/coda/cache.c |6 +++--- fs/coda/upcall.c |4 ++-- fs/devpts/inode.c |4 ++-- fs/dquot.c|2 +- fs/exec.c |4 ++-- fs/ext2/balloc.c |2 +- fs/ext2/ialloc.c |4 ++-- fs/ext2/ioctl.c |2 +- fs/ext3/balloc.c |2 +- fs/ext3/ialloc.c |4 ++-- fs/ext4/balloc.c |2 +- fs/ext4/ialloc.c |4 ++-- fs/fuse/dev.c |4 ++-- fs/gfs2/inode.c | 10 +- fs/hfs/inode.c|4 ++-- fs/hfsplus/inode.c|4 ++-- fs/hpfs/namei.c | 24 fs/hugetlbfs/inode.c | 16 fs/jffs2/fs.c |4 ++-- fs/jfs/jfs_inode.c|4 ++-- fs/locks.c|2 +- fs/minix/bitmap.c |4 ++-- fs/namei.c|8 fs/nfsd/vfs.c |6 +++--- fs/ocfs2/dlm/dlmfs.c |8 fs/ocfs2/namei.c |4 ++-- fs/pipe.c |4 ++-- fs/posix_acl.c|4 ++-- fs/ramfs/inode.c |4 ++-- fs/reiserfs/namei.c |4 ++-- fs/sysv/ialloc.c |4 ++-- fs/udf/ialloc.c |4 ++-- fs/udf/namei.c|2 +- fs/ufs/ialloc.c |4 ++-- fs/xfs/linux-2.6/xfs_linux.h |4 ++-- fs/xfs/xfs_acl.c |6 +++--- fs/xfs/xfs_attr.c |2 +- fs/xfs/xfs_inode.c|4 ++-- fs/xfs/xfs_vnodeops.c |8 include/linux/fs.h|2 +- include/linux/sched.h |3 +++ ipc/mqueue.c |4 ++-- kernel/cgroup.c |4 ++-- mm/shmem.c|8 net/9p/client.c |2 +- net/socket.c |4 ++-- net/sunrpc/auth.c |8 security/commoncap.c |4 ++-- security/keys/key.c |2 +- security/keys/keyctl.c|2 +- security/keys/request_key.c | 10 +- security/keys/request_key_auth.c |2 +- 67 files changed, 161 insertions(+), 158 deletions(-) diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c index f6b9971..4b229f2 100644 --- a/arch/ia64/kernel/perfmon.c 
+++ b/arch/ia64/kernel/perfmon.c @@ -2191,8 +2191,8 @@ pfm_alloc_fd(struct file **cfile) DPRINT((new inode ino=%ld @%p\n, inode-i_ino, inode)); inode-i_mode = S_IFCHR|S_IRUGO; - inode-i_uid = current-fsuid; - inode-i_gid = current-fsgid; + inode-i_uid = current_fsuid(); + inode-i_gid = current_fsgid(); sprintf(name, [%lu], inode-i_ino); this.name = name; diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c index 6d1228c..a789ecf 100644 --- a/arch/powerpc/platforms/cell/spufs/inode.c +++ b/arch/powerpc/platforms/cell/spufs/inode.c @@ -86,8 +86,8 @@ spufs_new_inode(struct super_block *sb, int mode) goto out; inode-i_mode = mode; - inode-i_uid = current-fsuid; - inode-i_gid = current-fsgid; + inode-i_uid = current_fsuid(); + inode-i_gid
[PATCH 21/37] NFS: Add comment banners to some NFS functions
Add comment banners to some NFS functions so that they can be modified by the NFS fscache patches for further information.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/file.c |   26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index ef57a5a..26a073b 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -354,6 +354,13 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
 	return copied;
 }
 
+/*
+ * Partially or wholly invalidate a page
+ * - Release the private state associated with a page if undergoing complete
+ *   page invalidation
+ * - Called if either PG_private or PG_private_2 is set on the page
+ * - Caller holds page lock
+ */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
 {
 	if (offset != 0)
@@ -362,12 +369,26 @@ static void nfs_invalidate_page(struct page *page, unsigned long offset)
 	nfs_wb_page_cancel(page->mapping->host, page);
 }
 
+/*
+ * Attempt to release the private state associated with a page
+ * - Called if either PG_private or PG_private_2 is set on the page
+ * - Caller holds page lock
+ * - Return true (may release page) or false (may not)
+ */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
 	/* If PagePrivate() is set, then the page is not freeable */
 	return 0;
 }
 
+/*
+ * Attempt to clear the private state associated with a page when an error
+ * occurs that requires the cached contents of an inode to be written back or
+ * destroyed
+ * - Called if either PG_private or PG_private_2 is set on the page
+ * - Caller holds page lock
+ * - Return 0 if successful, -error otherwise
+ */
 static int nfs_launder_page(struct page *page)
 {
 	return nfs_wb_page(page->mapping->host, page);
@@ -389,6 +410,11 @@ const struct address_space_operations nfs_file_aops = {
 	.launder_page = nfs_launder_page,
 };
 
+/*
+ * Notification that a PTE pointing to an NFS page is about to be made
+ * writable, implying that someone is about to modify the page through a
+ * shared-writable mapping
+ */
 static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page)
 {
 	struct file *filp = vma->vm_file;
[PATCH 19/37] CacheFiles: Export things for CacheFiles
Export a number of functions for CacheFiles's use.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/super.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 88811f6..1133b43 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -267,6 +267,7 @@ int fsync_super(struct super_block *sb)
 	__fsync_super(sb);
 	return sync_blockdev(sb->s_bdev);
 }
+EXPORT_SYMBOL_GPL(fsync_super);
 
 /**
  *	generic_shutdown_super	-	common helper for ->kill_sb()
[PATCH 02/37] KEYS: Check starting keyring as part of search
Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) to check it is still valid for searching.

The scenario: User in process A does things that cause things to be created in
its process session keyring.  The user then does an su to another user and
starts a new process, B.  The two processes now share the same process session
keyring.

Process B does an NFS access which results in an upcall to gssd.  When gssd
attempts to instantiate the context key (to be linked into the process session
keyring), it is denied access even though it has an authorization key.

The order of calls is:

   keyctl_instantiate_key()
      lookup_user_key()                            (the default: case)
         search_process_keyrings(current)
            search_process_keyrings(rka->context)  (recursive call)
               keyring_search_aux()

keyring_search_aux() verifies the keys and keyrings underneath the top-level
keyring it is given, but that top-level keyring is neither fully validated nor
checked to see if it is the thing being searched for.

This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for

Signed-off-by: Kevin Coffman [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyring.c |   35 +++++++++++++++++++++++++++++++----
 1 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 88292e3..76b89b2 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -292,7 +292,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 
 	struct keyring_list *keylist;
 	struct timespec now;
-	unsigned long possessed;
+	unsigned long possessed, kflags;
 	struct key *keyring, *key;
 	key_ref_t key_ref;
 	long err;
@@ -318,6 +318,32 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 	now = current_kernel_time();
 	err = -EAGAIN;
 	sp = 0;
+
+	/* firstly we should check to see if this top-level keyring is what we
+	 * are looking for */
+	key_ref = ERR_PTR(-EAGAIN);
+	kflags = keyring->flags;
+	if (keyring->type == type && match(keyring, description)) {
+		key = keyring;
+
+		/* check it isn't negative and hasn't expired or been
+		 * revoked */
+		if (kflags & (1 << KEY_FLAG_REVOKED))
+			goto error_2;
+		if (key->expiry && now.tv_sec >= key->expiry)
+			goto error_2;
+		key_ref = ERR_PTR(-ENOKEY);
+		if (kflags & (1 << KEY_FLAG_NEGATIVE))
+			goto error_2;
+		goto found;
+	}
+
+	/* otherwise, the top keyring must not be revoked, expired, or
+	 * negatively instantiated if we are to search it */
+	key_ref = ERR_PTR(-EAGAIN);
+	if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
+	    (keyring->expiry && now.tv_sec >= keyring->expiry))
+		goto error_2;
 
 	/* start processing a new keyring */
 descend:
@@ -331,13 +357,14 @@ descend:
 	/* iterate through the keys in this keyring first */
 	for (kix = 0; kix < keylist->nkeys; kix++) {
 		key = keylist->keys[kix];
+		kflags = key->flags;
 
 		/* ignore keys not of this type */
 		if (key->type != type)
 			continue;
 
 		/* skip revoked keys and expired keys */
-		if (test_bit(KEY_FLAG_REVOKED, &key->flags))
+		if (kflags & (1 << KEY_FLAG_REVOKED))
 			continue;
 
 		if (key->expiry && now.tv_sec >= key->expiry)
@@ -352,8 +379,8 @@ descend:
 					context, KEY_SEARCH) < 0)
 			continue;
 
-		/* we set a different error code if we find a negative key */
-		if (test_bit(KEY_FLAG_NEGATIVE, &key->flags)) {
+		/* we set a different error code if we pass a negative key */
+		if (kflags & (1 << KEY_FLAG_NEGATIVE)) {
 			err = -ENOKEY;
 			continue;
 		}
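The validity rules that the patch applies to the top-level keyring can be modelled outside the kernel. The following is only a sketch under stated assumptions: `struct model_key`, `check_top_keyring()` and the flag bit numbers are hypothetical stand-ins, not the kernel's definitions, and the `is_match` argument replaces the real type/description comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical bit numbers; the kernel defines the real KEY_FLAG_* values. */
enum { MODEL_KEY_FLAG_REVOKED = 1, MODEL_KEY_FLAG_NEGATIVE = 5 };

/* Hypothetical, cut-down model of struct key. */
struct model_key {
	unsigned long flags;	/* bitmask of (1UL << MODEL_KEY_FLAG_*) */
	time_t expiry;		/* 0 means the key never expires */
};

/*
 * Classify the top-level keyring along the lines of the added hunk:
 * - if it is itself the match: -EAGAIN when revoked or expired,
 *   -ENOKEY when negatively instantiated, 0 when it can be returned;
 * - otherwise: -EAGAIN when revoked, negative or expired, 0 when it
 *   may be searched.
 */
static int check_top_keyring(const struct model_key *k, time_t now, bool is_match)
{
	unsigned long kflags;

	assert(k != NULL);
	kflags = k->flags;	/* read the flags once, as the patch does */

	if (is_match) {
		if (kflags & (1UL << MODEL_KEY_FLAG_REVOKED))
			return -EAGAIN;
		if (k->expiry && now >= k->expiry)
			return -EAGAIN;
		if (kflags & (1UL << MODEL_KEY_FLAG_NEGATIVE))
			return -ENOKEY;
		return 0;
	}

	if ((kflags & ((1UL << MODEL_KEY_FLAG_REVOKED) |
		       (1UL << MODEL_KEY_FLAG_NEGATIVE))) ||
	    (k->expiry && now >= k->expiry))
		return -EAGAIN;
	return 0;
}
```

Note that a negative keyring yields -ENOKEY only when it is the thing being searched for; as a mere container it is rejected with -EAGAIN, which mirrors the two branches in the hunk.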
[PATCH 23/37] NFS: Permit local filesystem caching to be enabled for NFS
Permit local filesystem caching to be enabled for NFS in the kernel
configuration.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index c42ec50..fa8e978 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1644,6 +1644,14 @@ config NFS_V4
 
 	  If unsure, say N.
 
+config NFS_FSCACHE
+	bool "Provide NFS client caching support (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+	help
+	  Say Y here if you want NFS data to be cached locally on disc through
+	  the general filesystem cache manager
+
 config NFS_DIRECTIO
 	bool "Allow direct I/O on NFS files"
 	depends on NFS_FS
[PATCH 27/37] NFS: Define and create inode-level cache objects
Define and create inode-level cache data storage objects (as managed by
nfs_inode structs).

Each inode-level object is created in a superblock-level index object and is
itself a data storage object into which pages from the inode are stored.

The inode object key is the NFS file handle for the inode.

The inode object is given coherency data to carry in the auxiliary data
permitted by the cache.  This is a sequence made up of:

 (1) i_mtime from the NFS inode.

 (2) i_ctime from the NFS inode.

 (3) i_size from the NFS inode.

As the cache is a persistent cache, the auxiliary data is checked when a new
NFS in-memory inode is set up that matches an already existing data storage
object in the cache.  If the coherency data is the same, the on-disk object is
retained and used; if not, it is scrapped and a new one created.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-index.c |  112 
 fs/nfs/fscache.h       |    1 
 2 files changed, 113 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index b5a52e3..c3c63fa 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -150,3 +150,115 @@ const struct fscache_cookie_def nfs_cache_super_index_def = {
 	.type		= FSCACHE_COOKIE_TYPE_INDEX,
 	.get_key	= nfs_super_get_key,
 };
+
+/*
+ * Definition of the auxiliary data attached to NFS inode storage objects
+ * within the cache.
+ *
+ * The contents of this struct are recorded in the on-disk local cache in the
+ * auxiliary data attached to the data storage object backing an inode.  This
+ * permits coherency to be managed when a new inode binds to an already extant
+ * cache object.
+ */
+struct nfs_cache_inode_auxdata {
+	struct timespec	mtime;
+	struct timespec	ctime;
+	loff_t		size;
+};
+
+/*
+ * Generate a key to describe an NFS inode in an NFS server's index
+ */
+static uint16_t nfs_cache_inode_get_key(const void *cookie_netfs_data,
+					void *buffer, uint16_t bufmax)
+{
+	const struct nfs_inode *nfsi = cookie_netfs_data;
+	uint16_t nsize;
+
+	/* use the inode's NFS filehandle as the key */
+	nsize = nfsi->fh.size;
+	memcpy(buffer, nfsi->fh.data, nsize);
+	return nsize;
+}
+
+/*
+ * Get certain file attributes from the netfs data
+ * - This function can be absent for an index
+ * - Not permitted to return an error
+ * - The netfs data from the cookie being used as the source is presented
+ */
+static void nfs_cache_inode_get_attr(const void *cookie_netfs_data, uint64_t *size)
+{
+	const struct nfs_inode *nfsi = cookie_netfs_data;
+
+	*size = nfsi->vfs_inode.i_size;
+}
+
+/*
+ * Get the auxiliary data from netfs data
+ * - This function can be absent if the index carries no state data
+ * - Should store the auxiliary data in the buffer
+ * - Should return the amount of amount stored
+ * - Not permitted to return an error
+ * - The netfs data from the cookie being used as the source is presented
+ */
+static uint16_t nfs_cache_inode_get_aux(const void *cookie_netfs_data,
+					void *buffer, uint16_t bufmax)
+{
+	struct nfs_cache_inode_auxdata auxdata;
+	const struct nfs_inode *nfsi = cookie_netfs_data;
+
+	auxdata.size = nfsi->vfs_inode.i_size;
+	auxdata.mtime = nfsi->vfs_inode.i_mtime;
+	auxdata.ctime = nfsi->vfs_inode.i_ctime;
+
+	if (bufmax > sizeof(auxdata))
+		bufmax = sizeof(auxdata);
+
+	memcpy(buffer, &auxdata, bufmax);
+	return bufmax;
+}
+
+/*
+ * Consult the netfs about the state of an object
+ * - This function can be absent if the index carries no state data
+ * - The netfs data from the cookie being used as the target is
+ *   presented, as is the auxiliary data
+ */
+static enum fscache_checkaux nfs_cache_inode_check_aux(void *cookie_netfs_data,
+						       const void *data,
+						       uint16_t datalen)
+{
+	struct nfs_cache_inode_auxdata auxdata;
+	struct nfs_inode *nfsi = cookie_netfs_data;
+
+	if (datalen > sizeof(auxdata))
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	auxdata.size = nfsi->vfs_inode.i_size;
+	auxdata.mtime = nfsi->vfs_inode.i_mtime;
+	auxdata.ctime = nfsi->vfs_inode.i_ctime;
+
+	if (memcmp(data, &auxdata, datalen) != 0)
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	return FSCACHE_CHECKAUX_OKAY;
+}
+
+/*
+ * Define the inode object for FS-Cache.  This is used to describe an inode
+ * object to fscache_acquire_cookie().  It is keyed by the NFS file handle for
+ * an inode.
+ *
+ * Coherency is managed by comparing the copies of i_size, i_mtime and i_ctime
+ * held in the cache auxiliary data for the data storage object with those in
+ * the inode struct in memory.
+ */
+const struct
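The coherency check described in this patch (compare the stored auxiliary data against the live inode's mtime, ctime and size, and scrap the object on any difference) can be tried out in userspace. What follows is a rough sketch, not the kernel code: all `model_*` names are hypothetical, `int64_t` stands in for `loff_t`, and the record is zeroed before filling on the assumption that this keeps padding bytes from disturbing the byte-wise comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical result codes standing in for enum fscache_checkaux. */
enum model_checkaux { MODEL_CHECKAUX_OKAY, MODEL_CHECKAUX_OBSOLETE };

/* The same three fields as the patch's auxdata struct. */
struct model_auxdata {
	struct timespec mtime;
	struct timespec ctime;
	int64_t size;
};

/* Zero the record, then set fields one by one (nanoseconds stay 0). */
static void model_fill_aux(struct model_auxdata *aux, time_t mtime_sec,
			   time_t ctime_sec, int64_t size)
{
	assert(aux != NULL);
	memset(aux, 0, sizeof(*aux));
	aux->mtime.tv_sec = mtime_sec;
	aux->ctime.tv_sec = ctime_sec;
	aux->size = size;
}

/*
 * Compare the bytes read back from the cache with the live record.
 * An over-long stored blob or any differing byte marks the cached
 * object as obsolete.
 */
static enum model_checkaux model_check_aux(const void *stored,
					   uint16_t stored_len,
					   const struct model_auxdata *live)
{
	assert(stored != NULL && live != NULL);
	if (stored_len > sizeof(*live))
		return MODEL_CHECKAUX_OBSOLETE;
	if (memcmp(stored, live, stored_len) != 0)
		return MODEL_CHECKAUX_OBSOLETE;
	return MODEL_CHECKAUX_OKAY;
}
```

Because the comparison is a plain memcmp(), any change in the layout of the record between kernels would also show up as "obsolete", which errs on the side of discarding cached data.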
[PATCH 08/37] Security: Add a kernel_service object class to SELinux
Add a 'kernel_service' object class to SELinux and give this object class two
access vectors: 'use_as_override' and 'create_files_as'.

The first vector is used to grant a process the right to nominate an alternate
process security ID for the kernel to use as an override for the SELinux
subjective security when accessing stuff on behalf of another process.

For example, CacheFiles when accessing the cache on behalf of a process
accessing an NFS file needs to use a subjective security ID appropriate to the
cache rather than the one the calling process is using.  The cachefilesd
daemon will nominate the security ID to be used.

The second vector is used to grant a process the right to nominate a file
creation label for a kernel service to use.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/selinux/include/av_perm_to_string.h |    2 ++
 security/selinux/include/av_permissions.h    |    2 ++
 security/selinux/include/class_to_string.h   |    1 +
 security/selinux/include/flask.h             |    1 +
 4 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/security/selinux/include/av_perm_to_string.h b/security/selinux/include/av_perm_to_string.h
index d569669..fd6bef7 100644
--- a/security/selinux/include/av_perm_to_string.h
+++ b/security/selinux/include/av_perm_to_string.h
@@ -171,3 +171,5 @@
    S_(SECCLASS_DCCP_SOCKET, DCCP_SOCKET__NAME_CONNECT, "name_connect")
    S_(SECCLASS_MEMPROTECT, MEMPROTECT__MMAP_ZERO, "mmap_zero")
    S_(SECCLASS_PEER, PEER__RECV, "recv")
+   S_(SECCLASS_KERNEL_SERVICE, KERNEL_SERVICE__USE_AS_OVERRIDE, "use_as_override")
+   S_(SECCLASS_KERNEL_SERVICE, KERNEL_SERVICE__CREATE_FILES_AS, "create_files_as")
diff --git a/security/selinux/include/av_permissions.h b/security/selinux/include/av_permissions.h
index 75b4131..02ddf8d 100644
--- a/security/selinux/include/av_permissions.h
+++ b/security/selinux/include/av_permissions.h
@@ -836,3 +836,5 @@
 #define DCCP_SOCKET__NAME_CONNECT                 0x00800000UL
 #define MEMPROTECT__MMAP_ZERO                     0x00000001UL
 #define PEER__RECV                                0x00000001UL
+#define KERNEL_SERVICE__USE_AS_OVERRIDE           0x00000001UL
+#define KERNEL_SERVICE__CREATE_FILES_AS           0x00000002UL
diff --git a/security/selinux/include/class_to_string.h b/security/selinux/include/class_to_string.h
index bd813c3..373b191 100644
--- a/security/selinux/include/class_to_string.h
+++ b/security/selinux/include/class_to_string.h
@@ -72,3 +72,4 @@
     S_(NULL)
     S_("peer")
     S_("capability2")
+    S_("kernel_service")
diff --git a/security/selinux/include/flask.h b/security/selinux/include/flask.h
index febf886..f3c5166 100644
--- a/security/selinux/include/flask.h
+++ b/security/selinux/include/flask.h
@@ -52,6 +52,7 @@
 #define SECCLASS_MEMPROTECT                              61
 #define SECCLASS_PEER                                    68
 #define SECCLASS_CAPABILITY2                             69
+#define SECCLASS_KERNEL_SERVICE                          70
 
 /*
  * Security identifier indices for initial entities
[PATCH 11/37] FS-Cache: Release page-private after failed readahead
The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails.  This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 mm/readahead.c |   39 +++++++++++++++++++++++++++++++++++++--
 1 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index c9c50ca..75aa6b6 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -44,6 +44,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);
 
 #define list_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ *   such as the NFS fs marking pages that are cached locally on disk, thus we
+ *   need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+					     struct page *page)
+{
+	if (PagePrivate(page)) {
+		if (TestSetPageLocked(page))
+			BUG();
+		page->mapping = mapping;
+		do_invalidatepage(page, 0);
+		page->mapping = NULL;
+		unlock_page(page);
+	}
+	page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+					      struct list_head *pages)
+{
+	struct page *victim;
+
+	while (!list_empty(pages)) {
+		victim = list_to_page(pages);
+		list_del(&victim->lru);
+		read_cache_pages_invalidate_page(mapping, victim);
+	}
+}
+
 /**
  * read_cache_pages - populate an address space with some pages & start reads against them
  * @mapping: the address_space
@@ -65,14 +100,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
 		list_del(&page->lru);
 		if (add_to_page_cache_lru(page, mapping,
 					page->index, GFP_KERNEL)) {
-			page_cache_release(page);
+			read_cache_pages_invalidate_page(mapping, page);
 			continue;
 		}
 		page_cache_release(page);
 
 		ret = filler(data, page);
 		if (unlikely(ret)) {
-			put_pages_list(pages);
+			read_cache_pages_invalidate_pages(mapping, pages);
 			break;
 		}
 		task_io_account_read(PAGE_CACHE_SIZE);
[PATCH 01/37] KEYS: Increase the payload size when instantiating a key
Increase the size of a payload that can be used to instantiate a key in
add_key() and keyctl_instantiate_key().  This permits huge CIFS SPNEGO blobs to
be passed around.  The limit is raised to 1MB.  If kmalloc() can't allocate a
buffer of sufficient size, vmalloc() will be tried instead.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyctl.c |   38 ++++++++++++++++++++++++++++++--------
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index d9ca15c..8ec8432 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -19,6 +19,7 @@
 #include <linux/capability.h>
 #include <linux/string.h>
 #include <linux/err.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -62,9 +63,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 	char type[32], *description;
 	void *payload;
 	long ret;
+	bool vm;
 
 	ret = -EINVAL;
-	if (plen > 32767)
+	if (plen > 1024 * 1024 - 1)
 		goto error;
 
 	/* draw all the data into kernel space */
@@ -81,11 +83,18 @@ asmlinkage long sys_add_key(const char __user *_type,
 	/* pull the payload in if one was supplied */
 	payload = NULL;
 
+	vm = false;
 	if (_payload) {
 		ret = -ENOMEM;
 		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload)
-			goto error2;
+		if (!payload) {
+			if (plen <= PAGE_SIZE)
+				goto error2;
+			vm = true;
+			payload = vmalloc(plen);
+			if (!payload)
+				goto error2;
+		}
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -113,7 +122,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 
 	key_ref_put(keyring_ref);
  error3:
-	kfree(payload);
+	if (!vm)
+		kfree(payload);
+	else
+		vfree(payload);
  error2:
 	kfree(description);
  error:
@@ -821,9 +833,10 @@ long keyctl_instantiate_key(key_serial_t id,
 	key_ref_t keyring_ref;
 	void *payload;
 	long ret;
+	bool vm = false;
 
 	ret = -EINVAL;
-	if (plen > 32767)
+	if (plen > 1024 * 1024 - 1)
 		goto error;
 
 	/* the appropriate instantiation authorisation key must have been
@@ -843,8 +856,14 @@ long keyctl_instantiate_key(key_serial_t id,
 	if (_payload) {
 		ret = -ENOMEM;
 		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload)
-			goto error;
+		if (!payload) {
+			if (plen <= PAGE_SIZE)
+				goto error;
+			vm = true;
+			payload = vmalloc(plen);
+			if (!payload)
+				goto error;
+		}
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -877,7 +896,10 @@ long keyctl_instantiate_key(key_serial_t id,
 	}
 
 error2:
-	kfree(payload);
+	if (!vm)
+		kfree(payload);
+	else
+		vfree(payload);
 error:
 	return ret;
 
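The allocation policy in this patch (reject oversized payloads, try the ordinary allocator first, fall back to a second allocator only for requests larger than a page, and remember which one succeeded so the matching free routine is used) can be sketched in userspace. This is an illustration under assumptions, not the kernel code: `payload_alloc()`, `alloc_fn`, `always_fail()` and the `MODEL_*` constants are hypothetical, the allocators are passed in as function pointers so the failure path can be exercised, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE   4096u			/* assumption: 4 KiB pages */
#define MODEL_MAX_PAYLOAD (1024u * 1024u - 1u)	/* limit used by the patch */

typedef void *(*alloc_fn)(size_t);

/* An allocator that never succeeds, for exercising the fallback path. */
static void *always_fail(size_t n)
{
	(void)n;
	return NULL;
}

/*
 * Returns 0 and sets *out on success, -EINVAL if plen is over the
 * limit, -ENOMEM if no buffer could be obtained.  *vm records whether
 * the fallback allocator produced the buffer, so that the caller can
 * pick the matching free routine later.
 */
static int payload_alloc(size_t plen, alloc_fn primary, alloc_fn fallback,
			 void **out, bool *vm)
{
	assert(out != NULL && vm != NULL);
	*out = NULL;
	*vm = false;

	if (plen > MODEL_MAX_PAYLOAD)
		return -EINVAL;

	*out = primary(plen);
	if (*out)
		return 0;

	/* a request of a page or less is not retried */
	if (plen <= MODEL_PAGE_SIZE)
		return -ENOMEM;

	*out = fallback(plen);
	if (!*out)
		return -ENOMEM;
	*vm = true;
	return 0;
}
```

Injecting the allocators is purely a testing convenience here; in the patch the two are fixed as kmalloc() and vmalloc(), with kfree() or vfree() chosen from the `vm` flag on the way out.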
[PATCH 28/37] NFS: Use local disk inode cache
Bind data storage objects in the local cache to NFS inodes. Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/nfs/fscache.c | 131 fs/nfs/fscache.h | 19 +++ fs/nfs/inode.c | 39 -- include/linux/nfs_fs.h | 10 4 files changed, 193 insertions(+), 6 deletions(-) diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c index cbd09f0..c0e0320 100644 --- a/fs/nfs/fscache.c +++ b/fs/nfs/fscache.c @@ -166,3 +166,134 @@ void nfs_fscache_release_super_cookie(struct super_block *sb) nfss-fscache_key = NULL; } } + +/* + * Initialise the per-inode cache cookie pointer for an NFS inode. + */ +void nfs_fscache_init_inode_cookie(struct inode *inode) +{ + NFS_I(inode)-fscache = NULL; + if (S_ISREG(inode-i_mode)) + set_bit(NFS_INO_FSCACHE, NFS_I(inode)-flags); +} + +/* + * Get the per-inode cache cookie for an NFS inode. + */ +void nfs_fscache_enable_inode_cookie(struct inode *inode) +{ + struct super_block *sb = inode-i_sb; + struct nfs_inode *nfsi = NFS_I(inode); + + if (nfsi-fscache || !NFS_FSCACHE(inode)) + return; + + if ((NFS_SB(sb)-options NFS_OPTION_FSCACHE)) { + nfsi-fscache = fscache_acquire_cookie( + NFS_SB(sb)-fscache, + nfs_cache_inode_object_def, + nfsi); + + dfprintk(FSCACHE, NFS: get FH cookie (0x%p/0x%p/0x%p)\n, +sb, nfsi, nfsi-fscache); + } +} + +/* + * Release a per-inode cookie. + */ +void nfs_fscache_release_inode_cookie(struct inode *inode) +{ + struct nfs_inode *nfsi = NFS_I(inode); + + dfprintk(FSCACHE, NFS: clear cookie (0x%p/0x%p)\n, +nfsi, nfsi-fscache); + + fscache_relinquish_cookie(nfsi-fscache, 0); + nfsi-fscache = NULL; +} + +/* + * Retire a per-inode cookie, destroying the data attached to it. 
+ */ +void nfs_fscache_zap_inode_cookie(struct inode *inode) +{ + struct nfs_inode *nfsi = NFS_I(inode); + + dfprintk(FSCACHE, NFS: zapping cookie (0x%p/0x%p)\n, +nfsi, nfsi-fscache); + + fscache_relinquish_cookie(nfsi-fscache, 1); + nfsi-fscache = NULL; +} + +/* + * Turn off the cache with regard to a per-inode cookie if opened for writing, + * invalidating all the pages in the page cache relating to the associated + * inode to clear the per-page caching. + */ +void nfs_fscache_disable_inode_cookie(struct inode *inode) +{ + clear_bit(NFS_INO_FSCACHE, NFS_I(inode)-flags); + + if (NFS_I(inode)-fscache) { + dfprintk(FSCACHE, +NFS: nfsi 0x%p turning cache off\n, NFS_I(inode)); + + /* Need to invalidate any mapped pages that were read in before +* turning off the cache. +*/ + if (inode-i_mapping inode-i_mapping-nrpages) + invalidate_inode_pages2(inode-i_mapping); + + nfs_fscache_zap_inode_cookie(inode); + } +} + +/* + * Decide if we should enable or disable local caching for this inode. + * - For now, with NFS, only regular files that are open read-only will be able + * to use the cache. + */ +void nfs_fscache_set_inode_cookie(struct inode *inode, struct file *filp) +{ + if (NFS_FSCACHE(inode)) { + if ((filp-f_flags O_ACCMODE) != O_RDONLY) + nfs_fscache_disable_inode_cookie(inode); + else + nfs_fscache_enable_inode_cookie(inode); + } +} + +/* + * Replace a per-inode cookie due to revalidation detecting a file having + * changed on the server. 
+ */ +void nfs_fscache_renew_inode_cookie(struct inode *inode) +{ + struct nfs_inode *nfsi = NFS_I(inode); + struct nfs_server *nfss = NFS_SERVER(inode); + struct fscache_cookie *old = nfsi-fscache; + + if (nfsi-fscache) { + /* retire the current fscache cache and get a new one */ + fscache_relinquish_cookie(nfsi-fscache, 1); + + nfsi-fscache = fscache_acquire_cookie( + nfss-nfs_client-fscache, + nfs_cache_inode_object_def, + nfsi); + + dfprintk(FSCACHE, +NFS: revalidation new cookie (0x%p/0x%p/0x%p/0x%p)\n, +nfss, nfsi, old, nfsi-fscache); + } +} + +/* + * Update the filesize associated with a per-inode cookie. + */ +void nfs_fscache_attr_changed(struct inode *inode) +{ + fscache_attr_changed(NFS_I(inode)-fscache); +} diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h index 7dcdf32..d730ec8 100644 --- a/fs/nfs/fscache.h +++ b/fs/nfs/fscache.h @@ -77,6 +77,15 @@ extern void nfs_fscache_get_super_cookie(struct super_block *, struct nfs_parsed_mount_data *); extern void nfs_fscache_release_super_cookie
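The rule applied by nfs_fscache_set_inode_cookie() in this patch, namely that only files opened read-only keep using the cache, comes down to a test on the access-mode bits of the open flags. A minimal userspace sketch of that test follows; `cache_allowed_for_open()` is a hypothetical helper name, not something the patch defines.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Decide whether an open with the given flags may use the local cache.
 * Only the access-mode bits matter: other flags such as O_NONBLOCK or
 * O_APPEND are masked out by O_ACCMODE.
 */
static bool cache_allowed_for_open(int f_flags)
{
	assert(f_flags >= 0);
	return (f_flags & O_ACCMODE) == O_RDONLY;
}
```

In the patch, a false result leads to the inode's pages being invalidated and its cookie being retired rather than merely skipped.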
[PATCH 04/37] KEYS: Add keyctl function to get a security label
Add a keyctl() function to get the security label of a key. The following is added to Documentation/keys.txt: (*) Get the LSM security context attached to a key. long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer, size_t buflen) This function returns a string that represents the LSM security context attached to a key in the buffer provided. Unless there's an error, it always returns the amount of data it could produce, even if that's too big for the buffer, but it won't copy more than requested to userspace. If the buffer pointer is NULL then no copy will take place. A NUL character is included at the end of the string if the buffer is sufficiently big. This is included in the returned count. If no LSM is in force then an empty string will be returned. A process must have view permission on the key for this function to be successful. Signed-off-by: David Howells [EMAIL PROTECTED] Acked-by: Stephen Smalley [EMAIL PROTECTED] --- Documentation/keys.txt | 21 +++ include/linux/keyctl.h |1 + include/linux/security.h | 20 +- security/dummy.c |8 ++ security/keys/compat.c |3 ++ security/keys/keyctl.c | 66 ++ security/security.c |5 +++ security/selinux/hooks.c | 21 +-- 8 files changed, 141 insertions(+), 4 deletions(-) diff --git a/Documentation/keys.txt b/Documentation/keys.txt index b82d38d..be424b0 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt @@ -711,6 +711,27 @@ The keyctl syscall functions are: The assumed authoritative key is inherited across fork and exec. + (*) Get the LSM security context attached to a key. + + long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer, + size_t buflen) + + This function returns a string that represents the LSM security context + attached to a key in the buffer provided. + + Unless there's an error, it always returns the amount of data it could + produce, even if that's too big for the buffer, but it won't copy more + than requested to userspace. 
If the buffer pointer is NULL then no copy + will take place. + + A NUL character is included at the end of the string if the buffer is + sufficiently big. This is included in the returned count. If no LSM is + in force then an empty string will be returned. + + A process must have view permission on the key for this function to be + successful. + + === KERNEL SERVICES === diff --git a/include/linux/keyctl.h b/include/linux/keyctl.h index 3365945..656ee6b 100644 --- a/include/linux/keyctl.h +++ b/include/linux/keyctl.h @@ -49,5 +49,6 @@ #define KEYCTL_SET_REQKEY_KEYRING 14 /* set default request-key keyring */ #define KEYCTL_SET_TIMEOUT 15 /* set key timeout */ #define KEYCTL_ASSUME_AUTHORITY16 /* assume request_key() authorisation */ +#define KEYCTL_GET_SECURITY17 /* get key security label */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/linux/security.h b/include/linux/security.h index fe52cde..a33fd03 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -970,6 +970,17 @@ struct request_sock; * @perm describes the combination of permissions required of this key. * Return 1 if permission granted, 0 if permission denied and -ve it the * normal permissions model should be effected. + * @key_getsecurity: + * Get a textual representation of the security context attached to a key + * for the purposes of honouring KEYCTL_GETSECURITY. This function + * allocates the storage for the NUL-terminated string and the caller + * should free it. + * @key points to the key to be queried. + * @_buffer points to a pointer that should be set to point to the + * resulting string (if no label or an error occurs). + * Return the length of the string (including terminating NUL) or -ve if + * an error. + * May also return 0 (and a NULL buffer pointer) if there is no label. * * Security hooks affecting all System V IPC operations. 
* @@ -1459,7 +1470,7 @@ struct security_operations { int (*key_permission)(key_ref_t key_ref, struct task_struct *context, key_perm_t perm); - + int (*key_getsecurity)(struct key *key, char **_buffer); #endif /* CONFIG_KEYS */ }; @@ -2600,6 +2611,7 @@ int security_key_alloc(struct key *key, struct task_struct *tsk, unsigned long f void security_key_free(struct key *key); int security_key_permission(key_ref_t key_ref, struct task_struct *context, key_perm_t perm); +int security_key_getsecurity(struct key *key, char **_buffer); #else @@ -2621,6 +2633,12 @@ static inline int
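The return-value convention documented for KEYCTL_GET_SECURITY (report the full size including the NUL, but copy no more than the caller's buffer holds, and copy nothing when the buffer is NULL) can be modelled with a small userspace function. This is only a sketch of the documented behaviour: `copy_label()` is a hypothetical helper, not the kernel implementation, and it assumes the label is already available as a C string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy as much of the label (including its terminating NUL) as fits in
 * the buffer, and return the size the complete result would need.
 * With a NULL buffer nothing is copied and only the size is reported.
 */
static long copy_label(const char *label, char *buffer, size_t buflen)
{
	size_t need;

	assert(label != NULL);
	need = strlen(label) + 1;	/* the count includes the NUL */

	if (buffer && buflen > 0) {
		size_t n = need < buflen ? need : buflen;

		memcpy(buffer, label, n);	/* NUL is copied only if it fits */
	}
	return (long)need;
}
```

One consequence of this convention is that a caller can pass a NULL buffer first to learn the required size and then call again with a buffer of that size; a truncated result is detectable because the returned count exceeds the buffer length.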
[PATCH 26/37] NFS: Define and create superblock-level objects
Define and create superblock-level cache index objects (as managed by nfs_server structs). Each superblock object is created in a server level index object and is itself an index into which inode-level objects are inserted. Ideally there would be one superblock-level object per server, and the former would be folded into the latter; however, since the nosharecache option exists this isn't possible. The superblock object key is a sequence consisting of: (1) Certain superblock s_flags. (2) Various connection parameters that serve to distinguish superblocks for sget(). (3) The volume FSID. (4) The security flavour. (5) The uniquifier length. (6) The uniquifier text. This is normally an empty string, unless the fsc=xyz mount option was used to explicitly specify a uniquifier. The key blob is of variable length, depending on the length of (6). The superblock object is given no coherency data to carry in the auxiliary data permitted by the cache. It is assumed that the superblock is always coherent. This patch also adds uniquification handling such that two otherwise identical superblocks, at least one of which is marked nosharecache, won't end up trying to share the on-disk cache. It will be possible to manually provide a uniquifier through a mount option with a later patch to avoid the error otherwise produced. 
Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/nfs/fscache-index.c| 34 + fs/nfs/fscache.c | 116 + fs/nfs/fscache.h | 49 +++ fs/nfs/internal.h |3 + fs/nfs/super.c|8 ++- include/linux/nfs_fs_sb.h |5 ++ 6 files changed, 213 insertions(+), 2 deletions(-) diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c index 25ac4a1..b5a52e3 100644 --- a/fs/nfs/fscache-index.c +++ b/fs/nfs/fscache-index.c @@ -116,3 +116,37 @@ const struct fscache_cookie_def nfs_cache_server_index_def = { .type = FSCACHE_COOKIE_TYPE_INDEX, .get_key= nfs_server_get_key, }; + +/* + * Generate a key to describe a superblock key in the main NFS index + */ +static uint16_t nfs_super_get_key(const void *cookie_netfs_data, + void *buffer, uint16_t bufmax) +{ + const struct nfs_fscache_key *key; + const struct nfs_server *nfss = cookie_netfs_data; + uint16_t len; + + key = nfss-fscache_key; + len = sizeof(key-key) + key-key.uniq_len; + if (len bufmax) { + len = 0; + } else { + memcpy(buffer, key-key, sizeof(key-key)); + memcpy(buffer + sizeof(key-key), + key-key.uniquifier, key-key.uniq_len); + } + + return len; +} + +/* + * Define the superblock object for FS-Cache. This is used to describe a + * superblock object to fscache_acquire_cookie(). It is keyed by all the NFS + * parameters that might cause a separate superblock. 
+ */ +const struct fscache_cookie_def nfs_cache_super_index_def = { + .name = NFS.super, + .type = FSCACHE_COOKIE_TYPE_INDEX, + .get_key= nfs_super_get_key, +}; diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c index dcc1800..cbd09f0 100644 --- a/fs/nfs/fscache.c +++ b/fs/nfs/fscache.c @@ -23,6 +23,9 @@ #define NFSDBG_FACILITYNFSDBG_FSCACHE +static struct rb_root nfs_fscache_keys = RB_ROOT; +static DEFINE_SPINLOCK(nfs_fscache_keys_lock); + /* * Get the per-client index cookie for an NFS client if the appropriate mount * flag was set @@ -50,3 +53,116 @@ void nfs_fscache_release_client_cookie(struct nfs_client *clp) fscache_relinquish_cookie(clp-fscache, 0); clp-fscache = NULL; } + +/* + * Get the cache cookie for an NFS superblock. We have to handle + * uniquification here because the cache doesn't do it for us. + */ +void nfs_fscache_get_super_cookie(struct super_block *sb, + struct nfs_parsed_mount_data *data) +{ + struct nfs_fscache_key *key, *xkey; + struct nfs_server *nfss = NFS_SB(sb); + struct rb_node **p, *parent; + const char *uniq = data-fscache_uniq ?: ; + int diff, ulen; + + ulen = strlen(uniq); + key = kzalloc(sizeof(*key) + ulen, GFP_KERNEL); + if (!key) + return; + + key-nfs_client = nfss-nfs_client; + key-key.super.s_flags = sb-s_flags NFS_MS_MASK; + key-key.nfs_server.flags = nfss-flags; + key-key.nfs_server.rsize = nfss-rsize; + key-key.nfs_server.wsize = nfss-wsize; + key-key.nfs_server.acregmin = nfss-acregmin; + key-key.nfs_server.acregmax = nfss-acregmax; + key-key.nfs_server.acdirmin = nfss-acdirmin; + key-key.nfs_server.acdirmax = nfss-acdirmax; + key-key.nfs_server.fsid = nfss-fsid; + key-key.rpc_auth.au_flavor = nfss-client-cl_auth-au_flavor; + + key-key.uniq_len = ulen
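The variable-length key described above, a fixed block of parameters followed by the uniquifier text, can be illustrated with a userspace sketch. The struct below is a hypothetical, cut-down stand-in for the fixed part (the real key carries more fields), and `model_get_key()` is an invented name; only the shape of the logic follows nfs_super_get_key(): return 0 when the blob does not fit, otherwise copy the fixed part and then the uniquifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down fixed part of the superblock key. */
struct model_key_fixed {
	uint32_t s_flags;
	uint32_t au_flavor;
	uint16_t uniq_len;	/* length of the uniquifier that follows */
};

/*
 * Serialise the key into buffer.  Returns the number of bytes written,
 * or 0 if the fixed part plus uniquifier would not fit in bufmax.
 */
static uint16_t model_get_key(const struct model_key_fixed *fixed,
			      const char *uniquifier,
			      void *buffer, uint16_t bufmax)
{
	size_t len;

	assert(fixed != NULL && uniquifier != NULL && buffer != NULL);
	len = sizeof(*fixed) + fixed->uniq_len;
	if (len > bufmax)
		return 0;

	memcpy(buffer, fixed, sizeof(*fixed));
	memcpy((char *)buffer + sizeof(*fixed), uniquifier, fixed->uniq_len);
	return (uint16_t)len;
}
```

Two superblocks whose fixed parameters are identical therefore produce different keys only if their uniquifiers differ, which is what the nosharecache handling relies on.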
[PATCH 22/37] NFS: Add FS-Cache option bit and debug bit
Add FS-Cache option bit to nfs_server struct.  This is set to indicate local
on-disk caching is enabled for a particular superblock.

Also add debug bit for local caching operations.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/nfs_fs.h    |    1 +
 include/linux/nfs_fs_sb.h |    2 ++
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a69ba80..14894c9 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -578,6 +578,7 @@ extern void * nfs_root_data(void);
 #define NFSDBG_CALLBACK		0x0100
 #define NFSDBG_CLIENT		0x0200
 #define NFSDBG_MOUNT		0x0400
+#define NFSDBG_FSCACHE		0x0800
 #define NFSDBG_ALL		0xFFFF
 
 #ifdef __KERNEL__
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 3423c67..e7c4cdd 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -99,6 +99,8 @@ struct nfs_server {
 	unsigned int		acdirmin;
 	unsigned int		acdirmax;
 	unsigned int		namelen;
+	unsigned int		options;	/* extra options enabled by mount */
+#define NFS_OPTION_FSCACHE	0x00000001	/* - local caching enabled */
 
 	struct nfs_fsid		fsid;
 	__u64			maxfilesize;	/* maximum file size */
[PATCH 16/37] CacheFiles: Be consistent about the use of mapping vs file->f_mapping in Ext3
Change all the usages of file->f_mapping in ext3_*write_end() functions to use
the mapping argument directly.  This has two consequences:

 (*) Consistency.  Without this patch sometimes one is used and sometimes the
     other is.

 (*) A NULL file pointer can be passed.  This feature is then made use of by
     the generic hook in the next patch, which is used by CacheFiles to write
     pages to a file without setting up a file struct.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/ext3/inode.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index eb95670..c976123 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1215,7 +1215,7 @@ static int ext3_generic_write_end(struct file *file,
 				loff_t pos, unsigned len, unsigned copied,
 				struct page *page, void *fsdata)
 {
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = mapping->host;
 
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
 
@@ -1240,7 +1240,7 @@ static int ext3_ordered_write_end(struct file *file,
 				struct page *page, void *fsdata)
 {
 	handle_t *handle = ext3_journal_current_handle();
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = mapping->host;
 	unsigned from, to;
 	int ret = 0, ret2;
 
@@ -1281,7 +1281,7 @@ static int ext3_writeback_write_end(struct file *file,
 				struct page *page, void *fsdata)
 {
 	handle_t *handle = ext3_journal_current_handle();
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = mapping->host;
 	int ret = 0, ret2;
 	loff_t new_i_size;
[PATCH 12/37] FS-Cache: Recruit a couple of page flags for cache management
Recruit a couple of page flags to aid in cache management.  The following extra
flags are defined:

 (1) PG_fscache (PG_private_2)

     The marked page is backed by a local cache and is pinning resources in the
     cache driver.

 (2) PG_fscache_write (PG_owner_priv_2)

     The marked page is being written to the local cache.  The page may not be
     modified whilst this is in progress.

If PG_fscache is set, then things that checked for PG_private will now also
check for that.  This includes things like truncation and page invalidation.
The function page_has_private() has been added to make the checks for both
PG_private and PG_private_2 at the same time.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/splice.c                |    2 +-
 include/linux/page-flags.h |   39 +--
 include/linux/pagemap.h    |   11 +++
 mm/filemap.c               |   18 ++
 mm/migrate.c               |    2 +-
 mm/page_alloc.c            |    3 +++
 mm/readahead.c             |    9 +
 mm/swap.c                  |    4 ++--
 mm/swap_state.c            |    4 ++--
 mm/truncate.c              |   10 +-
 mm/vmscan.c                |    2 +-
 11 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 9b559ee..f2a7a06 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
 		 */
 		wait_on_page_writeback(page);
 
-		if (PagePrivate(page))
+		if (page_has_private(page))
 			try_to_release_page(page, GFP_KERNEL);
 
 		/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index bbad43f..cc16c23 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -77,25 +77,32 @@
 #define PG_active		 6
 #define PG_slab			 7	/* slab debug (Suparna wants this) */
 
-#define PG_owner_priv_1		 8	/* Owner use. If pagecache, fs may use*/
+#define PG_owner_priv_1		 8	/* Owner use. fs may use in pagecache */
 #define PG_arch_1		 9
 #define PG_reserved		10
 #define PG_private		11	/* If pagecache, has fs-private data */
 
 #define PG_writeback		12	/* Page is under writeback */
+#define PG_private_2		13	/* If pagecache, has fs aux data */
 #define PG_compound		14	/* Part of a compound page */
 #define PG_swapcache		15	/* Swap page: swp_entry_t in private */
 
 #define PG_mappedtodisk		16	/* Has blocks allocated on-disk */
 #define PG_reclaim		17	/* To be reclaimed asap */
+#define PG_owner_priv_2		18	/* Owner use. fs may use in pagecache */
 #define PG_buddy		19	/* Page is free, on buddy lists */
 
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead		PG_reclaim /* Reminder to do async read-ahead */
 
-/* PG_owner_priv_1 users should have descriptive aliases */
+/* PG_owner_priv_1/2 users should have descriptive aliases */
 #define PG_checked		PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned		PG_owner_priv_1	/* Xen pinned pagetable */
 
+#define PG_fscache_write	PG_owner_priv_2	/* Writing to local cache */
+
+/* PG_private_2 causes releasepage() and co to be invoked */
+#define PG_fscache		PG_private_2	/* Backed by local cache */
+
 #if (BITS_PER_LONG > 32)
 /*
@@ -235,6 +242,23 @@ static inline void SetPageUptodate(struct page *page)
 #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback,	\
 							&(page)->flags)
 
+#define PagePrivate2(page)	test_bit(PG_private_2, &(page)->flags)
+#define SetPagePrivate2(page)	set_bit(PG_private_2, &(page)->flags)
+#define ClearPagePrivate2(page)	clear_bit(PG_private_2, &(page)->flags)
+#define TestSetPagePrivate2(page) test_and_set_bit(PG_private_2, &(page)->flags)
+#define TestClearPagePrivate2(page) test_and_clear_bit(PG_private_2,	\
+							&(page)->flags)
+
+#define PageOwnerPriv2(page)	test_bit(PG_owner_priv_2,		\
+					 &(page)->flags)
+#define SetPageOwnerPriv2(page)	set_bit(PG_owner_priv_2, &(page)->flags)
+#define ClearPageOwnerPriv2(page) clear_bit(PG_owner_priv_2,		\
+					    &(page)->flags)
+#define TestSetPageOwnerPriv2(page) test_and_set_bit(PG_owner_priv_2,	\
+						     &(page)->flags)
+#define TestClearPageOwnerPriv2(page) test_and_clear_bit(PG_owner_priv_2, \
+							 &(page)->flags)
+
 #define PageBuddy(page
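As a rough userspace illustration of the flag semantics described in this patch (not code from the series), the sketch below models a page as a bare flags word and shows a combined check in the spirit of page_has_private(). The bit numbers are copied from the diff; struct fake_page and the fake_* helpers are hypothetical names invented for the example, and the real helper may be implemented differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers as given in the page-flags.h hunk above. */
#define PG_private		11
#define PG_private_2		13
#define PG_owner_priv_2		18
#define PG_fscache		PG_private_2	/* Backed by local cache */
#define PG_fscache_write	PG_owner_priv_2	/* Writing to local cache */

/* Hypothetical stand-in for struct page: just the flags word. */
struct fake_page {
	unsigned long flags;
};

static inline void fake_set_flag(struct fake_page *page, int bit)
{
	assert(bit >= 0 && bit < 32);
	page->flags |= 1UL << bit;
}

static inline void fake_clear_flag(struct fake_page *page, int bit)
{
	assert(bit >= 0 && bit < 32);
	page->flags &= ~(1UL << bit);
}

static inline bool fake_test_flag(const struct fake_page *page, int bit)
{
	assert(bit >= 0 && bit < 32);
	return (page->flags & (1UL << bit)) != 0;
}

/*
 * True if either PG_private or PG_private_2 (PG_fscache) is set, which is
 * what the description says page_has_private() checks in one go.
 */
static inline bool fake_page_has_private(const struct fake_page *page)
{
	unsigned long mask = (1UL << PG_private) | (1UL << PG_private_2);

	return (page->flags & mask) != 0;
}
```

Note that PG_fscache_write alone does not make the combined check true in this sketch; only PG_private and PG_fscache do, matching the statement that PG_private_2 is what causes releasepage() and co to be invoked.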
[PATCH 29/37] NFS: Invalidate FsCache page flags when cache removed
Invalidate the FsCache page flags on the pages belonging to an inode when the
cache backing that NFS inode is removed.

This allows a live cache to be withdrawn.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-index.c |   40
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index c3c63fa..eec8e7e 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -246,6 +246,45 @@ static enum fscache_checkaux nfs_cache_inode_check_aux(void *cookie_netfs_data,
 }
 
 /*
+ * Indication from FS-Cache that the cookie is no longer cached
+ * - This function is called when the backing store currently caching a cookie
+ *   is removed
+ * - The netfs should use this to clean up any markers indicating cached pages
+ * - This is mandatory for any object that may have data
+ */
+static void nfs_cache_inode_now_uncached(void *cookie_netfs_data)
+{
+	struct nfs_inode *nfsi = cookie_netfs_data;
+	struct pagevec pvec;
+	pgoff_t first;
+	int loop, nr_pages;
+
+	pagevec_init(&pvec, 0);
+	first = 0;
+
+	dprintk("NFS: nfs_inode_now_uncached: nfs_inode 0x%p\n", nfsi);
+
+	for (;;) {
+		/* grab a bunch of pages to unmark */
+		nr_pages = pagevec_lookup(&pvec,
+					  nfsi->vfs_inode.i_mapping,
+					  first,
+					  PAGEVEC_SIZE - pagevec_count(&pvec));
+		if (!nr_pages)
+			break;
+
+		for (loop = 0; loop < nr_pages; loop++)
+			ClearPageFsCache(pvec.pages[loop]);
+
+		first = pvec.pages[nr_pages - 1]->index + 1;
+
+		pvec.nr = nr_pages;
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+}
+
+/*
  * Define the inode object for FS-Cache.  This is used to describe an inode
  * object to fscache_acquire_cookie().  It is keyed by the NFS file handle for
  * an inode.
@@ -261,4 +300,5 @@ const struct fscache_cookie_def nfs_cache_inode_object_def = {
 	.get_attr	= nfs_cache_inode_get_attr,
 	.get_aux	= nfs_cache_inode_get_aux,
 	.check_aux	= nfs_cache_inode_check_aux,
+	.now_uncached	= nfs_cache_inode_now_uncached,
 };
[PATCH 33/37] NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
nfs_readpage_async() needs to be non-static so that it can be used as a
fallback for the local on-disk caching should an EIO crop up when reading the
cache.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/read.c          |    4 ++--
 include/linux/nfs_fs.h |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 3d7d963..725a5a2 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -114,8 +114,8 @@ static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
 	}
 }
 
-static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
-		struct page *page)
+int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
+		       struct page *page)
 {
 	LIST_HEAD(one_request);
 	struct nfs_page *new;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index d9adb53..d1d545e 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -505,6 +505,8 @@ extern int nfs_readpages(struct file *, struct address_space *,
 		struct list_head *, unsigned);
 extern int nfs_readpage_result(struct rpc_task *, struct nfs_read_data *);
 extern void nfs_readdata_release(void *data);
+extern int nfs_readpage_async(struct nfs_open_context *, struct inode *,
+			      struct page *);
 
 /*
  * Allocate nfs_read_data structures
[PATCH 25/37] NFS: Define and create server-level objects
Define and create server-level cache index objects (as managed by nfs_client
structs).

Each server object is created in the NFS top-level index object and is itself
an index into which superblock-level objects are inserted.

Ideally there would be one superblock-level object per server, and the former
would be folded into the latter; however, since the nosharecache option exists
this isn't possible.

The server object key is a sequence consisting of:

 (1) NFS version

 (2) Server address family (eg: AF_INET or AF_INET6)

 (3) Server port.

 (4) Server IP address.

The key blob is of variable length, depending on the length of (4).

The server object is given no coherency data to carry in the auxiliary data
permitted by the cache.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/Makefile           |    2 +
 fs/nfs/client.c           |    5 +++
 fs/nfs/fscache-index.c    |   65 +
 fs/nfs/fscache.c          |   52
 fs/nfs/fscache.h          |   10 +++
 include/linux/nfs_fs_sb.h |    4 +++
 6 files changed, 137 insertions(+), 1 deletions(-)
 create mode 100644 fs/nfs/fscache.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 6d7176d..d848c97 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,4 +16,4 @@ nfs-$(CONFIG_NFS_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
 			nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
-nfs-$(CONFIG_NFS_FSCACHE) += fscache-index.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index c5c0175..51e9346 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -45,6 +45,7 @@
 #include "delegation.h"
 #include "iostat.h"
 #include "internal.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_CLIENT
 
@@ -151,6 +152,8 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
 	clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
 
+	nfs_fscache_get_client_cookie(clp);
+
 	return clp;
 
 error_3:
@@ -182,6 +185,8 @@ static void nfs_free_client(struct nfs_client *clp)
 
 	nfs4_shutdown_client(clp);
 
+	nfs_fscache_release_client_cookie(clp);
+
 	/* -EIO all pending I/O */
 	if (!IS_ERR(clp->cl_rpcclient))
 		rpc_shutdown_client(clp->cl_rpcclient);
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 225ed5d..25ac4a1 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -51,3 +51,68 @@ void nfs_fscache_unregister(void)
 {
 	fscache_unregister_netfs(&nfs_cache_netfs);
 }
+
+/*
+ * Layout of the key for an NFS server cache object.
+ */
+struct nfs_server_key {
+	uint16_t	nfsversion;		/* NFS protocol version */
+	uint16_t	family;			/* address family */
+	uint16_t	port;			/* IP port */
+	union {
+		struct in_addr	ipv4_addr;	/* IPv4 address */
+		struct in6_addr	ipv6_addr;	/* IPv6 address */
+	} addr[0];
+};
+
+/*
+ * Generate a key to describe a server in the main NFS index
+ * - We return the length of the key, or 0 if we can't generate one
+ */
+static uint16_t nfs_server_get_key(const void *cookie_netfs_data,
+				   void *buffer, uint16_t bufmax)
+{
+	const struct nfs_client *clp = cookie_netfs_data;
+	const struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) &clp->cl_addr;
+	const struct sockaddr_in *sin = (struct sockaddr_in *) &clp->cl_addr;
+	struct nfs_server_key *key = buffer;
+	uint16_t len = 0;
+
+	key->nfsversion = clp->rpc_ops->version;
+	key->family = clp->cl_addr.ss_family;
+
+	len = sizeof(struct nfs_server_key);
+
+	switch (clp->cl_addr.ss_family) {
+	case AF_INET:
+		key->port = sin->sin_port;
+		key->addr[0].ipv4_addr = sin->sin_addr;
+		len += sizeof(key->addr[0].ipv4_addr);
+		break;
+
+	case AF_INET6:
+		key->port = sin6->sin6_port;
+		key->addr[0].ipv6_addr = sin6->sin6_addr;
+		len += sizeof(key->addr[0].ipv6_addr);
+		break;
+
+	default:
+		printk(KERN_WARNING "NFS: Unknown network family '%d'\n",
+		       clp->cl_addr.ss_family);
+		len = 0;
+		break;
+	}
+
+	return len;
+}
+
+/*
+ * Define the server object for FS-Cache.  This is used to describe a server
+ * object to fscache_acquire_cookie().  It is keyed by the NFS protocol and
+ * server address parameters.
+ */
+const struct fscache_cookie_def nfs_cache_server_index_def = {
+	.name		= "NFS.server",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= nfs_server_get_key,
+};
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
new file mode 100644
index
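To make the variable-length server key easier to picture, here is a small userspace sketch of the same layout: a fixed header of NFS version, address family and port, followed by 4 or 16 address bytes. It is an approximation for illustration only. struct sketch_server_key and sketch_server_get_key() are hypothetical names, the function takes a sockaddr_storage rather than an nfs_client, and the bufmax bounds check is an addition that the posted nfs_server_get_key() does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Fixed part of the key; the address bytes follow it in the buffer. */
struct sketch_server_key {
	uint16_t nfsversion;	/* NFS protocol version */
	uint16_t family;	/* address family */
	uint16_t port;		/* IP port, kept in network byte order */
};

/*
 * Build a key into buffer and return its length, or 0 if no key can be
 * generated (unknown family, or buffer too small in this sketch).
 */
size_t sketch_server_get_key(uint16_t nfsversion,
			     const struct sockaddr_storage *addr,
			     void *buffer, size_t bufmax)
{
	struct sketch_server_key hdr;
	const void *ip;
	size_t iplen;

	assert(addr != NULL && buffer != NULL);

	memset(&hdr, 0, sizeof(hdr));
	hdr.nfsversion = nfsversion;
	hdr.family = addr->ss_family;

	switch (addr->ss_family) {
	case AF_INET: {
		const struct sockaddr_in *sin =
			(const struct sockaddr_in *)addr;
		hdr.port = sin->sin_port;
		ip = &sin->sin_addr;
		iplen = sizeof(sin->sin_addr);
		break;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 =
			(const struct sockaddr_in6 *)addr;
		hdr.port = sin6->sin6_port;
		ip = &sin6->sin6_addr;
		iplen = sizeof(sin6->sin6_addr);
		break;
	}
	default:
		return 0;	/* unknown network family: no key */
	}

	/* Bounds check added for the sketch; not present in the patch. */
	if (sizeof(hdr) + iplen > bufmax)
		return 0;

	memcpy(buffer, &hdr, sizeof(hdr));
	memcpy((char *)buffer + sizeof(hdr), ip, iplen);
	return sizeof(hdr) + iplen;
}
```

With this layout an IPv4 server yields a 10-byte key and an IPv6 server a 22-byte key, which is the sense in which the key blob length depends on item (4).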
[PATCH 36/37] NFS: Display local caching state
Display the local caching state in /proc/fs/nfsfs/volumes.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/client.c  |    7 ---
 fs/nfs/fscache.h |   15 +++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 51e9346..d67d52f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1451,7 +1451,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 
 	/* display header on line 1 */
 	if (v == &nfs_volume_list) {
-		seq_puts(m, "NV SERVER PORT DEV FSID\n");
+		seq_puts(m, "NV SERVER PORT DEV FSID FSC\n");
 		return 0;
 	}
 	/* display one transport per line on subsequent lines */
@@ -1465,12 +1465,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 		 (unsigned long long) server->fsid.major,
 		 (unsigned long long) server->fsid.minor);
 
-	seq_printf(m, "v%u %s %s %-7s %-17s\n",
+	seq_printf(m, "v%u %s %s %-7s %-17s %s\n",
 		   clp->rpc_ops->version,
 		   rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_ADDR),
 		   rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_PORT),
 		   dev,
-		   fsid);
+		   fsid,
+		   nfs_server_fscache_state(server));
 
 	return 0;
 }
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 6264cd8..5f7806f 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -146,6 +146,16 @@ static inline void nfs_readpage_to_fscache(struct inode *inode,
 		__nfs_readpage_to_fscache(inode, page, sync);
 }
 
+/*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+	if (server->fscache && (server->options & NFS_OPTION_FSCACHE))
+		return "yes";
+	return "no ";
+}
+
 
 #else /* CONFIG_NFS_FSCACHE */
 static inline int nfs_fscache_register(void) { return 0; }
@@ -195,5 +205,10 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 static inline void nfs_readpage_to_fscache(struct inode *inode,
 					   struct page *page, int sync) {}
 
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+	return "no ";
+}
+
 #endif /* CONFIG_NFS_FSCACHE */
 #endif /* _NFS_FSCACHE_H */
[PATCH 15/37] CacheFiles: Add missing copy_page export for ia64
This one-line patch fixes the missing export of copy_page introduced
by the cachefile patches.  This patch is not yet upstream, but is required
for cachefile on ia64.  It will be pushed upstream when cachefile goes
upstream.

Signed-off-by: Prarit Bhargava [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/ia64_ksyms.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index 8e7193d..3e544f4 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -46,6 +46,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);
[PATCH 34/37] NFS: Read pages from FS-Cache into an NFS inode
Read pages from an FS-Cache data storage object representing an inode into an
NFS inode.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache.c |  112 ++
 fs/nfs/fscache.h |   47 +++
 fs/nfs/read.c    |   18 +
 3 files changed, 176 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index d475ff5..438cc9b 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -344,5 +344,115 @@ void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode)
 	BUG_ON(!PageLocked(page));
 
 	fscache_uncache_page(nfsi->fscache, page);
-	nfs_add_stats(page->mapping->host, NFSIOS_FSCACHE_UNCACHE, 1);
+	nfs_add_stats(inode, NFSIOS_FSCACHE_UNCACHE, 1);
+}
+
+/*
+ * Handle completion of a page being read from the cache.
+ * - Called in process (keventd) context.
+ */
+static void nfs_readpage_from_fscache_complete(struct page *page,
+					       void *context,
+					       int error)
+{
+	dfprintk(FSCACHE,
+		 "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
+		 page, context, error);
+
+	/* if the read completes with an error, we just unlock the page and let
+	 * the VM reissue the readpage */
+	if (!error) {
+		SetPageUptodate(page);
+		unlock_page(page);
+	} else {
+		error = nfs_readpage_async(context, page->mapping->host, page);
+		if (error)
+			unlock_page(page);
+	}
+}
+
+/*
+ * Retrieve a page from fscache
+ */
+int __nfs_readpage_from_fscache(struct nfs_open_context *ctx,
+				struct inode *inode, struct page *page)
+{
+	int ret;
+
+	dfprintk(FSCACHE,
+		 "NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
+		 NFS_I(inode)->fscache, page, page->index, page->flags, inode);
+
+	ret = fscache_read_or_alloc_page(NFS_I(inode)->fscache,
+					 page,
+					 nfs_readpage_from_fscache_complete,
+					 ctx,
+					 GFP_KERNEL);
+
+	switch (ret) {
+	case 0: /* read BIO submitted (page in fscache) */
+		dfprintk(FSCACHE,
+			 "NFS:    readpage_from_fscache: BIO submitted\n");
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_OK, 1);
+		return ret;
+
+	case -ENOBUFS: /* inode not in cache */
+	case -ENODATA: /* page not in cache */
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, 1);
+		dfprintk(FSCACHE,
+			 "NFS:    readpage_from_fscache %d\n", ret);
+		return 1;
+
+	default:
+		dfprintk(FSCACHE, "NFS:    readpage_from_fscache %d\n", ret);
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, 1);
+	}
+	return ret;
+}
+
+/*
+ * Retrieve a set of pages from fscache
+ */
+int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
+				 struct inode *inode,
+				 struct address_space *mapping,
+				 struct list_head *pages,
+				 unsigned *nr_pages)
+{
+	int ret, npages = *nr_pages;
+
+	dfprintk(FSCACHE, "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
+		 NFS_I(inode)->fscache, npages, inode);
+
+	ret = fscache_read_or_alloc_pages(NFS_I(inode)->fscache,
+					  mapping, pages, nr_pages,
+					  nfs_readpage_from_fscache_complete,
+					  ctx,
+					  mapping_gfp_mask(mapping));
+	if (*nr_pages < npages)
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_OK, npages);
+	if (*nr_pages > 0)
+		nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, *nr_pages);
+
+	switch (ret) {
+	case 0: /* read submitted to the cache for all pages */
+		BUG_ON(!list_empty(pages));
+		BUG_ON(*nr_pages != 0);
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: submitted\n");
+
+		return ret;
+
+	case -ENOBUFS: /* some pages aren't cached and can't be */
+	case -ENODATA: /* some pages aren't cached */
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
+		return 1;
+
+	default:
+		dfprintk(FSCACHE,
+			 "NFS: nfs_getpages_from_fscache: ret %d\n", ret);
+	}
+
+	return ret;
 }
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 1cb7d96..4c1e1a8 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -89,6 +89,12
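The control flow of the read path in this patch can be summarised in a small userspace sketch. It is only a model under stated assumptions: struct sketch_page, sketch_map_cache_read_result(), sketch_read_complete() and the two stub fetchers are hypothetical names, and the "server fetch" is a plain function pointer standing in for nfs_readpage_async().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the page state the completion handler touches. */
struct sketch_page {
	int uptodate;
	int locked;
};

typedef int (*sketch_fetch_fn)(struct sketch_page *page);

/*
 * Map the cache's return value the way __nfs_readpage_from_fscache() does:
 * 0 means the read was submitted, 1 means "not cached, go to the server",
 * anything else is passed through as an error.
 */
int sketch_map_cache_read_result(int ret)
{
	switch (ret) {
	case 0:
		return 0;
	case -ENOBUFS:	/* inode not in cache */
	case -ENODATA:	/* page not in cache */
		return 1;
	default:
		return ret;
	}
}

/*
 * Completion handler model: on success mark the page up to date and unlock
 * it; on a cache error fall back to the server, and unlock the page only if
 * that fallback could not be started.
 */
void sketch_read_complete(struct sketch_page *page, int error,
			  sketch_fetch_fn fetch_from_server)
{
	assert(page->locked);

	if (!error) {
		page->uptodate = 1;
		page->locked = 0;
	} else if (fetch_from_server(page) != 0) {
		page->locked = 0;
	}
	/* otherwise the fallback read now owns the locked page */
}

/* Illustrative stubs for the server fetch. */
int sketch_fetch_ok(struct sketch_page *page)   { (void)page; return 0; }
int sketch_fetch_fail(struct sketch_page *page) { (void)page; return -EIO; }
```

The point of the 0/1/negative convention is that the caller can treat 1 as "carry on with a normal NFS read" without having to know which cache condition caused it.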
[PATCH 18/37] CacheFiles: Permit the page lock state to be monitored
Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/pagemap.h |    5 +
 mm/filemap.c            |   18 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c8bd762..76b5307 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -242,6 +242,11 @@ static inline void wait_on_page_owner_priv_2(struct page *page)
 extern void end_page_owner_priv_2(struct page *page);
 
 /*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/mm/filemap.c b/mm/filemap.c
index a583f44..561e6c7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -548,6 +548,24 @@ void wait_on_page_bit(struct page *page, int bit_nr)
 EXPORT_SYMBOL(wait_on_page_bit);
 
 /**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page - Page defining the wait queue of interest
+ * @waiter - Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+	wait_queue_head_t *q = page_waitqueue(page);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue(q, waiter);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
  * unlock_page - unlock a locked page
  * @page: the page
  *
[PATCH 17/37] CacheFiles: Add a hook to write a single page of data to an inode
Add an address space operation to write one single page of data to an inode at
a page-aligned location (thus permitting the implementation to be highly
optimised).  The data source is a single page.

This is used by CacheFiles to store the contents of netfs pages into their
backing file pages.

Supply a generic implementation for this that uses the write_begin() and
write_end() address_space operations to bind a copy directly into the page
cache.

Hook the Ext2 and Ext3 operations to the generic implementation.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/ext2/inode.c    |    2 ++
 fs/ext3/inode.c    |    3 +++
 include/linux/fs.h |    7 ++
 mm/filemap.c       |   61
 4 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index c620068..f483014 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -792,6 +792,7 @@ const struct address_space_operations ext2_aops = {
 	.direct_IO		= ext2_direct_IO,
 	.writepages		= ext2_writepages,
 	.migratepage		= buffer_migrate_page,
+	.write_one_page		= generic_file_buffered_write_one_page,
 };
 
 const struct address_space_operations ext2_aops_xip = {
@@ -810,6 +811,7 @@ const struct address_space_operations ext2_nobh_aops = {
 	.direct_IO		= ext2_direct_IO,
 	.writepages		= ext2_writepages,
 	.migratepage		= buffer_migrate_page,
+	.write_one_page		= generic_file_buffered_write_one_page,
 };
 
 /*
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index c976123..0209f3b 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1776,6 +1776,7 @@ static const struct address_space_operations ext3_ordered_aops = {
 	.releasepage		= ext3_releasepage,
 	.direct_IO		= ext3_direct_IO,
 	.migratepage		= buffer_migrate_page,
+	.write_one_page		= generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_writeback_aops = {
@@ -1790,6 +1791,7 @@ static const struct address_space_operations ext3_writeback_aops = {
 	.releasepage		= ext3_releasepage,
 	.direct_IO		= ext3_direct_IO,
 	.migratepage		= buffer_migrate_page,
+	.write_one_page		= generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_journalled_aops = {
@@ -1803,6 +1805,7 @@ static const struct address_space_operations ext3_journalled_aops = {
 	.bmap			= ext3_bmap,
 	.invalidatepage		= ext3_invalidatepage,
 	.releasepage		= ext3_releasepage,
+	.write_one_page		= generic_file_buffered_write_one_page,
 };
 
 void ext3_set_aops(struct inode *inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d218ef5..dd6c3d1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -481,6 +481,11 @@ struct address_space_operations {
 	int (*migratepage) (struct address_space *,
 			struct page *, struct page *);
 	int (*launder_page) (struct page *);
+	/* write the contents of the source page over the page at the specified
+	 * index in the target address space (the source page does not need to
+	 * be related to the target address space) */
+	int (*write_one_page)(struct address_space *, pgoff_t, struct page *);
+
 };
 
 /*
@@ -1811,6 +1816,8 @@ extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
 		unsigned long *, loff_t, loff_t *, size_t, size_t);
 extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
 		unsigned long, loff_t, loff_t *, size_t, ssize_t);
+extern int generic_file_buffered_write_one_page(struct address_space *,
+						pgoff_t, struct page *);
 extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
 extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
 extern int generic_segment_checks(const struct iovec *iov,
diff --git a/mm/filemap.c b/mm/filemap.c
index df1e149..a583f44 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2359,6 +2359,67 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
 }
 EXPORT_SYMBOL(generic_file_buffered_write);
 
+/**
+ * generic_file_buffered_write_one_page - Write a single page of data to an
+ * inode
+ * @mapping - The address space of the target inode
+ * @index - The target page in the target inode to fill
+ * @source - The data to write into the target page
+ *
+ * Write the data from the source page to the page in the nominated address
+ * space at the @index specified.  Note that the file will not be extended if
+ * the page crosses the EOF marker, in which case only the first part of the
+ * page will be written.
+ *
+ * The @source page does not need to have any association
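The EOF rule in the kerneldoc above ("the file will not be extended", "only the first part of the page will be written") boils down to a small piece of arithmetic. The sketch below works it out in userspace. It is not the generic implementation from the patch: sketch_bytes_to_write() and SKETCH_PAGE_SIZE are hypothetical, a 4096-byte page is assumed, and returning 0 for a page lying wholly beyond EOF is an assumption, since the quoted text only covers a page that crosses EOF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the example */

/*
 * Number of bytes of page `index` that may be written into a file of size
 * `i_size` without extending it.
 */
size_t sketch_bytes_to_write(uint64_t index, uint64_t i_size)
{
	uint64_t start = index * SKETCH_PAGE_SIZE;
	uint64_t remaining;

	assert(index <= UINT64_MAX / SKETCH_PAGE_SIZE);

	if (start >= i_size)
		return 0;	/* assumption: nothing to write past EOF */

	remaining = i_size - start;
	return remaining < SKETCH_PAGE_SIZE ? (size_t)remaining
					    : SKETCH_PAGE_SIZE;
}
```

For a 10000-byte file, pages 0 and 1 are written in full, page 2 receives only its first 1808 bytes (10000 - 8192), and page 3 receives nothing.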
[PATCH 31/37] NFS: FS-Cache page management
FS-Cache page management for NFS.  This includes hooking the releasing and
invalidation of pages marked with PG_fscache (aka PG_private_2) and waiting for
completion of the write-to-cache flag (PG_fscache_write aka PG_owner_priv_2).

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/file.c    |   17 +
 fs/nfs/fscache.c |   49 +
 fs/nfs/fscache.h |   22 ++
 3 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 26a073b..60db3ea 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -35,6 +35,7 @@
 #include "delegation.h"
 #include "internal.h"
 #include "iostat.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_FILE
 
@@ -358,7 +359,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
  * Partially or wholly invalidate a page
  * - Release the private state associated with a page if undergoing complete
  *   page invalidation
- * - Called if either PG_private or PG_private_2 is set on the page
+ * - Called if either PG_private or PG_fscache is set on the page
  * - Caller holds page lock
  */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
@@ -367,30 +368,35 @@ static void nfs_invalidate_page(struct page *page, unsigned long offset)
 		return;
 	/* Cancel any unstarted writes on this page */
 	nfs_wb_page_cancel(page->mapping->host, page);
+
+	nfs_fscache_invalidate_page(page, page->mapping->host);
 }
 
 /*
  * Attempt to release the private state associated with a page
- * - Called if either PG_private or PG_private_2 is set on the page
+ * - Called if either PG_private or PG_fscache is set on the page
  * - Caller holds page lock
  * - Return true (may release page) or false (may not)
  */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
 	/* If PagePrivate() is set, then the page is not freeable */
-	return 0;
+	if (PagePrivate(page))
+		return 0;
+	return nfs_fscache_release_page(page, gfp);
 }
 
 /*
  * Attempt to clear the private state associated with a page when an error
  * occurs that requires the cached contents of an inode to be written back or
  * destroyed
- * - Called if either PG_private or PG_private_2 is set on the page
+ * - Called if either PG_private or fscache is set on the page
  * - Caller holds page lock
  * - Return 0 if successful, -error otherwise
  */
 static int nfs_launder_page(struct page *page)
 {
+	wait_on_page_fscache_write(page);
 	return nfs_wb_page(page->mapping->host, page);
 }
 
@@ -422,6 +428,9 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page)
 	int ret = -EINVAL;
 	struct address_space *mapping;
 
+	/* make sure the cache has finished storing the page */
+	wait_on_page_fscache_write(page);
+
 	lock_page(page);
 	mapping = page->mapping;
 	if (mapping != vma->vm_file->f_path.dentry->d_inode->i_mapping)
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index c0e0320..d475ff5 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -19,6 +19,7 @@
 #include <linux/seq_file.h>
 
 #include "internal.h"
+#include "iostat.h"
 #include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_FSCACHE
@@ -297,3 +298,51 @@ void nfs_fscache_attr_changed(struct inode *inode)
 {
 	fscache_attr_changed(NFS_I(inode)->fscache);
 }
+
+/*
+ * Release the caching state associated with a page, if the page isn't busy
+ * interacting with the cache.
+ * - Returns true (can release page) or false (page busy).
+ */
+int nfs_fscache_release_page(struct page *page, gfp_t gfp)
+{
+	if (PageFsCacheWrite(page)) {
+		if (!(gfp & __GFP_WAIT))
+			return 0;
+		wait_on_page_fscache_write(page);
+	}
+
+	if (PageFsCache(page)) {
+		struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+
+		BUG_ON(!nfsi->fscache);
+
+		dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
+			 nfsi->fscache, page, nfsi);
+
+		fscache_uncache_page(nfsi->fscache, page);
+		nfs_add_stats(page->mapping->host, NFSIOS_FSCACHE_UNCACHE, 1);
+	}
+
+	return 1;
+}
+
+/*
+ * Release the caching state associated with a page if undergoing complete page
+ * invalidation.
+ */
+void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode)
+{
+	struct nfs_inode *nfsi = NFS_I(inode);
+
+	BUG_ON(!nfsi->fscache);
+
+	dfprintk(FSCACHE, "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
+		 nfsi->fscache, page, nfsi);
+
+	wait_on_page_fscache_write(page);
+
+	BUG_ON(!PageLocked(page));
+	fscache_uncache_page(nfsi->fscache, page);
+	nfs_add_stats(page->mapping->host, NFSIOS_FSCACHE_UNCACHE, 1);
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index d730ec8..1cb7d96
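The release decision made by nfs_fscache_release_page() can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the kernel code: struct sketch_page and sketch_release_page() are hypothetical, the may_wait argument stands in for the __GFP_WAIT test, waiting is modelled by simply clearing the write flag, and clearing the fscache flag on uncache is assumed rather than shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two FS-Cache page flags plus a call counter. */
struct sketch_page {
	bool fscache;		/* PG_fscache: backed by the local cache */
	bool fscache_write;	/* PG_fscache_write: being written to cache */
	int uncache_calls;	/* how often the cache was told to let go */
};

/*
 * Return 1 if the page may be released, 0 if it is busy being written to
 * the cache and the caller is not allowed to wait.
 */
int sketch_release_page(struct sketch_page *page, bool may_wait)
{
	assert(page != NULL);

	if (page->fscache_write) {
		if (!may_wait)
			return 0;
		/* stands in for wait_on_page_fscache_write() */
		page->fscache_write = false;
	}

	if (page->fscache) {
		/* stands in for fscache_uncache_page(); flag clearing assumed */
		page->fscache = false;
		page->uncache_calls++;
	}

	return 1;
}
```

The ordering matters: the write to the cache has to finish (or the release has to be refused) before the cache is told to drop its tracking of the page.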
[PATCH 35/37] NFS: Store pages from an NFS inode into a local cache
Store pages from an NFS inode into the cache data storage object associated
with that inode.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache.c |   26 ++
 fs/nfs/fscache.h |   16
 fs/nfs/read.c    |    5 +
 3 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 438cc9b..50ae70f 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -456,3 +456,29 @@ int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 
 	return ret;
 }
+
+/*
+ * Store a newly fetched page in fscache
+ * - PG_fscache must be set on the page
+ */
+void __nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
+{
+	int ret;
+
+	dfprintk(FSCACHE,
+		 "NFS: readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
+		 NFS_I(inode)->fscache, page, page->index, page->flags, sync);
+
+	ret = fscache_write_page(NFS_I(inode)->fscache, page, GFP_KERNEL);
+	dfprintk(FSCACHE,
+		 "NFS: readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
+		 page, page->index, page->flags, ret);
+
+	if (ret != 0) {
+		fscache_uncache_page(NFS_I(inode)->fscache, page);
+		nfs_add_stats(inode, NFSIOS_FSCACHE_WRITE_FAIL, 1);
+		nfs_add_stats(inode, NFSIOS_FSCACHE_UNCACHE, 1);
+	} else {
+		nfs_add_stats(inode, NFSIOS_FSCACHE_WRITE_OK, 1);
+	}
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 4c1e1a8..6264cd8 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -94,6 +94,7 @@ extern int __nfs_readpage_from_fscache(struct nfs_open_context *,
 extern int __nfs_readpages_from_fscache(struct nfs_open_context *,
 					struct inode *, struct address_space *,
 					struct list_head *, unsigned *);
+extern void __nfs_readpage_to_fscache(struct inode *, struct page *, int);
 
 /*
  * release the caching state associated with a page if undergoing complete page
@@ -133,6 +134,19 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 	return -ENOBUFS;
 }
 
+/*
+ * Store a page newly fetched from the server in an inode data storage object
+ * in the cache.
+ */
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+					   struct page *page,
+					   int sync)
+{
+	if (PageFsCache(page))
+		__nfs_readpage_to_fscache(inode, page, sync);
+}
+
+
 #else /* CONFIG_NFS_FSCACHE */
 static inline int nfs_fscache_register(void) { return 0; }
 static inline void nfs_fscache_unregister(void) {}
@@ -178,6 +192,8 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 {
 	return -ENOBUFS;
 }
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+					   struct page *page, int sync) {}
 
 #endif /* CONFIG_NFS_FSCACHE */
 #endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index db27b26..e09bdf9 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -143,6 +143,11 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 
 static void nfs_readpage_release(struct nfs_page *req)
 {
+	struct inode *d_inode = req->wb_context->path.dentry->d_inode;
+
+	if (PageUptodate(req->wb_page))
+		nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+
 	unlock_page(req->wb_page);
 
 	dprintk("NFS: read done (%s/%Ld [EMAIL PROTECTED])\n",
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
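For readers following the control flow of patch 35, here is a minimal user-space sketch of the same shape: the read-release hook only offers a page to the cache if the page is up to date, the inline wrapper only proceeds if the page is marked as cached, and a failed cache write uncaches the page and bumps two counters. Every demo_* name below is hypothetical and merely stands in for the kernel objects; this is not the FS-Cache API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; not the real API. */
enum demo_stat { DEMO_WRITE_OK, DEMO_WRITE_FAIL, DEMO_UNCACHE, DEMO_STAT_MAX };

struct demo_page {
	bool uptodate;		/* models PageUptodate() */
	bool in_cache;		/* models PG_fscache */
};

struct demo_inode {
	unsigned long stats[DEMO_STAT_MAX];
	int write_result;	/* what the pretend cache write returns */
};

/* Models __nfs_readpage_to_fscache(): the page must already be marked. */
static void demo_store_page(struct demo_inode *inode, struct demo_page *page)
{
	assert(page->in_cache);

	if (inode->write_result != 0) {
		/* Write failed: drop the cache's interest in the page. */
		page->in_cache = false;
		inode->stats[DEMO_WRITE_FAIL]++;
		inode->stats[DEMO_UNCACHE]++;
	} else {
		inode->stats[DEMO_WRITE_OK]++;
	}
}

/* Models the inline wrapper nfs_readpage_to_fscache(). */
static void demo_readpage_to_cache(struct demo_inode *inode,
				   struct demo_page *page)
{
	if (page->in_cache)
		demo_store_page(inode, page);
}

/* Models the hook added to nfs_readpage_release(). */
static void demo_read_done(struct demo_inode *inode, struct demo_page *page)
{
	if (page->uptodate)
		demo_readpage_to_cache(inode, page);
}
```

The cheap flag test sits in the wrapper so that pages which were never offered to the cache cost only one branch on the read completion path.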
[PATCH 32/37] NFS: Add read context retention for FS-Cache to call back with
Add read context retention so that FS-Cache can call back into NFS when a read
operation on the cache fails EIO rather than reading data.  This permits NFS to
then fetch the data from the server instead using the appropriate security
context.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-index.c |   26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index eec8e7e..af9f06b 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -285,6 +285,30 @@ static void nfs_cache_inode_now_uncached(void *cookie_netfs_data)
 }
 
 /*
+ * Get an extra reference on a read context.
+ * - This function can be absent if the completion function doesn't require a
+ *   context.
+ * - The read context is passed back to NFS in the event that a data read on the
+ *   cache fails with EIO - in which case the server must be contacted to
+ *   retrieve the data, which requires the read context for security.
+ */
+static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
+{
+	get_nfs_open_context(context);
+}
+
+/*
+ * Release an extra reference on a read context.
+ * - This function can be absent if the completion function doesn't require a
+ *   context.
+ */
+static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
+{
+	if (context)
+		put_nfs_open_context(context);
+}
+
+/*
  * Define the inode object for FS-Cache.  This is used to describe an inode
  * object to fscache_acquire_cookie().  It is keyed by the NFS file handle for
  * an inode.
@@ -301,4 +325,6 @@ const struct fscache_cookie_def nfs_cache_inode_object_def = {
 	.get_aux	= nfs_cache_inode_get_aux,
 	.check_aux	= nfs_cache_inode_check_aux,
 	.now_uncached	= nfs_cache_inode_now_uncached,
+	.get_context	= nfs_fh_get_context,
+	.put_context	= nfs_fh_put_context,
 };
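The reason for the get/put pair in patch 32 may be easier to see in a small user-space sketch: the cache pins the read context for as long as its asynchronous read is outstanding, so that if the cache read fails the fallback to the server still has a live context to use. The demo_* names and the plain integer refcount are assumptions made for illustration only; the real code uses get_nfs_open_context() and put_nfs_open_context().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read context with a plain (non-atomic) reference count. */
struct demo_ctx {
	int refcount;
};

/* Models the .get_context operation. */
static void demo_get_context(void *context)
{
	struct demo_ctx *ctx = context;

	assert(ctx != NULL);
	ctx->refcount++;
}

/* Models the .put_context operation; tolerates a NULL context. */
static void demo_put_context(void *context)
{
	struct demo_ctx *ctx = context;

	if (ctx)
		ctx->refcount--;
}

typedef int (*demo_fallback_t)(struct demo_ctx *ctx);

/*
 * Pretend cache read: pin the context, call the fallback if the cache
 * reported an error, then drop the pin again.
 */
static int demo_cache_read(struct demo_ctx *ctx, int cache_error,
			   demo_fallback_t fallback)
{
	int ret = 0;

	demo_get_context(ctx);
	if (cache_error)
		ret = fallback(ctx);
	demo_put_context(ctx);
	return ret;
}

/* Pretend server fetch: succeeds only if the cache's pin is still held. */
static int demo_fetch_from_server(struct demo_ctx *ctx)
{
	return ctx->refcount >= 2 ? 1 : -1;
}
```

In this sketch the caller's own reference is the first count and the cache's pin is the second, which is why the fallback checks for at least two.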
[PATCH 37/37] NFS: Add mount options to enable local caching on NFS
Add NFS mount options to allow the local caching support to be enabled.

The attached patch makes it possible for the NFS filesystem to be told to make
use of the network filesystem local caching service (FS-Cache).

To be able to use this, a recent nfsutils package is required.

There are three variant NFS mount options that can be added to a mount command
to control caching for a mount.  Only the last one specified takes effect:

 (*) Adding "fsc" will request caching.

 (*) Adding "fsc=string" will request caching and also specify a uniquifier.

 (*) Adding "nofsc" will disable caching.

For example:

	mount warthog:/ /a -o fsc

The cache of a particular superblock (NFS FSID) will be shared between all
mounts of that volume, provided they have the same connection parameters and
are not marked 'nosharecache'.

Where it is otherwise impossible to distinguish superblocks because all the
parameters are identical, but the 'nosharecache' option is supplied, a
uniquifying string must be supplied, else only the first mount will be
permitted to use the cache.

If there's a key collision, then the second mount will disable caching and give
a warning into the kernel log.

Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/nfs/client.c |2 ++ fs/nfs/internal.h |1 + fs/nfs/super.c| 25 + 3 files changed, 28 insertions(+), 0 deletions(-) diff --git a/fs/nfs/client.c b/fs/nfs/client.c index d67d52f..8357f68 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -669,6 +669,7 @@ static int nfs_init_server(struct nfs_server *server, /* Initialise the client representation from the mount data */ server-flags = data-flags NFS_MOUNT_FLAGMASK; + server-options = data-options; if (data-rsize) server-rsize = nfs_block_size(data-rsize, NULL); @@ -1056,6 +1057,7 @@ static int nfs4_init_server(struct nfs_server *server, /* Initialise the client representation from the mount data */ server-flags = data-flags NFS_MOUNT_FLAGMASK; server-caps |= NFS_CAP_ATOMIC_OPEN; + server-options = data-options; if (data-rsize) server-rsize = nfs_block_size(data-rsize, NULL); diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index e49cb6e..f427b35 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -38,6 +38,7 @@ struct nfs_parsed_mount_data { int acregmin, acregmax, acdirmin, acdirmax; int namlen; + unsigned intoptions; unsigned intbsize; unsigned intauth_flavor_len; rpc_authflavor_tauth_flavors[1]; diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 79c4abe..4c513c6 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -76,6 +76,7 @@ enum { Opt_acl, Opt_noacl, Opt_rdirplus, Opt_nordirplus, Opt_sharecache, Opt_nosharecache, + Opt_fscache, Opt_nofscache, /* Mount options that take integer arguments */ Opt_port, @@ -92,6 +93,7 @@ enum { /* Mount options that take string arguments */ Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost, Opt_addr, Opt_mountaddr, Opt_clientaddr, + Opt_fscache_uniq, /* Mount options that are ignored */ Opt_userspace, Opt_deprecated, @@ -125,6 +127,9 @@ static match_table_t nfs_mount_option_tokens = { { Opt_nordirplus, nordirplus }, { Opt_sharecache, sharecache }, { Opt_nosharecache, nosharecache }, + { Opt_fscache, fsc }, + { 
Opt_fscache_uniq, fsc=%s }, + { Opt_nofscache, nofsc }, { Opt_port, port=%u }, { Opt_rsize, rsize=%u }, @@ -486,6 +491,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss, seq_printf(m, ,timeo=%lu, 10U * nfss-client-cl_timeout-to_initval / HZ); seq_printf(m, ,retrans=%u, nfss-client-cl_timeout-to_retries); seq_printf(m, ,sec=%s, nfs_pseudoflavour_to_name(nfss-client-cl_auth-au_flavor)); + if (nfss-options NFS_OPTION_FSCACHE) + seq_printf(m, ,fsc); } /* @@ -780,6 +787,24 @@ static int nfs_parse_mount_options(char *raw, case Opt_nosharecache: mnt-flags |= NFS_MOUNT_UNSHARED; break; + case Opt_fscache: + mnt-options |= NFS_OPTION_FSCACHE; + kfree(mnt-fscache_uniq); + mnt-fscache_uniq = NULL; + break; + case Opt_nofscache: + mnt-options = ~NFS_OPTION_FSCACHE; + kfree(mnt-fscache_uniq); + mnt-fscache_uniq = NULL; + break; + case Opt_fscache_uniq: + string = match_strdup(args); + if (!string) + goto
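The "only the last one specified takes effect" rule from the description of patch 37 can be sketched in user space as below. The option names are the ones from the patch; the demo_* structure and functions are hypothetical, and because the fsc=string case is cut off in the diff above, its handling here (set the flag and keep a copy of the uniquifier) is an assumption based on the patch description rather than on the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed-mount state; not struct nfs_parsed_mount_data. */
struct demo_mount {
	bool fscache;	/* models NFS_OPTION_FSCACHE */
	char *uniq;	/* models fscache_uniq */
};

/* Apply one option; unknown options are ignored in this sketch. */
static int demo_apply_option(struct demo_mount *mnt, const char *opt)
{
	assert(mnt && opt);

	if (strcmp(opt, "fsc") == 0 || strcmp(opt, "nofsc") == 0) {
		/* Both plain forms discard any earlier uniquifier. */
		mnt->fscache = (opt[0] == 'f');
		free(mnt->uniq);
		mnt->uniq = NULL;
	} else if (strncmp(opt, "fsc=", 4) == 0) {
		char *copy = malloc(strlen(opt + 4) + 1);

		if (!copy)
			return -1;
		strcpy(copy, opt + 4);
		free(mnt->uniq);
		mnt->uniq = copy;
		mnt->fscache = true;
	}
	return 0;
}

/* Walk a comma-separated option string from left to right. */
static int demo_parse_options(struct demo_mount *mnt, const char *options)
{
	char *copy = malloc(strlen(options) + 1);
	char *opt, *next;
	int ret = 0;

	if (!copy)
		return -1;
	strcpy(copy, options);

	for (opt = copy; opt && ret == 0; opt = next) {
		next = strchr(opt, ',');
		if (next)
			*next++ = '\0';
		ret = demo_apply_option(mnt, opt);
	}

	free(copy);
	return ret;
}
```

Because each option overwrites the state left by the previous one, "fsc=alpha,nofsc" ends with caching disabled, while "nofsc,fsc=beta" ends with caching enabled and "beta" as the uniquifier.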
[PATCH 30/37] NFS: Add some new I/O event counters for FS-Cache events
Add some new NFS I/O event counters for FS-Cache events.  They have to be added
as byte counters because I may need to be able to increase the numbers by more
than 1 at a time.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/iostat.h |    7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/iostat.h b/fs/nfs/iostat.h
index 6350ecb..0e3b170 100644
--- a/fs/nfs/iostat.h
+++ b/fs/nfs/iostat.h
@@ -60,6 +60,13 @@ enum nfs_stat_bytecounters {
 	NFSIOS_SERVERWRITTENBYTES,
 	NFSIOS_READPAGES,
 	NFSIOS_WRITEPAGES,
+#ifdef CONFIG_NFS_FSCACHE
+	NFSIOS_FSCACHE_READ_OK,
+	NFSIOS_FSCACHE_READ_FAIL,
+	NFSIOS_FSCACHE_WRITE_OK,
+	NFSIOS_FSCACHE_WRITE_FAIL,
+	NFSIOS_FSCACHE_UNCACHE,
+#endif
 	__NFSIOS_BYTESMAX,
 };
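To illustrate why patch 30 wants counters that can be increased by more than 1 at a time, here is a hedged user-space sketch of an enum-indexed counter array with an add-by-n helper and a batch accounting function. All demo_* names are hypothetical, and the way the batch is split between the OK and FAIL counters is my own illustrative choice, not a statement of what the NFS code does.

```c
#include <assert.h>

/* Hypothetical counter indices, modelled on the NFSIOS_FSCACHE_* entries. */
enum demo_bytecounter {
	DEMO_READ_OK,
	DEMO_READ_FAIL,
	DEMO_WRITE_OK,
	DEMO_WRITE_FAIL,
	DEMO_UNCACHE,
	DEMO_BYTESMAX,
};

struct demo_iostats {
	unsigned long long bytes[DEMO_BYTESMAX];
};

/* Add an arbitrary amount to one counter. */
static void demo_add_stats(struct demo_iostats *s, enum demo_bytecounter c,
			   unsigned long long n)
{
	assert(c < DEMO_BYTESMAX);
	s->bytes[c] += n;
}

/*
 * Account for a batch read: 'requested' pages were asked for and 'left'
 * of them could not be supplied by the cache.
 */
static void demo_account_batch(struct demo_iostats *s, unsigned int requested,
			       unsigned int left)
{
	assert(left <= requested);

	if (left < requested)
		demo_add_stats(s, DEMO_READ_OK, requested - left);
	if (left > 0)
		demo_add_stats(s, DEMO_READ_FAIL, left);
}
```

A counter that could only be incremented by one would need a loop here; an add-by-n counter handles a multi-page readahead in a single call.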
Re: [PATCH 00/37] Permit filesystem local caching
Serge E. Hallyn [EMAIL PROTECTED] wrote:

> Seems *really* weird that every time you send this, patch 6 doesn't seem to
> reach me in any of my mailboxes...  (did get it from the url you listed)

It's the largest of the patches, so that's not entirely surprising.  Hence why
I included the URL to the tarball also.

> I'm sorry if I miss where you explicitly state this, but is it safe to
> assume, as perusing the patches suggests, that
>
> 	1. tsk->sec never changes other than in task_alloc_security()?

Correct.

> 	2. tsk->act_as is only ever dereferenced from (a) current->

That ought to be correct.

> 	   except (b) in do_coredump?

Actually, do_coredump() only deals with current->act_as.

> (thereby carefully avoiding locking issues)

That's the idea.

> I'd still like to see some performance numbers.  Not to object to these
> patches, just to make sure there's no need to try and optimize more of the
> dereferences away when they're not needed.

I hope that the performance impact is minimal.  The kernel should spend very
little time looking at the security data.  I'll try and get some though.

> Oh, manually copied from patch 6, I see you have in the task_security
> struct definition:
>
> 	kernel_cap_t cap_bset; /* ? */
>
> That comment can be filled in with 'capability bounding set' (for this
> task and all its future descendents).

Thanks.

David
[PATCH 37/37] NFS: Add mount options to enable local caching on NFS
Add NFS mount options to allow the local caching support to be enabled.

The attached patch makes it possible for the NFS filesystem to be told to make
use of the network filesystem local caching service (FS-Cache).

To be able to use this, a recent nfsutils package is required.

There are three variant NFS mount options that can be added to a mount command
to control caching for a mount.  Only the last one specified takes effect:

 (*) Adding "fsc" will request caching.

 (*) Adding "fsc=string" will request caching and also specify a uniquifier.

 (*) Adding "nofsc" will disable caching.

For example:

	mount warthog:/ /a -o fsc

The cache of a particular superblock (NFS FSID) will be shared between all
mounts of that volume, provided they have the same connection parameters and
are not marked 'nosharecache'.

Where it is otherwise impossible to distinguish superblocks because all the
parameters are identical, but the 'nosharecache' option is supplied, a
uniquifying string must be supplied, else only the first mount will be
permitted to use the cache.

If there's a key collision, then the second mount will disable caching and give
a warning into the kernel log.

Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/nfs/client.c |2 ++ fs/nfs/internal.h |1 + fs/nfs/super.c| 25 + 3 files changed, 28 insertions(+), 0 deletions(-) diff --git a/fs/nfs/client.c b/fs/nfs/client.c index d67d52f..8357f68 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -669,6 +669,7 @@ static int nfs_init_server(struct nfs_server *server, /* Initialise the client representation from the mount data */ server-flags = data-flags NFS_MOUNT_FLAGMASK; + server-options = data-options; if (data-rsize) server-rsize = nfs_block_size(data-rsize, NULL); @@ -1056,6 +1057,7 @@ static int nfs4_init_server(struct nfs_server *server, /* Initialise the client representation from the mount data */ server-flags = data-flags NFS_MOUNT_FLAGMASK; server-caps |= NFS_CAP_ATOMIC_OPEN; + server-options = data-options; if (data-rsize) server-rsize = nfs_block_size(data-rsize, NULL); diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index e49cb6e..f427b35 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -38,6 +38,7 @@ struct nfs_parsed_mount_data { int acregmin, acregmax, acdirmin, acdirmax; int namlen; + unsigned intoptions; unsigned intbsize; unsigned intauth_flavor_len; rpc_authflavor_tauth_flavors[1]; diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 437c3dd..96082a2 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -76,6 +76,7 @@ enum { Opt_acl, Opt_noacl, Opt_rdirplus, Opt_nordirplus, Opt_sharecache, Opt_nosharecache, + Opt_fscache, Opt_nofscache, /* Mount options that take integer arguments */ Opt_port, @@ -92,6 +93,7 @@ enum { /* Mount options that take string arguments */ Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost, Opt_addr, Opt_mountaddr, Opt_clientaddr, + Opt_fscache_uniq, /* Mount options that are ignored */ Opt_userspace, Opt_deprecated, @@ -125,6 +127,9 @@ static match_table_t nfs_mount_option_tokens = { { Opt_nordirplus, nordirplus }, { Opt_sharecache, sharecache }, { Opt_nosharecache, nosharecache }, + { Opt_fscache, fsc }, + { 
Opt_fscache_uniq, fsc=%s }, + { Opt_nofscache, nofsc }, { Opt_port, port=%u }, { Opt_rsize, rsize=%u }, @@ -482,6 +487,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss, seq_printf(m, ,timeo=%lu, 10U * nfss-client-cl_timeout-to_initval / HZ); seq_printf(m, ,retrans=%u, nfss-client-cl_timeout-to_retries); seq_printf(m, ,sec=%s, nfs_pseudoflavour_to_name(nfss-client-cl_auth-au_flavor)); + if (nfss-options NFS_OPTION_FSCACHE) + seq_printf(m, ,fsc); } /* @@ -776,6 +783,24 @@ static int nfs_parse_mount_options(char *raw, case Opt_nosharecache: mnt-flags |= NFS_MOUNT_UNSHARED; break; + case Opt_fscache: + mnt-options |= NFS_OPTION_FSCACHE; + kfree(mnt-fscache_uniq); + mnt-fscache_uniq = NULL; + break; + case Opt_nofscache: + mnt-options = ~NFS_OPTION_FSCACHE; + kfree(mnt-fscache_uniq); + mnt-fscache_uniq = NULL; + break; + case Opt_fscache_uniq: + string = match_strdup(args); + if (!string) + goto
[PATCH 33/37] NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
nfs_readpage_async() needs to be non-static so that it can be used as a
fallback for the local on-disk caching should an EIO crop up when reading the
cache.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/read.c          |    4 ++--
 include/linux/nfs_fs.h |    2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 3d7d963..725a5a2 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -114,8 +114,8 @@ static void nfs_readpage_truncate_uninitialised_page(struct nfs_read_data *data)
 	}
 }
 
-static int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
-		struct page *page)
+int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
+		       struct page *page)
 {
 	LIST_HEAD(one_request);
 	struct nfs_page *new;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index d9adb53..d1d545e 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -505,6 +505,8 @@ extern int nfs_readpages(struct file *, struct address_space *,
 		struct list_head *, unsigned);
 extern int nfs_readpage_result(struct rpc_task *, struct nfs_read_data *);
 extern void nfs_readdata_release(void *data);
+extern int nfs_readpage_async(struct nfs_open_context *, struct inode *,
+			      struct page *);
 
 /*
  * Allocate nfs_read_data structures
[PATCH 31/37] NFS: FS-Cache page management
FS-Cache page management for NFS. This includes hooking the releasing and invalidation of pages marked with PG_fscache (aka PG_private_2) and waiting for completion of the write-to-cache flag (PG_fscache_write aka PG_owner_priv_2). Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/nfs/file.c| 17 + fs/nfs/fscache.c | 49 + fs/nfs/fscache.h | 22 ++ 3 files changed, 84 insertions(+), 4 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 26a073b..60db3ea 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -35,6 +35,7 @@ #include delegation.h #include internal.h #include iostat.h +#include fscache.h #define NFSDBG_FACILITYNFSDBG_FILE @@ -358,7 +359,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping, * Partially or wholly invalidate a page * - Release the private state associated with a page if undergoing complete * page invalidation - * - Called if either PG_private or PG_private_2 is set on the page + * - Called if either PG_private or PG_fscache is set on the page * - Caller holds page lock */ static void nfs_invalidate_page(struct page *page, unsigned long offset) @@ -367,30 +368,35 @@ static void nfs_invalidate_page(struct page *page, unsigned long offset) return; /* Cancel any unstarted writes on this page */ nfs_wb_page_cancel(page-mapping-host, page); + + nfs_fscache_invalidate_page(page, page-mapping-host); } /* * Attempt to release the private state associated with a page - * - Called if either PG_private or PG_private_2 is set on the page + * - Called if either PG_private or PG_fscache is set on the page * - Caller holds page lock * - Return true (may release page) or false (may not) */ static int nfs_release_page(struct page *page, gfp_t gfp) { /* If PagePrivate() is set, then the page is not freeable */ - return 0; + if (PagePrivate(page)) + return 0; + return nfs_fscache_release_page(page, gfp); } /* * Attempt to clear the private state associated with a page when an error * occurs that requires the cached contents of 
an inode to be written back or * destroyed - * - Called if either PG_private or PG_private_2 is set on the page + * - Called if either PG_private or fscache is set on the page * - Caller holds page lock * - Return 0 if successful, -error otherwise */ static int nfs_launder_page(struct page *page) { + wait_on_page_fscache_write(page); return nfs_wb_page(page-mapping-host, page); } @@ -422,6 +428,9 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page) int ret = -EINVAL; struct address_space *mapping; + /* make sure the cache has finished storing the page */ + wait_on_page_fscache_write(page); + lock_page(page); mapping = page-mapping; if (mapping != vma-vm_file-f_path.dentry-d_inode-i_mapping) diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c index c0e0320..d475ff5 100644 --- a/fs/nfs/fscache.c +++ b/fs/nfs/fscache.c @@ -19,6 +19,7 @@ #include linux/seq_file.h #include internal.h +#include iostat.h #include fscache.h #define NFSDBG_FACILITYNFSDBG_FSCACHE @@ -297,3 +298,51 @@ void nfs_fscache_attr_changed(struct inode *inode) { fscache_attr_changed(NFS_I(inode)-fscache); } + +/* + * Release the caching state associated with a page, if the page isn't busy + * interacting with the cache. + * - Returns true (can release page) or false (page busy). + */ +int nfs_fscache_release_page(struct page *page, gfp_t gfp) +{ + if (PageFsCacheWrite(page)) { + if (!(gfp __GFP_WAIT)) + return 0; + wait_on_page_fscache_write(page); + } + + if (PageFsCache(page)) { + struct nfs_inode *nfsi = NFS_I(page-mapping-host); + + BUG_ON(!nfsi-fscache); + + dfprintk(FSCACHE, NFS: fscache releasepage (0x%p/0x%p/0x%p)\n, +nfsi-fscache, page, nfsi); + + fscache_uncache_page(nfsi-fscache, page); + nfs_add_stats(page-mapping-host, NFSIOS_FSCACHE_UNCACHE, 1); + } + + return 1; +} + +/* + * Release the caching state associated with a page if undergoing complete page + * invalidation. 
+ */ +void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode) +{ + struct nfs_inode *nfsi = NFS_I(inode); + + BUG_ON(!nfsi-fscache); + + dfprintk(FSCACHE, NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n, +nfsi-fscache, page, nfsi); + + wait_on_page_fscache_write(page); + + BUG_ON(!PageLocked(page)); + fscache_uncache_page(nfsi-fscache, page); + nfs_add_stats(page-mapping-host, NFSIOS_FSCACHE_UNCACHE, 1); +} diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h index d730ec8..1cb7d96
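The release decision in patch 31 combines three page states, and a small user-space model may make the order of the checks clearer: NFS private state blocks release outright, an in-flight write to the cache blocks release unless the caller is allowed to wait, and a page known to the cache is uncached before release is permitted. The demo_* names and the boolean fields are hypothetical stand-ins for PG_private, PG_fscache and PG_fscache_write, and "waiting" is modelled as simply clearing the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page state; each field stands in for one page flag. */
struct demo_page {
	bool private_state;	/* PG_private: NFS still holds state */
	bool cached;		/* PG_fscache: the cache knows this page */
	bool cache_writing;	/* PG_fscache_write: store in progress */
};

/*
 * Returns true if the page may be released.  'can_wait' models the
 * __GFP_WAIT bit in the gfp mask passed to the release hook.
 */
static bool demo_release_page(struct demo_page *page, bool can_wait)
{
	assert(page);

	if (page->private_state)
		return false;	/* NFS state outstanding: not freeable */

	if (page->cache_writing) {
		if (!can_wait)
			return false;	/* page busy and we may not sleep */
		/* Stands in for wait_on_page_fscache_write(). */
		page->cache_writing = false;
	}

	if (page->cached) {
		/* Stands in for fscache_uncache_page(). */
		page->cached = false;
	}

	return true;
}
```

The ordering matters in this model: the page is only uncached once any pending store has finished, so the cache never sees a page go away in the middle of a write.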
[PATCH 34/37] NFS: Read pages from FS-Cache into an NFS inode
Read pages from an FS-Cache data storage object representing an inode into an NFS inode. Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/nfs/fscache.c | 112 ++ fs/nfs/fscache.h | 47 +++ fs/nfs/read.c| 18 + 3 files changed, 176 insertions(+), 1 deletions(-) diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c index d475ff5..438cc9b 100644 --- a/fs/nfs/fscache.c +++ b/fs/nfs/fscache.c @@ -344,5 +344,115 @@ void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode) BUG_ON(!PageLocked(page)); fscache_uncache_page(nfsi-fscache, page); - nfs_add_stats(page-mapping-host, NFSIOS_FSCACHE_UNCACHE, 1); + nfs_add_stats(inode, NFSIOS_FSCACHE_UNCACHE, 1); +} + +/* + * Handle completion of a page being read from the cache. + * - Called in process (keventd) context. + */ +static void nfs_readpage_from_fscache_complete(struct page *page, + void *context, + int error) +{ + dfprintk(FSCACHE, +NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n, +page, context, error); + + /* if the read completes with an error, we just unlock the page and let +* the VM reissue the readpage */ + if (!error) { + SetPageUptodate(page); + unlock_page(page); + } else { + error = nfs_readpage_async(context, page-mapping-host, page); + if (error) + unlock_page(page); + } +} + +/* + * Retrieve a page from fscache + */ +int __nfs_readpage_from_fscache(struct nfs_open_context *ctx, + struct inode *inode, struct page *page) +{ + int ret; + + dfprintk(FSCACHE, +NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n, +NFS_I(inode)-fscache, page, page-index, page-flags, inode); + + ret = fscache_read_or_alloc_page(NFS_I(inode)-fscache, +page, +nfs_readpage_from_fscache_complete, +ctx, +GFP_KERNEL); + + switch (ret) { + case 0: /* read BIO submitted (page in fscache) */ + dfprintk(FSCACHE, +NFS:readpage_from_fscache: BIO submitted\n); + nfs_add_stats(inode, NFSIOS_FSCACHE_READ_OK, 1); + return ret; + + case -ENOBUFS: /* inode not in cache */ + case -ENODATA: /* page not in cache */ + 
nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, 1); + dfprintk(FSCACHE, +NFS:readpage_from_fscache %d\n, ret); + return 1; + + default: + dfprintk(FSCACHE, NFS:readpage_from_fscache %d\n, ret); + nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, 1); + } + return ret; +} + +/* + * Retrieve a set of pages from fscache + */ +int __nfs_readpages_from_fscache(struct nfs_open_context *ctx, +struct inode *inode, +struct address_space *mapping, +struct list_head *pages, +unsigned *nr_pages) +{ + int ret, npages = *nr_pages; + + dfprintk(FSCACHE, NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n, +NFS_I(inode)-fscache, npages, inode); + + ret = fscache_read_or_alloc_pages(NFS_I(inode)-fscache, + mapping, pages, nr_pages, + nfs_readpage_from_fscache_complete, + ctx, + mapping_gfp_mask(mapping)); + if (*nr_pages npages) + nfs_add_stats(inode, NFSIOS_FSCACHE_READ_OK, npages); + if (*nr_pages 0) + nfs_add_stats(inode, NFSIOS_FSCACHE_READ_FAIL, *nr_pages); + + switch (ret) { + case 0: /* read submitted to the cache for all pages */ + BUG_ON(!list_empty(pages)); + BUG_ON(*nr_pages != 0); + dfprintk(FSCACHE, +NFS: nfs_getpages_from_fscache: submitted\n); + + return ret; + + case -ENOBUFS: /* some pages aren't cached and can't be */ + case -ENODATA: /* some pages aren't cached */ + dfprintk(FSCACHE, +NFS: nfs_getpages_from_fscache: no page: %d\n, ret); + return 1; + + default: + dfprintk(FSCACHE, +NFS: nfs_getpages_from_fscache: ret %d\n, ret); + } + + return ret; } diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h index 1cb7d96..4c1e1a8 100644 --- a/fs/nfs/fscache.h +++ b/fs/nfs/fscache.h @@ -89,6 +89,12
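The return-code handling in patch 34 follows a three-way convention that can be sketched in user space: 0 means the read was submitted to the cache, -ENOBUFS and -ENODATA are folded into a positive 1, and any other error is passed through. The completion side marks the page up to date on success and otherwise falls back to a server read. The demo_* names are hypothetical, and reading the positive return as "go to the server instead" is my assumption, since the calling code in read.c is not shown above.

```c
#include <assert.h>
#include <errno.h>

/* Map a cache read result onto the 0 / 1 / -errno convention. */
static int demo_map_cache_result(int cache_ret)
{
	assert(cache_ret <= 0);

	switch (cache_ret) {
	case 0:			/* read submitted to the cache */
		return 0;
	case -ENOBUFS:		/* object not in the cache */
	case -ENODATA:		/* page not in the cache */
		return 1;
	default:		/* genuine error: pass it through */
		return cache_ret;
	}
}

typedef int (*demo_server_read_t)(void *context);

/*
 * Completion of a cache read: mark the page up to date on success,
 * otherwise ask the server for the data using the retained context.
 */
static int demo_read_complete(int error, void *context,
			      demo_server_read_t server_read, int *uptodate)
{
	if (!error) {
		*uptodate = 1;
		return 0;
	}
	return server_read(context);
}

/* Pretend server read that counts how often it was called. */
static int demo_server_read(void *context)
{
	int *calls = context;

	(*calls)++;
	return 0;
}
```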
[PATCH 05/37] Security: Change current-fs[ug]id to current_fs[ug]id()
Change current-fs[ug]id to current_fs[ug]id() so that fsgid and fsuid can be separated from the task_struct. Signed-off-by: David Howells [EMAIL PROTECTED] --- arch/ia64/kernel/perfmon.c|4 ++-- arch/powerpc/platforms/cell/spufs/inode.c |4 ++-- drivers/isdn/capi/capifs.c|4 ++-- drivers/usb/core/inode.c |4 ++-- fs/9p/fid.c |2 +- fs/9p/vfs_inode.c |4 ++-- fs/9p/vfs_super.c |4 ++-- fs/affs/inode.c |4 ++-- fs/anon_inodes.c |4 ++-- fs/attr.c |4 ++-- fs/bfs/dir.c |4 ++-- fs/cifs/cifsproto.h |2 +- fs/cifs/dir.c | 12 ++-- fs/cifs/inode.c |8 fs/cifs/misc.c|4 ++-- fs/coda/cache.c |6 +++--- fs/coda/upcall.c |4 ++-- fs/devpts/inode.c |4 ++-- fs/dquot.c|2 +- fs/exec.c |4 ++-- fs/ext2/balloc.c |2 +- fs/ext2/ialloc.c |4 ++-- fs/ext2/ioctl.c |2 +- fs/ext3/balloc.c |2 +- fs/ext3/ialloc.c |4 ++-- fs/ext4/balloc.c |2 +- fs/ext4/ialloc.c |4 ++-- fs/fuse/dev.c |4 ++-- fs/gfs2/inode.c | 10 +- fs/hfs/inode.c|4 ++-- fs/hfsplus/inode.c|4 ++-- fs/hpfs/namei.c | 24 fs/hugetlbfs/inode.c | 16 fs/jffs2/fs.c |4 ++-- fs/jfs/jfs_inode.c|4 ++-- fs/locks.c|2 +- fs/minix/bitmap.c |4 ++-- fs/namei.c|8 fs/nfsd/vfs.c |4 ++-- fs/ocfs2/dlm/dlmfs.c |8 fs/ocfs2/namei.c |4 ++-- fs/pipe.c |4 ++-- fs/posix_acl.c|4 ++-- fs/ramfs/inode.c |4 ++-- fs/reiserfs/namei.c |4 ++-- fs/sysv/ialloc.c |4 ++-- fs/udf/ialloc.c |4 ++-- fs/udf/namei.c|2 +- fs/ufs/ialloc.c |4 ++-- fs/xfs/linux-2.6/xfs_linux.h |4 ++-- fs/xfs/xfs_acl.c |6 +++--- fs/xfs/xfs_attr.c |2 +- fs/xfs/xfs_inode.c|4 ++-- fs/xfs/xfs_vnodeops.c |8 include/linux/fs.h|2 +- include/linux/sched.h |3 +++ ipc/mqueue.c |4 ++-- kernel/cgroup.c |4 ++-- mm/shmem.c|8 net/9p/client.c |2 +- net/socket.c |4 ++-- net/sunrpc/auth.c |8 security/commoncap.c |4 ++-- security/keys/key.c |2 +- security/keys/keyctl.c|2 +- security/keys/request_key.c | 10 +- security/keys/request_key_auth.c |2 +- 67 files changed, 160 insertions(+), 157 deletions(-) diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c index 78acd9f..9ef832c 100644 --- a/arch/ia64/kernel/perfmon.c +++ 
b/arch/ia64/kernel/perfmon.c @@ -2206,8 +2206,8 @@ pfm_alloc_fd(struct file **cfile) DPRINT((new inode ino=%ld @%p\n, inode-i_ino, inode)); inode-i_mode = S_IFCHR|S_IRUGO; - inode-i_uid = current-fsuid; - inode-i_gid = current-fsgid; + inode-i_uid = current_fsuid(); + inode-i_gid = current_fsgid(); sprintf(name, [%lu], inode-i_ino); this.name = name; diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c index 90784c0..0c3838c 100644 --- a/arch/powerpc/platforms/cell/spufs/inode.c +++ b/arch/powerpc/platforms/cell/spufs/inode.c @@ -85,8 +85,8 @@ spufs_new_inode(struct super_block *sb, int mode) goto out; inode-i_mode = mode; - inode-i_uid = current-fsuid; - inode-i_gid = current-fsgid; + inode-i_uid = current_fsuid(); + inode-i_gid
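The point of switching callers from a direct field access to current_fsuid() and current_fsgid() is that the storage behind the accessor can later move without touching the callers again. A rough user-space sketch of that idea follows. The demo_* types are hypothetical; the sec and act_as pointer names are borrowed from the discussion elsewhere in this thread, and treating act_as as the subjective set that the accessors read is my assumption.

```c
#include <assert.h>

typedef unsigned int demo_uid_t;
typedef unsigned int demo_gid_t;

/* Hypothetical security details split out of the task structure. */
struct demo_task_security {
	demo_uid_t fsuid;
	demo_gid_t fsgid;
};

struct demo_task {
	struct demo_task_security *sec;		/* objective: how others see us */
	struct demo_task_security *act_as;	/* subjective: how we act */
};

/* Stands in for the kernel's 'current' task pointer. */
static struct demo_task *demo_current;

static demo_uid_t demo_current_fsuid(void)
{
	assert(demo_current && demo_current->act_as);
	return demo_current->act_as->fsuid;
}

static demo_gid_t demo_current_fsgid(void)
{
	assert(demo_current && demo_current->act_as);
	return demo_current->act_as->fsgid;
}

struct demo_inode {
	demo_uid_t i_uid;
	demo_gid_t i_gid;
};

/* Callers only ever use the accessors, never the fields directly. */
static void demo_init_owner(struct demo_inode *inode)
{
	inode->i_uid = demo_current_fsuid();
	inode->i_gid = demo_current_fsgid();
}
```

With callers written this way, temporarily pointing act_as at a different security set changes the ownership of newly created objects without altering the task's own identity.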
[PATCH 00/37] Permit filesystem local caching
These patches add local caching for network filesystems such as NFS.

The patches can roughly be broken down into a number of sets:

  (*) 01-keys-inc-payload.diff
  (*) 02-keys-search-keyring.diff
  (*) 03-keys-callout-blob.diff

	Three patches to the keyring code made to help the CIFS people.
	Included because of patches 05-08.

  (*) 04-keys-get-label.diff

	A patch to allow the security label of a key to be retrieved.
	Included because of patches 05-08.

  (*) 05-security-current-fsugid.diff
  (*) 06-security-separate-task-bits.diff
  (*) 07-security-subjective.diff
  (*) 08-security-kernel_service-class.diff
  (*) 09-security-kernel-service.diff
  (*) 10-security-nfsd.diff

	Patches to permit the subjective security of a task to be overridden.
	All the security details in task_struct are decanted into a new struct
	that task_struct then has two pointers to: one that defines the
	objective security of that task (how other tasks may affect it) and one
	that defines the subjective security (how it may affect other objects).

	Note that I have dropped the idea of struct cred for the moment.  With
	the amount of stuff that was excluded from it, it wasn't actually any
	use to me.  However, it can be added later.

	Required for cachefiles.

  (*) 11-release-page.diff
  (*) 12-fscache-page-flags.diff
  (*) 13-add_wait_queue_tail.diff
  (*) 14-fscache.diff

	Patches to provide a local caching facility for network filesystems.

  (*) 15-cachefiles-ia64.diff
  (*) 16-cachefiles-ext3-f_mapping.diff
  (*) 17-cachefiles-write.diff
  (*) 18-cachefiles-monitor.diff
  (*) 19-cachefiles-export.diff
  (*) 20-cachefiles.diff

	Patches to provide a local cache in a directory of an already mounted
	filesystem.

  (*) 21-nfs-comment.diff
  (*) 22-nfs-fscache-option.diff
  (*) 23-nfs-fscache-kconfig.diff
  (*) 24-nfs-fscache-top-index.diff
  (*) 25-nfs-fscache-server-obj.diff
  (*) 26-nfs-fscache-super-obj.diff
  (*) 27-nfs-fscache-inode-obj.diff
  (*) 28-nfs-fscache-use-inode.diff
  (*) 29-nfs-fscache-invalidate-pages.diff
  (*) 30-nfs-fscache-iostats.diff
  (*) 31-nfs-fscache-page-management.diff
  (*) 32-nfs-fscache-read-context.diff
  (*) 33-nfs-fscache-read-fallback.diff
  (*) 34-nfs-fscache-read-from-cache.diff
  (*) 35-nfs-fscache-store-to-cache.diff
  (*) 36-nfs-fscache-mount.diff
  (*) 37-nfs-fscache-display.diff

	Patches to provide NFS with local caching.

	A couple of questions on the NFS iostat changes: (1) Should I update
	the iostat version number; (2) is it permitted to have conditional
	iostats?

I've massively split up the NFS patches as requested by Trond Myklebust and
Chuck Lever.  I've also brought the patches up to date with the patch window
turbulence.

--
A tarball of the patches is available at:

	http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-29.tar.bz2

To use this version of CacheFiles, the cachefilesd-0.9 is also required.  It
is available as an SRPM:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm

Or as individual bits:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
	http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
	http://people.redhat.com/~dhowells/fscache/cachefilesd.if
	http://people.redhat.com/~dhowells/fscache/cachefilesd.te
	http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

The .fc, .if and .te files are for manipulating SELinux.

David
[PATCH 08/37] Security: Add a kernel_service object class to SELinux
Add a 'kernel_service' object class to SELinux and give this object class two
access vectors: 'use_as_override' and 'create_files_as'.

The first vector is used to grant a process the right to nominate an alternate
process security ID for the kernel to use as an override for the SELinux
subjective security when accessing stuff on behalf of another process.

For example, CacheFiles when accessing the cache on behalf of a process
accessing an NFS file needs to use a subjective security ID appropriate to the
cache rather than the one the calling process is using.  The cachefilesd
daemon will nominate the security ID to be used.

The second vector is used to grant a process the right to nominate a file
creation label for a kernel service to use.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/selinux/include/av_perm_to_string.h |    2 ++
 security/selinux/include/av_permissions.h    |    2 ++
 security/selinux/include/class_to_string.h   |    1 +
 security/selinux/include/flask.h             |    1 +
 4 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/security/selinux/include/av_perm_to_string.h b/security/selinux/include/av_perm_to_string.h
index 399f868..c68ec9c 100644
--- a/security/selinux/include/av_perm_to_string.h
+++ b/security/selinux/include/av_perm_to_string.h
@@ -168,3 +168,5 @@
    S_(SECCLASS_DCCP_SOCKET, DCCP_SOCKET__NAME_CONNECT, "name_connect")
    S_(SECCLASS_MEMPROTECT, MEMPROTECT__MMAP_ZERO, "mmap_zero")
    S_(SECCLASS_PEER, PEER__RECV, "recv")
+   S_(SECCLASS_KERNEL_SERVICE, KERNEL_SERVICE__USE_AS_OVERRIDE, "use_as_override")
+   S_(SECCLASS_KERNEL_SERVICE, KERNEL_SERVICE__CREATE_FILES_AS, "create_files_as")
diff --git a/security/selinux/include/av_permissions.h b/security/selinux/include/av_permissions.h
index 84c9abc..41cee9e 100644
--- a/security/selinux/include/av_permissions.h
+++ b/security/selinux/include/av_permissions.h
@@ -833,3 +833,5 @@
 #define DCCP_SOCKET__NAME_CONNECT                 0x0080UL
 #define MEMPROTECT__MMAP_ZERO                     0x0001UL
 #define PEER__RECV                                0x0001UL
+#define KERNEL_SERVICE__USE_AS_OVERRIDE           0x0001UL
+#define KERNEL_SERVICE__CREATE_FILES_AS           0x0002UL
diff --git a/security/selinux/include/class_to_string.h b/security/selinux/include/class_to_string.h
index b1b0d1d..efe9efa 100644
--- a/security/selinux/include/class_to_string.h
+++ b/security/selinux/include/class_to_string.h
@@ -71,3 +71,4 @@
     S_(NULL)
     S_(NULL)
     S_("peer")
+S_("kernel_service")
diff --git a/security/selinux/include/flask.h b/security/selinux/include/flask.h
index 09e9dd2..2bc251a 100644
--- a/security/selinux/include/flask.h
+++ b/security/selinux/include/flask.h
@@ -51,6 +51,7 @@
 #define SECCLASS_DCCP_SOCKET                             60
 #define SECCLASS_MEMPROTECT                              61
 #define SECCLASS_PEER                                    68
+#define SECCLASS_KERNEL_SERVICE                          69
 
 /*
  * Security identifier indices for initial entities
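As a rough illustration of how the two access vectors behave as independent
permission bits, here is a small userspace sketch.  The two macro values are
the ones shown in the diff above; av_granted() is a made-up helper for this
note and is not an SELinux interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Permission bits as they appear in the av_permissions.h hunk. */
#define KERNEL_SERVICE__USE_AS_OVERRIDE	0x0001UL
#define KERNEL_SERVICE__CREATE_FILES_AS	0x0002UL

/*
 * Hypothetical helper: true only if every requested permission bit is
 * present in the allowed set.
 */
static bool av_granted(unsigned long allowed, unsigned long requested)
{
	return (allowed & requested) == requested;
}
```

A policy that grants only use_as_override would therefore not let a daemon
nominate a file creation label; each right has to be granted separately.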
[PATCH 03/37] KEYS: Allow the callout data to be passed as a blob rather than a string
Allow the callout data to be passed as a blob rather than a string for internal kernel services that call any request_key_*() interface other than request_key(). request_key() itself still takes a NUL-terminated string. The functions that change are: request_key_with_auxdata() request_key_async() request_key_async_with_auxdata() Signed-off-by: David Howells [EMAIL PROTECTED] --- Documentation/keys-request-key.txt | 11 +--- Documentation/keys.txt | 14 +++--- include/linux/key.h|9 --- security/keys/internal.h |9 --- security/keys/keyctl.c |7 - security/keys/request_key.c| 49 ++-- security/keys/request_key_auth.c | 12 + 7 files changed, 70 insertions(+), 41 deletions(-) diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt index 266955d..09b55e4 100644 --- a/Documentation/keys-request-key.txt +++ b/Documentation/keys-request-key.txt @@ -11,26 +11,29 @@ request_key*(): struct key *request_key(const struct key_type *type, const char *description, - const char *callout_string); + const char *callout_info); or: struct key *request_key_with_auxdata(const struct key_type *type, const char *description, -const char *callout_string, +const char *callout_info, +size_t callout_len, void *aux); or: struct key *request_key_async(const struct key_type *type, const char *description, - const char *callout_string); + const char *callout_info, + size_t callout_len); or: struct key *request_key_async_with_auxdata(const struct key_type *type, const char *description, - const char *callout_string, + const char *callout_info, + size_t callout_len, void *aux); Or by userspace invoking the request_key system call: diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 51652d3..b82d38d 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt @@ -771,7 +771,7 @@ payload contents for more information. 
struct key *request_key(const struct key_type *type, const char *description, - const char *callout_string); + const char *callout_info); This is used to request a key or keyring with a description that matches the description specified according to the key type's match function. This @@ -793,24 +793,28 @@ payload contents for more information. struct key *request_key_with_auxdata(const struct key_type *type, const char *description, -const char *callout_string, +const void *callout_info, +size_t callout_len, void *aux); This is identical to request_key(), except that the auxiliary data is -passed to the key_type-request_key() op if it exists. +passed to the key_type-request_key() op if it exists, and the callout_info +is a blob of length callout_len, if given (the length may be 0). (*) A key can be requested asynchronously by calling one of: struct key *request_key_async(const struct key_type *type, const char *description, - const char *callout_string); + const void *callout_info, + size_t callout_len); or: struct key *request_key_async_with_auxdata(const struct key_type *type, const char *description, - const char *callout_string, + const char *callout_info, + size_t callout_len, void *aux); which are asynchronous equivalents of request_key() and diff --git a/include/linux/key.h b/include/linux/key.h index a70b8a8..163f864 100644 --- a/include/linux/key.h +++ b/include/linux
[PATCH 01/37] KEYS: Increase the payload size when instantiating a key
Increase the size of a payload that can be used to instantiate a key in
add_key() and keyctl_instantiate_key().  This permits huge CIFS SPNEGO blobs to
be passed around.  The limit is raised to 1MB.  If kmalloc() can't allocate a
buffer of sufficient size, vmalloc() will be tried instead.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyctl.c |   38 ++
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index d9ca15c..8ec8432 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -19,6 +19,7 @@
 #include <linux/capability.h>
 #include <linux/string.h>
 #include <linux/err.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -62,9 +63,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 	char type[32], *description;
 	void *payload;
 	long ret;
+	bool vm;
 
 	ret = -EINVAL;
-	if (plen > 32767)
+	if (plen > 1024 * 1024 - 1)
 		goto error;
 
 	/* draw all the data into kernel space */
@@ -81,11 +83,18 @@ asmlinkage long sys_add_key(const char __user *_type,
 	/* pull the payload in if one was supplied */
 	payload = NULL;
 
+	vm = false;
 	if (_payload) {
 		ret = -ENOMEM;
 		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload)
-			goto error2;
+		if (!payload) {
+			if (plen <= PAGE_SIZE)
+				goto error2;
+			vm = true;
+			payload = vmalloc(plen);
+			if (!payload)
+				goto error2;
+		}
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -113,7 +122,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 
 	key_ref_put(keyring_ref);
 error3:
-	kfree(payload);
+	if (!vm)
+		kfree(payload);
+	else
+		vfree(payload);
 error2:
 	kfree(description);
 error:
@@ -821,9 +833,10 @@ long keyctl_instantiate_key(key_serial_t id,
 	key_ref_t keyring_ref;
 	void *payload;
 	long ret;
+	bool vm = false;
 
 	ret = -EINVAL;
-	if (plen > 32767)
+	if (plen > 1024 * 1024 - 1)
 		goto error;
 
 	/* the appropriate instantiation authorisation key must have been
@@ -843,8 +856,14 @@ long keyctl_instantiate_key(key_serial_t id,
 	if (_payload) {
 		ret = -ENOMEM;
 		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload)
-			goto error;
+		if (!payload) {
+			if (plen <= PAGE_SIZE)
+				goto error;
+			vm = true;
+			payload = vmalloc(plen);
+			if (!payload)
+				goto error;
+		}
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -877,7 +896,10 @@ long keyctl_instantiate_key(key_serial_t id,
 	}
 
 error2:
-	kfree(payload);
+	if (!vm)
+		kfree(payload);
+	else
+		vfree(payload);
 error:
 	return ret;
 
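The allocation pattern in this patch (try the fast allocator, fall back to a
second one for large requests, and remember which one succeeded so that the
matching free routine is used) can be sketched in userspace roughly as
follows.  This is only an illustration: sketch_kmalloc(), sketch_vmalloc(),
their free counterparts and SKETCH_PAGE_SIZE are made-up stand-ins built on
malloc(), not kernel interfaces, and the first allocator is rigged to refuse
anything over one page so that the fallback path is exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAYLOAD_MAX		(1024 * 1024 - 1)	/* limit from the patch */
#define SKETCH_PAGE_SIZE	4096	/* assumed stand-in for PAGE_SIZE */

struct payload_buf {
	void *data;
	bool vm;	/* true if data came from the fallback allocator */
};

/* Hypothetical stand-in for kmalloc(): fails for more than one page. */
static void *sketch_kmalloc(size_t len)
{
	return len <= SKETCH_PAGE_SIZE ? malloc(len) : NULL;
}

/* Hypothetical stand-ins for vmalloc(), kfree() and vfree(). */
static void *sketch_vmalloc(size_t len) { return malloc(len); }
static void sketch_kfree(void *p) { free(p); }
static void sketch_vfree(void *p) { free(p); }

/* Returns 0 on success, -EINVAL if plen is over the limit, -ENOMEM otherwise. */
static int payload_alloc(struct payload_buf *buf, size_t plen)
{
	assert(buf != NULL);
	buf->data = NULL;
	buf->vm = false;

	if (plen > PAYLOAD_MAX)
		return -EINVAL;
	if (plen == 0)
		return 0;	/* no payload supplied */

	buf->data = sketch_kmalloc(plen);
	if (!buf->data) {
		/* small requests do not get a second chance */
		if (plen <= SKETCH_PAGE_SIZE)
			return -ENOMEM;
		buf->vm = true;
		buf->data = sketch_vmalloc(plen);
		if (!buf->data)
			return -ENOMEM;
	}
	return 0;
}

/* Free with the routine that matches the allocator that was used. */
static void payload_free(struct payload_buf *buf)
{
	if (!buf->vm)
		sketch_kfree(buf->data);
	else
		sketch_vfree(buf->data);
	buf->data = NULL;
	buf->vm = false;
}
```

The point of carrying the vm flag around is that the two kernel allocators
cannot be freed interchangeably, so the error paths have to know which one
supplied the buffer.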
[PATCH 36/37] NFS: Display local caching state
Display the local caching state in /proc/fs/nfsfs/volumes.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/client.c  |    7 ---
 fs/nfs/fscache.h |   15 +++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 51e9346..d67d52f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1451,7 +1451,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 
 	/* display header on line 1 */
 	if (v == &nfs_volume_list) {
-		seq_puts(m, "NV SERVER PORT DEV FSID\n");
+		seq_puts(m, "NV SERVER PORT DEV FSID FSC\n");
 		return 0;
 	}
 	/* display one transport per line on subsequent lines */
@@ -1465,12 +1465,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 		 (unsigned long long) server->fsid.major,
 		 (unsigned long long) server->fsid.minor);
 
-	seq_printf(m, "v%u %s %s %-7s %-17s\n",
+	seq_printf(m, "v%u %s %s %-7s %-17s %s\n",
 		   clp->rpc_ops->version,
 		   rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_ADDR),
 		   rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_HEX_PORT),
 		   dev,
-		   fsid);
+		   fsid,
+		   nfs_server_fscache_state(server));
 
 	return 0;
 }
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 6264cd8..5f7806f 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -146,6 +146,16 @@ static inline void nfs_readpage_to_fscache(struct inode *inode,
 		__nfs_readpage_to_fscache(inode, page, sync);
 }
 
+/*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+	if (server->fscache && (server->options & NFS_OPTION_FSCACHE))
+		return "yes";
+	return "no ";
+}
+
 
 #else /* CONFIG_NFS_FSCACHE */
 static inline int nfs_fscache_register(void) { return 0; }
@@ -195,5 +205,10 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 static inline void nfs_readpage_to_fscache(struct inode *inode,
 					   struct page *page, int sync) {}
 
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+	return "no ";
+}
+
 #endif /* CONFIG_NFS_FSCACHE */
 #endif /* _NFS_FSCACHE_H */
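For illustration, the state test added above boils down to "a cookie is
attached and the mount option is set".  A userspace sketch, with a made-up
struct sketch_server and an assumed flag value standing in for
NFS_OPTION_FSCACHE, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed value for this sketch only; stands in for NFS_OPTION_FSCACHE. */
#define SKETCH_OPTION_FSCACHE	0x1u

/* Hypothetical cut-down model of the two nfs_server fields that are tested. */
struct sketch_server {
	void *fscache;		/* non-NULL once a cache cookie is attached */
	unsigned int options;	/* mount option flags */
};

/* Report the caching state as fixed-width text for a column display. */
static const char *sketch_fscache_state(const struct sketch_server *server)
{
	if (server->fscache && (server->options & SKETCH_OPTION_FSCACHE))
		return "yes";
	return "no ";
}
```

Both strings are three characters wide, which keeps the new FSC column aligned
whichever value is printed.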
[PATCH 35/37] NFS: Store pages from an NFS inode into a local cache
Store pages from an NFS inode into the cache data storage object associated
with that inode.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache.c |   26 ++
 fs/nfs/fscache.h |   16
 fs/nfs/read.c    |    5 +
 3 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 438cc9b..50ae70f 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -456,3 +456,29 @@ int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 
 	return ret;
 }
+
+/*
+ * Store a newly fetched page in fscache
+ * - PG_fscache must be set on the page
+ */
+void __nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
+{
+	int ret;
+
+	dfprintk(FSCACHE,
+		 "NFS: readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
+		 NFS_I(inode)->fscache, page, page->index, page->flags, sync);
+
+	ret = fscache_write_page(NFS_I(inode)->fscache, page, GFP_KERNEL);
+	dfprintk(FSCACHE,
+		 "NFS: readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
+		 page, page->index, page->flags, ret);
+
+	if (ret != 0) {
+		fscache_uncache_page(NFS_I(inode)->fscache, page);
+		nfs_add_stats(inode, NFSIOS_FSCACHE_WRITE_FAIL, 1);
+		nfs_add_stats(inode, NFSIOS_FSCACHE_UNCACHE, 1);
+	} else {
+		nfs_add_stats(inode, NFSIOS_FSCACHE_WRITE_OK, 1);
+	}
+}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 4c1e1a8..6264cd8 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -94,6 +94,7 @@ extern int __nfs_readpage_from_fscache(struct nfs_open_context *,
 extern int __nfs_readpages_from_fscache(struct nfs_open_context *,
 					struct inode *, struct address_space *,
 					struct list_head *, unsigned *);
+extern void __nfs_readpage_to_fscache(struct inode *, struct page *, int);
 
 /*
  * release the caching state associated with a page if undergoing complete page
@@ -133,6 +134,19 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 	return -ENOBUFS;
 }
 
+/*
+ * Store a page newly fetched from the server in an inode data storage object
+ * in the cache.
+ */
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+					   struct page *page,
+					   int sync)
+{
+	if (PageFsCache(page))
+		__nfs_readpage_to_fscache(inode, page, sync);
+}
+
+
 #else /* CONFIG_NFS_FSCACHE */
 static inline int nfs_fscache_register(void) { return 0; }
 static inline void nfs_fscache_unregister(void) {}
@@ -178,6 +192,8 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
 {
 	return -ENOBUFS;
 }
+static inline void nfs_readpage_to_fscache(struct inode *inode,
+					   struct page *page, int sync) {}
 
 #endif /* CONFIG_NFS_FSCACHE */
 #endif /* _NFS_FSCACHE_H */
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index db27b26..e09bdf9 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -143,6 +143,11 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 
 static void nfs_readpage_release(struct nfs_page *req)
 {
+	struct inode *d_inode = req->wb_context->path.dentry->d_inode;
+
+	if (PageUptodate(req->wb_page))
+		nfs_readpage_to_fscache(d_inode, req->wb_page, 0);
+
 	unlock_page(req->wb_page);
 
 	dprintk("NFS: read done (%s/%Ld [EMAIL PROTECTED])\n",
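The control flow of the store path (only pages marked for the cache are
offered to it; a failed write causes the page to be uncached and two counters
to be bumped) can be modelled in userspace along these lines.  Everything
named sketch_* is invented for this note, the cached flag merely models
PG_fscache, and clearing that flag on failure is this sketch's simplified
stand-in for fscache_uncache_page().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum sketch_stat { STAT_WRITE_OK, STAT_WRITE_FAIL, STAT_UNCACHE, STAT_NR };

struct sketch_page {
	bool cached;	/* models PG_fscache */
};

struct sketch_inode {
	unsigned long stats[STAT_NR];
	/* hypothetical backend hook; returns 0 or a negative errno */
	int (*write_page)(struct sketch_page *page);
};

/* Two trivial backends so that both outcomes can be exercised. */
static int sketch_write_ok(struct sketch_page *page) { (void)page; return 0; }
static int sketch_write_fail(struct sketch_page *page) { (void)page; return -ENOBUFS; }

static void sketch_readpage_to_cache(struct sketch_inode *inode,
				     struct sketch_page *page)
{
	/* pages the cache does not know about are skipped entirely */
	if (!page->cached)
		return;

	if (inode->write_page(page) != 0) {
		page->cached = false;	/* the cache gives the page up */
		inode->stats[STAT_WRITE_FAIL]++;
		inode->stats[STAT_UNCACHE]++;
	} else {
		inode->stats[STAT_WRITE_OK]++;
	}
}
```

Once a page has been uncached after a failure, later calls for that page fall
through the first test and touch neither the cache nor the counters.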
[PATCH 15/37] CacheFiles: Add missing copy_page export for ia64
This one-line patch fixes the missing export of copy_page introduced by the
cachefile patches.  This patch is not yet upstream, but is required for
cachefile on ia64.  It will be pushed upstream when cachefile goes upstream.

Signed-off-by: Prarit Bhargava [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/ia64_ksyms.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index 8e7193d..3e544f4 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -46,6 +46,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);
[PATCH 18/37] CacheFiles: Permit the page lock state to be monitored
Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/pagemap.h |    5 +
 mm/filemap.c            |   18 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d22e975..eb08fb8 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -242,6 +242,11 @@ static inline void wait_on_page_owner_priv_2(struct page *page)
 extern void end_page_owner_priv_2(struct page *page);
 
 /*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/mm/filemap.c b/mm/filemap.c
index 6c6cd76..5c0241c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -548,6 +548,24 @@ void wait_on_page_bit(struct page *page, int bit_nr)
 EXPORT_SYMBOL(wait_on_page_bit);
 
 /**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page - Page defining the wait queue of interest
+ * @waiter - Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+	wait_queue_head_t *q = page_waitqueue(page);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue(q, waiter);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
  * unlock_page - unlock a locked page
  * @page: the page
  *
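To make the monitoring idea concrete, here is a single-threaded userspace
model: a waiter carrying a callback is put on a page's queue, and unlocking
the page runs every queued callback.  All the sketch_* names are invented for
this note.  The model deliberately leaves out the locking that the real
function takes around the queue, and it keeps one queue per page, whereas the
kernel looks the queue up with page_waitqueue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_waiter {
	/* called when the monitored page is unlocked */
	void (*func)(struct sketch_waiter *self);
	struct sketch_waiter *next;
	int fired;	/* how many times func has run */
};

struct sketch_page {
	bool locked;
	struct sketch_waiter *waiters;	/* singly linked queue, newest first */
};

/* Put an arbitrary waiter at the head of the page's queue. */
static void sketch_add_page_wait_queue(struct sketch_page *page,
				       struct sketch_waiter *waiter)
{
	waiter->next = page->waiters;
	page->waiters = waiter;
}

/* Unlock the page and notify everything that is monitoring it. */
static void sketch_unlock_page(struct sketch_page *page)
{
	struct sketch_waiter *w;

	page->locked = false;
	for (w = page->waiters; w != NULL; w = w->next)
		w->func(w);
}

/* Example callback: a cache would start copying the data out here. */
static void sketch_note_read_done(struct sketch_waiter *self)
{
	self->fired++;
}
```

The attraction of a callback over sleeping on the page is that one thread can
watch many backing pages at once and react as each read completes.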
[PATCH 16/37] CacheFiles: Be consistent about the use of mapping vs file->f_mapping in Ext3
Change all the usages of file->f_mapping in ext3_*write_end() functions to use
the mapping argument directly.  This has two consequences:

 (*) Consistency.  Without this patch sometimes one is used and sometimes the
     other is.

 (*) A NULL file pointer can be passed.  This feature is then made use of by
     the generic hook in the next patch, which is used by CacheFiles to write
     pages to a file without setting up a file struct.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/ext3/inode.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index eb95670..c976123 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1215,7 +1215,7 @@ static int ext3_generic_write_end(struct file *file,
 				loff_t pos, unsigned len, unsigned copied,
 				struct page *page, void *fsdata)
 {
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = mapping->host;
 
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
 
@@ -1240,7 +1240,7 @@ static int ext3_ordered_write_end(struct file *file,
 				struct page *page, void *fsdata)
 {
 	handle_t *handle = ext3_journal_current_handle();
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = mapping->host;
 	unsigned from, to;
 	int ret = 0, ret2;
 
@@ -1281,7 +1281,7 @@ static int ext3_writeback_write_end(struct file *file,
 				struct page *page, void *fsdata)
 {
 	handle_t *handle = ext3_journal_current_handle();
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = mapping->host;
 	int ret = 0, ret2;
 	loff_t new_i_size;
 
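The second consequence is easy to show with a toy model: if the inode is
reached through the mapping argument, the file argument is never dereferenced
and so may be NULL.  The three structs below are invented, heavily cut-down
stand-ins for the kernel's file, address_space and inode.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_inode { long i_size; };
struct sketch_mapping { struct sketch_inode *host; };
struct sketch_file { struct sketch_mapping *f_mapping; };

/*
 * Hypothetical write_end-style helper: finds the inode through the mapping
 * argument only, so callers without a file struct can pass file == NULL.
 */
static struct sketch_inode *sketch_write_end_inode(struct sketch_file *file,
						   struct sketch_mapping *mapping)
{
	(void)file;	/* intentionally unused */
	return mapping->host;
}
```

Had the helper gone through file->f_mapping instead, the NULL case would have
been a NULL pointer dereference.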
[PATCH 09/37] Security: Allow kernel services to override LSM settings for task actions
Allow kernel services to override LSM settings appropriate to the actions performed by a task by duplicating a security record, modifying it and then using task_struct::act_as to point to it when performing operations on behalf of a task. This is used, for example, by CacheFiles which has to transparently access the cache on behalf of a process that thinks it is doing, say, NFS accesses with a potentially inappropriate (with respect to accessing the cache) set of security data. This patch provides two LSM hooks for modifying a task security record: (*) security_kernel_act_as() which allows modification of the security datum with which a task acts on other objects (most notably files). (*) security_create_files_as() which allows modification of the security datum that is used to initialise the security data on a file that a task creates. Signed-off-by: David Howells [EMAIL PROTECTED] --- include/linux/capability.h | 12 ++-- include/linux/cred.h| 23 +++ include/linux/security.h| 43 + kernel/cred.c | 112 +++ security/dummy.c| 17 + security/security.c | 15 - security/selinux/hooks.c| 51 security/selinux/include/security.h |2 - security/selinux/ss/services.c |5 +- 9 files changed, 265 insertions(+), 15 deletions(-) create mode 100644 include/linux/cred.h diff --git a/include/linux/capability.h b/include/linux/capability.h index 7d50ff6..424de01 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -364,12 +364,12 @@ typedef struct kernel_cap_struct { # error Fix up hand-coded capability macro initializers #else /* HAND-CODED capability initializers */ -# define CAP_EMPTY_SET{{ 0, 0 }} -# define CAP_FULL_SET {{ ~0, ~0 }} -# define CAP_INIT_EFF_SET {{ ~CAP_TO_MASK(CAP_SETPCAP), ~0 }} -# define CAP_FS_SET {{ CAP_FS_MASK_B0, CAP_FS_MASK_B1 } } -# define CAP_NFSD_SET {{ CAP_FS_MASK_B0|CAP_TO_MASK(CAP_SYS_RESOURCE), \ -CAP_FS_MASK_B1 } } +# define CAP_EMPTY_SET((kernel_cap_t){{ 0, 0 }}) +# define CAP_FULL_SET ((kernel_cap_t){{ ~0, ~0 }}) +# define 
CAP_INIT_EFF_SET ((kernel_cap_t){{ ~CAP_TO_MASK(CAP_SETPCAP), ~0 }}) +# define CAP_FS_SET ((kernel_cap_t){{ CAP_FS_MASK_B0, CAP_FS_MASK_B1 } }) +# define CAP_NFSD_SET ((kernel_cap_t){{ CAP_FS_MASK_B0|CAP_TO_MASK(CAP_SYS_RESOURCE), \ + CAP_FS_MASK_B1 } }) #endif /* _LINUX_CAPABILITY_U32S != 2 */ diff --git a/include/linux/cred.h b/include/linux/cred.h new file mode 100644 index 000..497af5b --- /dev/null +++ b/include/linux/cred.h @@ -0,0 +1,23 @@ +/* Credential management + * + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. + * Written by David Howells ([EMAIL PROTECTED]) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ + +#ifndef _LINUX_CRED_H +#define _LINUX_CRED_H + +struct task_security; +struct inode; + +extern struct task_security *get_kernel_security(struct task_struct *); +extern int set_security_override(struct task_security *, u32); +extern int set_security_override_from_ctx(struct task_security *, const char *); +extern int change_create_files_as(struct task_security *, struct inode *); + +#endif /* _LINUX_CRED_H */ diff --git a/include/linux/security.h b/include/linux/security.h index 9bf93c7..1c17b91 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -568,6 +568,19 @@ struct request_sock; * Duplicate and attach the security structure currently attached to the * p-security field. * Return 0 if operation was successful. + * @task_kernel_act_as: + * Set the credentials for a kernel service to act as (subjective context). + * @p points to the task that nominated @secid. + * @sec points to the task security record to be modified. + * @secid specifies the security ID to be set + * Return 0 if successful. 
+ * @task_create_files_as: + * Set the file creation context in a task security record to be the same + * as the objective context of the specified inode. + * @p points to the task that nominated @inode. + * @sec points to the task security record to be modified. + * @inode points to the inode to use as a reference. + * Return 0 if successful. * @task_setuid: * Check permission before setting one or more of the user identity * attributes of the current process. The @flags parameter indicates @@ -1342,6 +1355,11 @@ struct security_operations { int (*task_alloc_security) (struct task_struct *p); void (*task_free_security) (struct task_security *p); int
[PATCH 02/37] KEYS: Check starting keyring as part of search
Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) to check it is still valid for searching.

The scenario: User in process A does things that cause things to be created in
its process session keyring.  The user then does an su to another user and
starts a new process, B.  The two processes now share the same process session
keyring.

Process B does an NFS access which results in an upcall to gssd.  When gssd
attempts to instantiate the context key (to be linked into the process session
keyring), it is denied access even though it has an authorization key.

The order of calls is:

   keyctl_instantiate_key()
      lookup_user_key()                              (the default: case)
         search_process_keyrings(current)
            search_process_keyrings(rka->context)    (recursive call)
               keyring_search_aux()

keyring_search_aux() verifies the keys and keyrings underneath the top-level
keyring it is given, but that top-level keyring is neither fully validated nor
checked to see if it is the thing being searched for.

This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for

Signed-off-by: Kevin Coffman [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyring.c |   35 +++
 1 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 88292e3..76b89b2 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -292,7 +292,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 
 	struct keyring_list *keylist;
 	struct timespec now;
-	unsigned long possessed;
+	unsigned long possessed, kflags;
 	struct key *keyring, *key;
 	key_ref_t key_ref;
 	long err;
@@ -318,6 +318,32 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 	now = current_kernel_time();
 	err = -EAGAIN;
 	sp = 0;
+
+	/* firstly we should check to see if this top-level keyring is what we
+	 * are looking for */
+	key_ref = ERR_PTR(-EAGAIN);
+	kflags = keyring->flags;
+	if (keyring->type == type && match(keyring, description)) {
+		key = keyring;
+
+		/* check it isn't negative and hasn't expired or been
+		 * revoked */
+		if (kflags & (1 << KEY_FLAG_REVOKED))
+			goto error_2;
+		if (key->expiry && now.tv_sec >= key->expiry)
+			goto error_2;
+		key_ref = ERR_PTR(-ENOKEY);
+		if (kflags & (1 << KEY_FLAG_NEGATIVE))
+			goto error_2;
+		goto found;
+	}
+
+	/* otherwise, the top keyring must not be revoked, expired, or
+	 * negatively instantiated if we are to search it */
+	key_ref = ERR_PTR(-EAGAIN);
+	if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
+	    (keyring->expiry && now.tv_sec >= keyring->expiry))
+		goto error_2;
 
 	/* start processing a new keyring */
 descend:
@@ -331,13 +357,14 @@ descend:
 	/* iterate through the keys in this keyring first */
 	for (kix = 0; kix < keylist->nkeys; kix++) {
 		key = keylist->keys[kix];
+		kflags = key->flags;
 
 		/* ignore keys not of this type */
 		if (key->type != type)
 			continue;
 
 		/* skip revoked keys and expired keys */
-		if (test_bit(KEY_FLAG_REVOKED, &key->flags))
+		if (kflags & (1 << KEY_FLAG_REVOKED))
 			continue;
 
 		if (key->expiry && now.tv_sec >= key->expiry)
@@ -352,8 +379,8 @@ descend:
 					context, KEY_SEARCH) < 0)
 			continue;
 
-		/* we set a different error code if we find a negative key */
-		if (test_bit(KEY_FLAG_NEGATIVE, &key->flags)) {
+		/* we set a different error code if we pass a negative key */
+		if (kflags & (1 << KEY_FLAG_NEGATIVE)) {
 			err = -ENOKEY;
 			continue;
 		}
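The decision the patch adds for the top-level keyring can be summarised as a
small userspace function.  This is a sketch under stated assumptions: struct
sketch_key, the SKETCH_* names and the flag bit numbers are invented here, and
matching is reduced to comparing type and description strings, whereas the
kernel calls the key type's match function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Bit numbers assumed for this sketch; not taken from the kernel headers. */
#define SKETCH_KEY_FLAG_REVOKED		2
#define SKETCH_KEY_FLAG_NEGATIVE	5

enum { SKETCH_SEARCH = 0, SKETCH_FOUND = 1 };

struct sketch_key {
	const char *type;
	const char *description;
	unsigned long flags;
	time_t expiry;		/* 0 means the key never expires */
};

/*
 * Returns SKETCH_FOUND if the top keyring is itself what is wanted,
 * SKETCH_SEARCH if it is valid and should be descended into, -ENOKEY if it
 * matches but is negative, and -EAGAIN if it is revoked, expired or otherwise
 * unusable.
 */
static int sketch_check_top_keyring(const struct sketch_key *top,
				    const char *type, const char *description,
				    time_t now)
{
	unsigned long kflags = top->flags;
	bool expired = top->expiry && now >= top->expiry;

	if (strcmp(top->type, type) == 0 &&
	    strcmp(top->description, description) == 0) {
		if (kflags & (1UL << SKETCH_KEY_FLAG_REVOKED))
			return -EAGAIN;
		if (expired)
			return -EAGAIN;
		if (kflags & (1UL << SKETCH_KEY_FLAG_NEGATIVE))
			return -ENOKEY;
		return SKETCH_FOUND;
	}

	/* not the target: it must still be fit to search */
	if ((kflags & ((1UL << SKETCH_KEY_FLAG_REVOKED) |
		       (1UL << SKETCH_KEY_FLAG_NEGATIVE))) || expired)
		return -EAGAIN;
	return SKETCH_SEARCH;
}
```

Note the asymmetry: a negative key that matches yields -ENOKEY, whereas a
negative keyring that merely stands in the way of the search yields -EAGAIN.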
[PATCH 27/37] NFS: Define and create inode-level cache objects
Define and create inode-level cache data storage objects (as managed by nfs_inode structs). Each inode-level object is created in a superblock-level index object and is itself a data storage object into which pages from the inode are stored. The inode object key is the NFS file handle for the inode. The inode object is given coherency data to carry in the auxiliary data permitted by the cache. This is a sequence made up of: (1) i_mtime from the NFS inode. (2) i_ctime from the NFS inode. (3) i_size from the NFS inode. As the cache is a persistent cache, the auxiliary data is checked when a new NFS in-memory inode is set up that matches an already existing data storage object in the cache. If the coherency data is the same, the on-disk object is retained and used; if not, it is scrapped and a new one created. Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/nfs/fscache-index.c | 112 fs/nfs/fscache.h |1 2 files changed, 113 insertions(+), 0 deletions(-) diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c index b5a52e3..c3c63fa 100644 --- a/fs/nfs/fscache-index.c +++ b/fs/nfs/fscache-index.c @@ -150,3 +150,115 @@ const struct fscache_cookie_def nfs_cache_super_index_def = { .type = FSCACHE_COOKIE_TYPE_INDEX, .get_key= nfs_super_get_key, }; + +/* + * Definition of the auxiliary data attached to NFS inode storage objects + * within the cache. + * + * The contents of this struct are recorded in the on-disk local cache in the + * auxiliary data attached to the data storage object backing an inode. This + * permits coherency to be managed when a new inode binds to an already extant + * cache object. 
+ */ +struct nfs_cache_inode_auxdata { + struct timespec mtime; + struct timespec ctime; + loff_t size; +}; + +/* + * Generate a key to describe an NFS inode in an NFS server's index + */ +static uint16_t nfs_cache_inode_get_key(const void *cookie_netfs_data, + void *buffer, uint16_t bufmax) +{ + const struct nfs_inode *nfsi = cookie_netfs_data; + uint16_t nsize; + + /* use the inode's NFS filehandle as the key */ + nsize = nfsi-fh.size; + memcpy(buffer, nfsi-fh.data, nsize); + return nsize; +} + +/* + * Get certain file attributes from the netfs data + * - This function can be absent for an index + * - Not permitted to return an error + * - The netfs data from the cookie being used as the source is presented + */ +static void nfs_cache_inode_get_attr(const void *cookie_netfs_data, uint64_t *size) +{ + const struct nfs_inode *nfsi = cookie_netfs_data; + + *size = nfsi-vfs_inode.i_size; +} + +/* + * Get the auxiliary data from netfs data + * - This function can be absent if the index carries no state data + * - Should store the auxiliary data in the buffer + * - Should return the amount of amount stored + * - Not permitted to return an error + * - The netfs data from the cookie being used as the source is presented + */ +static uint16_t nfs_cache_inode_get_aux(const void *cookie_netfs_data, + void *buffer, uint16_t bufmax) +{ + struct nfs_cache_inode_auxdata auxdata; + const struct nfs_inode *nfsi = cookie_netfs_data; + + auxdata.size = nfsi-vfs_inode.i_size; + auxdata.mtime = nfsi-vfs_inode.i_mtime; + auxdata.ctime = nfsi-vfs_inode.i_ctime; + + if (bufmax sizeof(auxdata)) + bufmax = sizeof(auxdata); + + memcpy(buffer, auxdata, bufmax); + return bufmax; +} + +/* + * Consult the netfs about the state of an object + * - This function can be absent if the index carries no state data + * - The netfs data from the cookie being used as the target is + * presented, as is the auxiliary data + */ +static enum fscache_checkaux nfs_cache_inode_check_aux(void 
*cookie_netfs_data, + const void *data, + uint16_t datalen) +{ + struct nfs_cache_inode_auxdata auxdata; + struct nfs_inode *nfsi = cookie_netfs_data; + + if (datalen sizeof(auxdata)) + return FSCACHE_CHECKAUX_OBSOLETE; + + auxdata.size = nfsi-vfs_inode.i_size; + auxdata.mtime = nfsi-vfs_inode.i_mtime; + auxdata.ctime = nfsi-vfs_inode.i_ctime; + + if (memcmp(data, auxdata, datalen) != 0) + return FSCACHE_CHECKAUX_OBSOLETE; + + return FSCACHE_CHECKAUX_OKAY; +} + +/* + * Define the inode object for FS-Cache. This is used to describe an inode + * object to fscache_acquire_cookie(). It is keyed by the NFS file handle for + * an inode. + * + * Coherency is managed by comparing the copies of i_size, i_mtime and i_ctime + * held in the cache auxiliary data for the data storage object with those in + * the inode struct in memory. + */ +const struct
[PATCH 29/37] NFS: Invalidate FsCache page flags when cache removed
Invalidate the FsCache page flags on the pages belonging to an inode when the
cache backing that NFS inode is removed.  This allows a live cache to be
withdrawn.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-index.c |   40
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index c3c63fa..eec8e7e 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -246,6 +246,45 @@ static enum fscache_checkaux nfs_cache_inode_check_aux(void *cookie_netfs_data,
 }

 /*
+ * Indication from FS-Cache that the cookie is no longer cached
+ * - This function is called when the backing store currently caching a cookie
+ *   is removed
+ * - The netfs should use this to clean up any markers indicating cached pages
+ * - This is mandatory for any object that may have data
+ */
+static void nfs_cache_inode_now_uncached(void *cookie_netfs_data)
+{
+	struct nfs_inode *nfsi = cookie_netfs_data;
+	struct pagevec pvec;
+	pgoff_t first;
+	int loop, nr_pages;
+
+	pagevec_init(&pvec, 0);
+	first = 0;
+
+	dprintk("NFS: nfs_inode_now_uncached: nfs_inode 0x%p\n", nfsi);
+
+	for (;;) {
+		/* grab a bunch of pages to unmark */
+		nr_pages = pagevec_lookup(&pvec,
+					  nfsi->vfs_inode.i_mapping,
+					  first,
+					  PAGEVEC_SIZE - pagevec_count(&pvec));
+		if (!nr_pages)
+			break;
+
+		for (loop = 0; loop < nr_pages; loop++)
+			ClearPageFsCache(pvec.pages[loop]);
+
+		first = pvec.pages[nr_pages - 1]->index + 1;
+
+		pvec.nr = nr_pages;
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+}
+
+/*
  * Define the inode object for FS-Cache.  This is used to describe an inode
  * object to fscache_acquire_cookie().  It is keyed by the NFS file handle for
  * an inode.
@@ -261,4 +300,5 @@ const struct fscache_cookie_def nfs_cache_inode_object_def = {
 	.get_attr	= nfs_cache_inode_get_attr,
 	.get_aux	= nfs_cache_inode_get_aux,
 	.check_aux	= nfs_cache_inode_check_aux,
+	.now_uncached	= nfs_cache_inode_now_uncached,
 };
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
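For readers who want to see the shape of the loop in nfs_cache_inode_now_uncached() without kernel headers, here is a rough userspace sketch.  Everything in it is a stand-in invented for illustration: struct fake_page, lookup_batch() and BATCH_SIZE play the roles of struct page, pagevec_lookup() and PAGEVEC_SIZE, and none of them are kernel interfaces.  It only shows the "look up a batch, clear the flag, resume after the last index seen" pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 4	/* hypothetical stand-in for PAGEVEC_SIZE */

/* Hypothetical stand-in for a cached page: a sparse index plus one flag. */
struct fake_page {
	unsigned long index;
	bool cached;
};

/*
 * Hypothetical stand-in for pagevec_lookup(): collect up to max pages whose
 * index is >= first from an array that is sorted by index.
 */
static size_t lookup_batch(struct fake_page *pages, size_t npages,
			   unsigned long first,
			   struct fake_page **batch, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < npages && found < max; i++)
		if (pages[i].index >= first)
			batch[found++] = &pages[i];
	return found;
}

/*
 * Clear the flag on every page, one batch at a time, and return the number
 * of batches that were needed.  Assumes all indices are below ULONG_MAX so
 * that "last index + 1" cannot wrap.
 */
static size_t clear_all_cached(struct fake_page *pages, size_t npages)
{
	struct fake_page *batch[BATCH_SIZE];
	unsigned long first = 0;
	size_t nbatches = 0;

	assert(pages != NULL || npages == 0);

	for (;;) {
		size_t nr = lookup_batch(pages, npages, first,
					 batch, BATCH_SIZE);
		if (nr == 0)
			break;

		for (size_t i = 0; i < nr; i++)
			batch[i]->cached = false;

		/* resume just past the last page seen, as the patch does */
		first = batch[nr - 1]->index + 1;
		nbatches++;
	}
	return nbatches;
}
```

The real function additionally releases the page references it took and calls cond_resched() between batches; the sketch has no equivalent of either.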
Re: [PATCH 24/27] NFS: Use local caching [try #2]
Chuck Lever [EMAIL PROTECTED] wrote:

> > > > +struct nfs_fh_auxdata {
> > > > +	struct timespec	i_mtime;
> > > > +	struct timespec	i_ctime;
> > > > +	loff_t		i_size;
> > > > +};
> > >
> > > It might be useful to explain here why you need to supplement the mtime,
> > > ctime, and size fields that already exist in an NFS inode.
> >
> > Supplement?  I don't understand.
>
> Why is it necessary to add additional mtime, ctime and size fields for NFS
> inodes?  Similar metadata is already stored in nfsi.

Yes, but this is the data that's stored in the cache on disk, not what's
stored in the NFS inode struct in RAM.

I'll add some more comments to the code to make this clearer.

David
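To make the distinction between the on-disk copy and the in-RAM inode concrete, here is a minimal userspace sketch of the idea, not the NFS code itself.  The names aux_record, live_inode, store_aux() and check_aux() are hypothetical, the record uses fixed-width fields so that memcmp() never sees padding, and the strict length check is an assumption of the sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum checkaux { CHECKAUX_OKAY, CHECKAUX_OBSOLETE };

/* Hypothetical record saved alongside the cached data on disk. */
struct aux_record {
	int64_t mtime_sec;
	int64_t mtime_nsec;
	int64_t ctime_sec;
	int64_t ctime_nsec;
	int64_t size;
};

/* Hypothetical in-memory inode: the attributes the netfs holds right now. */
struct live_inode {
	struct aux_record attrs;
};

/* Copy the current attributes out so the cache can store them on disk. */
static size_t store_aux(const struct live_inode *inode,
			void *buffer, size_t bufmax)
{
	size_t len = sizeof(inode->attrs);

	assert(buffer != NULL);
	if (len > bufmax)
		len = bufmax;
	memcpy(buffer, &inode->attrs, len);
	return len;
}

/* Later: compare what the cache stored against what the netfs sees now. */
static enum checkaux check_aux(const struct live_inode *inode,
			       const void *data, size_t datalen)
{
	if (datalen != sizeof(inode->attrs))
		return CHECKAUX_OBSOLETE;
	if (memcmp(data, &inode->attrs, datalen) != 0)
		return CHECKAUX_OBSOLETE;
	return CHECKAUX_OKAY;
}
```

The stored copy is a snapshot taken when the object was cached.  If the server-side file changes afterwards, the live attributes no longer match the snapshot and the cached data is treated as obsolete.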
Re: [patch 05/26] mount options: fix afs
Miklos Szeredi [EMAIL PROTECTED] wrote:

> Add a .show_options super operation to afs.
>
> Use generic_show_options() and save the complete option string in
> afs_get_sb().

Sounds reasonable, but I can't test it till I get back from LCA.

David
[PATCH 02/27] KEYS: Check starting keyring as part of search [try #2]
Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) to check it is still valid for searching.

The scenario: User in process A does things that cause things to be created in
its process session keyring.  The user then does an su to another user and
starts a new process, B.  The two processes now share the same process session
keyring.

Process B does an NFS access which results in an upcall to gssd.  When gssd
attempts to instantiate the context key (to be linked into the process session
keyring), it is denied access even though it has an authorization key.

The order of calls is:

   keyctl_instantiate_key()
      lookup_user_key()                            (the default: case)
         search_process_keyrings(current)
            search_process_keyrings(rka->context)  (recursive call)
               keyring_search_aux()

keyring_search_aux() verifies the keys and keyrings underneath the top-level
keyring it is given, but that top-level keyring is neither fully validated nor
checked to see if it is the thing being searched for.

This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for

Signed-off-by: Kevin Coffman [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyring.c |   35 +++
 1 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 88292e3..76b89b2 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -292,7 +292,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,

 	struct keyring_list *keylist;
 	struct timespec now;
-	unsigned long possessed;
+	unsigned long possessed, kflags;
 	struct key *keyring, *key;
 	key_ref_t key_ref;
 	long err;
@@ -318,6 +318,32 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 	now = current_kernel_time();
 	err = -EAGAIN;
 	sp = 0;
+
+	/* firstly we should check to see if this top-level keyring is what we
+	 * are looking for */
+	key_ref = ERR_PTR(-EAGAIN);
+	kflags = keyring->flags;
+	if (keyring->type == type && match(keyring, description)) {
+		key = keyring;
+
+		/* check it isn't negative and hasn't expired or been
+		 * revoked */
+		if (kflags & (1 << KEY_FLAG_REVOKED))
+			goto error_2;
+		if (key->expiry && now.tv_sec >= key->expiry)
+			goto error_2;
+		key_ref = ERR_PTR(-ENOKEY);
+		if (kflags & (1 << KEY_FLAG_NEGATIVE))
+			goto error_2;
+		goto found;
+	}
+
+	/* otherwise, the top keyring must not be revoked, expired, or
+	 * negatively instantiated if we are to search it */
+	key_ref = ERR_PTR(-EAGAIN);
+	if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
+	    (keyring->expiry && now.tv_sec >= keyring->expiry))
+		goto error_2;

 	/* start processing a new keyring */
 descend:
@@ -331,13 +357,14 @@ descend:
 	/* iterate through the keys in this keyring first */
 	for (kix = 0; kix < keylist->nkeys; kix++) {
 		key = keylist->keys[kix];
+		kflags = key->flags;

 		/* ignore keys not of this type */
 		if (key->type != type)
 			continue;

 		/* skip revoked keys and expired keys */
-		if (test_bit(KEY_FLAG_REVOKED, &key->flags))
+		if (kflags & (1 << KEY_FLAG_REVOKED))
 			continue;

 		if (key->expiry && now.tv_sec >= key->expiry)
@@ -352,8 +379,8 @@ descend:
 					context, KEY_SEARCH) < 0)
 			continue;

-		/* we set a different error code if we find a negative key */
-		if (test_bit(KEY_FLAG_NEGATIVE, &key->flags)) {
+		/* we set a different error code if we pass a negative key */
+		if (kflags & (1 << KEY_FLAG_NEGATIVE)) {
 			err = -ENOKEY;
 			continue;
 		}
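The decision the new top-level check makes can be modelled outside the kernel.  The following is only a sketch under stated assumptions: struct toy_key, the flag bit numbers and check_top_level() are made up for illustration, and type and description matching is reduced to strcmp(), whereas the kernel compares key type pointers and calls the type's match function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#ifndef ENOKEY
#define ENOKEY 126	/* fallback for systems whose errno.h lacks it */
#endif

/* Hypothetical flag bit numbers; the kernel's values may differ. */
enum { FLAG_REVOKED = 1, FLAG_NEGATIVE = 2 };

/* Outcomes that are not errors. */
enum { TOP_FOUND = 0, TOP_DESCEND = 1 };

/* Hypothetical, heavily reduced model of a key or keyring. */
struct toy_key {
	const char *type;
	const char *description;
	unsigned long flags;
	time_t expiry;		/* 0 means "never expires" */
};

/*
 * Returns TOP_FOUND if the starting keyring is itself the answer,
 * TOP_DESCEND if it is valid and its contents should be searched,
 * or a negative errno value if the search must stop here.
 */
static int check_top_level(const struct toy_key *keyring,
			   const char *type, const char *description,
			   time_t now)
{
	unsigned long kflags;
	bool expired;

	assert(keyring != NULL);
	kflags = keyring->flags;
	expired = keyring->expiry && now >= keyring->expiry;

	/* first, is the starting keyring what we are looking for? */
	if (strcmp(keyring->type, type) == 0 &&
	    strcmp(keyring->description, description) == 0) {
		if (kflags & (1UL << FLAG_REVOKED))
			return -EAGAIN;
		if (expired)
			return -EAGAIN;
		if (kflags & (1UL << FLAG_NEGATIVE))
			return -ENOKEY;
		return TOP_FOUND;
	}

	/* otherwise it must at least be valid to search through */
	if ((kflags & ((1UL << FLAG_REVOKED) | (1UL << FLAG_NEGATIVE))) ||
	    expired)
		return -EAGAIN;

	return TOP_DESCEND;
}
```

Note the asymmetry the patch introduces: a negatively instantiated keyring yields -ENOKEY when it is the thing being searched for, but -EAGAIN when it is merely the place being searched.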
[PATCH 00/27] Permit filesystem local caching [try #2]
These patches add local caching for network filesystems such as NFS.

The patches can roughly be broken down into a number of sets:

  (*) 01-keys-inc-payload.diff
  (*) 02-keys-search-keyring.diff
  (*) 03-keys-callout-blob.diff

	Three patches to the keyring code made to help the CIFS people.
	Included because of patches 05-08.

  (*) 04-keys-get-label.diff

	A patch to allow the security label of a key to be retrieved.
	Included because of patches 05-08.

  (*) 05-security-current-fsugid.diff
  (*) 06-security-separate-task-bits.diff
  (*) 07-security-subjective.diff
  (*) 08-security-secctx2secid.diff
  (*) 09-security-additional-classes.diff
  (*) 10-security-kernel_service-class.diff
  (*) 11-security-kernel-service.diff
  (*) 12-security-nfsd.diff

	Patches to permit the subjective security of a task to be overridden.
	All the security details in task_struct are decanted into a new struct
	that task_struct then has two pointers to: one that defines the
	objective security of that task (how other tasks may affect it) and one
	that defines the subjective security (how it may affect other objects).

	Note that I have dropped the idea of struct cred for the moment.  With
	the amount of stuff that was excluded from it, it wasn't actually any
	use to me.  However, it can be added later.

	Required for cachefiles.

  (*) 13-release-page.diff
  (*) 14-fscache-page-flags.diff
  (*) 15-add_wait_queue_tail.diff
  (*) 16-fscache.diff

	Patches to provide a local caching facility for network filesystems.

  (*) 17-cachefiles-ia64.diff
  (*) 18-cachefiles-ext3-f_mapping.diff
  (*) 19-cachefiles-write.diff
  (*) 20-cachefiles-monitor.diff
  (*) 21-cachefiles-export.diff
  (*) 22-cachefiles.diff

	Patches to provide a local cache in a directory of an already mounted
	filesystem.

  (*) 23-nfs-memleak.diff
  (*) 24-fscache-nfs.diff
  (*) 25-fscache-nfs-mount.diff
  (*) 26-fscache-nfs-display.diff
  (*) 27-fscache-nfs-persb.diff

	Patches to provide NFS with local caching.  The fifth of these patches
	makes caching configurable per superblock.

I've fixed some current->fs[ug]id conversions as pointed out by Jan Harkes.

I also fixed patch 14 (fscache-page-flags.diff) to make
wait_on_page_owner_priv_2() use PG_owner_priv_2 rather than PG_private_2.

--
A tarball of the patches is available at:

	http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-28.tar.bz2

To use this version of CacheFiles, cachefilesd-0.9 is also required.  It is
available as an SRPM:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm

Or as individual bits:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
	http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
	http://people.redhat.com/~dhowells/fscache/cachefilesd.if
	http://people.redhat.com/~dhowells/fscache/cachefilesd.te
	http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

The .fc, .if and .te files are for manipulating SELinux.

David
[PATCH 03/27] KEYS: Allow the callout data to be passed as a blob rather than a string [try #2]
Allow the callout data to be passed as a blob rather than a string for internal kernel services that call any request_key_*() interface other than request_key(). request_key() itself still takes a NUL-terminated string. The functions that change are: request_key_with_auxdata() request_key_async() request_key_async_with_auxdata() Signed-off-by: David Howells [EMAIL PROTECTED] --- Documentation/keys-request-key.txt | 11 +--- Documentation/keys.txt | 14 +++--- include/linux/key.h|9 --- security/keys/internal.h |9 --- security/keys/keyctl.c |7 - security/keys/request_key.c| 49 ++-- security/keys/request_key_auth.c | 12 + 7 files changed, 70 insertions(+), 41 deletions(-) diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt index 266955d..09b55e4 100644 --- a/Documentation/keys-request-key.txt +++ b/Documentation/keys-request-key.txt @@ -11,26 +11,29 @@ request_key*(): struct key *request_key(const struct key_type *type, const char *description, - const char *callout_string); + const char *callout_info); or: struct key *request_key_with_auxdata(const struct key_type *type, const char *description, -const char *callout_string, +const char *callout_info, +size_t callout_len, void *aux); or: struct key *request_key_async(const struct key_type *type, const char *description, - const char *callout_string); + const char *callout_info, + size_t callout_len); or: struct key *request_key_async_with_auxdata(const struct key_type *type, const char *description, - const char *callout_string, + const char *callout_info, + size_t callout_len, void *aux); Or by userspace invoking the request_key system call: diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 51652d3..b82d38d 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt @@ -771,7 +771,7 @@ payload contents for more information. 
struct key *request_key(const struct key_type *type, const char *description, - const char *callout_string); + const char *callout_info); This is used to request a key or keyring with a description that matches the description specified according to the key type's match function. This @@ -793,24 +793,28 @@ payload contents for more information. struct key *request_key_with_auxdata(const struct key_type *type, const char *description, -const char *callout_string, +const void *callout_info, +size_t callout_len, void *aux); This is identical to request_key(), except that the auxiliary data is -passed to the key_type-request_key() op if it exists. +passed to the key_type-request_key() op if it exists, and the callout_info +is a blob of length callout_len, if given (the length may be 0). (*) A key can be requested asynchronously by calling one of: struct key *request_key_async(const struct key_type *type, const char *description, - const char *callout_string); + const void *callout_info, + size_t callout_len); or: struct key *request_key_async_with_auxdata(const struct key_type *type, const char *description, - const char *callout_string, + const char *callout_info, + size_t callout_len, void *aux); which are asynchronous equivalents of request_key() and diff --git a/include/linux/key.h b/include/linux/key.h index a70b8a8..163f864 100644 --- a/include/linux/key.h +++ b/include/linux
[PATCH 01/27] KEYS: Increase the payload size when instantiating a key [try #2]
Increase the size of a payload that can be used to instantiate a key in
add_key() and keyctl_instantiate_key().  This permits huge CIFS SPNEGO blobs to
be passed around.  The limit is raised to 1MB.  If kmalloc() can't allocate a
buffer of sufficient size, vmalloc() will be tried instead.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyctl.c |   38 ++
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index d9ca15c..8ec8432 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -19,6 +19,7 @@
 #include <linux/capability.h>
 #include <linux/string.h>
 #include <linux/err.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"

@@ -62,9 +63,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 	char type[32], *description;
 	void *payload;
 	long ret;
+	bool vm;

 	ret = -EINVAL;
-	if (plen > 32767)
+	if (plen > 1024 * 1024 - 1)
 		goto error;

 	/* draw all the data into kernel space */
@@ -81,11 +83,18 @@ asmlinkage long sys_add_key(const char __user *_type,
 	/* pull the payload in if one was supplied */
 	payload = NULL;

+	vm = false;
 	if (_payload) {
 		ret = -ENOMEM;
 		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload)
-			goto error2;
+		if (!payload) {
+			if (plen <= PAGE_SIZE)
+				goto error2;
+			vm = true;
+			payload = vmalloc(plen);
+			if (!payload)
+				goto error2;
+		}

 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -113,7 +122,10 @@ asmlinkage long sys_add_key(const char __user *_type,

 	key_ref_put(keyring_ref);
 error3:
-	kfree(payload);
+	if (!vm)
+		kfree(payload);
+	else
+		vfree(payload);
 error2:
 	kfree(description);
 error:
@@ -821,9 +833,10 @@ long keyctl_instantiate_key(key_serial_t id,
 	key_ref_t keyring_ref;
 	void *payload;
 	long ret;
+	bool vm = false;

 	ret = -EINVAL;
-	if (plen > 32767)
+	if (plen > 1024 * 1024 - 1)
 		goto error;

 	/* the appropriate instantiation authorisation key must have been
@@ -843,8 +856,14 @@ long keyctl_instantiate_key(key_serial_t id,
 	if (_payload) {
 		ret = -ENOMEM;
 		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload)
-			goto error;
+		if (!payload) {
+			if (plen <= PAGE_SIZE)
+				goto error;
+			vm = true;
+			payload = vmalloc(plen);
+			if (!payload)
+				goto error;
+		}

 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -877,7 +896,10 @@ long keyctl_instantiate_key(key_serial_t id,
 	}

 error2:
-	kfree(payload);
+	if (!vm)
+		kfree(payload);
+	else
+		vfree(payload);
 error:
 	return ret;
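The allocate-with-fallback pattern in this patch can be tried out in userspace.  What follows is a sketch, not the kernel code: small_alloc() and large_alloc() are hypothetical stand-ins for kmalloc() and vmalloc() (both are backed by malloc(), and small_alloc() artificially refuses large requests so that the fallback path is exercised), and the SKETCH_* limits are assumptions chosen to mirror the numbers in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE	4096u			/* assumed page size */
#define SKETCH_MAX_PAYLOAD	(1024u * 1024u - 1u)	/* limit from the patch */
#define SMALL_ALLOC_LIMIT	8192u			/* invented failure point */

/* Stand-in for kmalloc(): pretend large contiguous requests fail. */
static void *small_alloc(size_t len)
{
	return len > SMALL_ALLOC_LIMIT ? NULL : malloc(len);
}

/* Stand-in for vmalloc(). */
static void *large_alloc(size_t len)
{
	return malloc(len);
}

static void small_free(void *p) { free(p); }
static void large_free(void *p) { free(p); }

/* A buffer that remembers which allocator produced it. */
struct payload_buf {
	void *data;
	bool from_fallback;	/* plays the role of the patch's "vm" flag */
};

static int payload_alloc(struct payload_buf *buf, size_t plen)
{
	assert(buf != NULL);
	buf->data = NULL;
	buf->from_fallback = false;

	if (plen > SKETCH_MAX_PAYLOAD)
		return -EINVAL;

	buf->data = small_alloc(plen);
	if (buf->data)
		return 0;

	/* a failed small request is a genuine out-of-memory condition */
	if (plen <= SKETCH_PAGE_SIZE)
		return -ENOMEM;

	buf->data = large_alloc(plen);
	if (!buf->data)
		return -ENOMEM;
	buf->from_fallback = true;
	return 0;
}

/* Free with the routine that matches the allocator actually used. */
static void payload_free(struct payload_buf *buf)
{
	if (!buf->from_fallback)
		small_free(buf->data);
	else
		large_free(buf->data);
	buf->data = NULL;
	buf->from_fallback = false;
}
```

The flag matters in the kernel because kfree() and vfree() are not interchangeable; in this sketch both paths end in free(), so it only illustrates the bookkeeping.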
Re: [PATCH 14/27] FS-Cache: Recruit a couple of page flags for cache management
David Howells [EMAIL PROTECTED] wrote:

>  (2) PG_fscache_write (PG_owner_priv_2)
>
>      The marked page is being written to the local cache.  The page may not
>      be modified whilst this is in progress.

Oops.  wait_on_page_owner_priv_2() should use PG_owner_priv_2 rather than
PG_private_2.

I'll release a new patchset shortly.

David
[PATCH 04/27] KEYS: Add keyctl function to get a security label [try #2]
Add a keyctl() function to get the security label of a key. The following is added to Documentation/keys.txt: (*) Get the LSM security context attached to a key. long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer, size_t buflen) This function returns a string that represents the LSM security context attached to a key in the buffer provided. Unless there's an error, it always returns the amount of data it could produce, even if that's too big for the buffer, but it won't copy more than requested to userspace. If the buffer pointer is NULL then no copy will take place. A NUL character is included at the end of the string if the buffer is sufficiently big. This is included in the returned count. If no LSM is in force then an empty string will be returned. A process must have view permission on the key for this function to be successful. Signed-off-by: David Howells [EMAIL PROTECTED] Acked-by: Stephen Smalley [EMAIL PROTECTED] --- Documentation/keys.txt | 21 +++ include/linux/keyctl.h |1 + include/linux/security.h | 20 +- security/dummy.c |8 ++ security/keys/compat.c |3 ++ security/keys/keyctl.c | 66 ++ security/security.c |5 +++ security/selinux/hooks.c | 21 +-- 8 files changed, 141 insertions(+), 4 deletions(-) diff --git a/Documentation/keys.txt b/Documentation/keys.txt index b82d38d..be424b0 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt @@ -711,6 +711,27 @@ The keyctl syscall functions are: The assumed authoritative key is inherited across fork and exec. + (*) Get the LSM security context attached to a key. + + long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer, + size_t buflen) + + This function returns a string that represents the LSM security context + attached to a key in the buffer provided. + + Unless there's an error, it always returns the amount of data it could + produce, even if that's too big for the buffer, but it won't copy more + than requested to userspace. 
If the buffer pointer is NULL then no copy + will take place. + + A NUL character is included at the end of the string if the buffer is + sufficiently big. This is included in the returned count. If no LSM is + in force then an empty string will be returned. + + A process must have view permission on the key for this function to be + successful. + + === KERNEL SERVICES === diff --git a/include/linux/keyctl.h b/include/linux/keyctl.h index 3365945..656ee6b 100644 --- a/include/linux/keyctl.h +++ b/include/linux/keyctl.h @@ -49,5 +49,6 @@ #define KEYCTL_SET_REQKEY_KEYRING 14 /* set default request-key keyring */ #define KEYCTL_SET_TIMEOUT 15 /* set key timeout */ #define KEYCTL_ASSUME_AUTHORITY16 /* assume request_key() authorisation */ +#define KEYCTL_GET_SECURITY17 /* get key security label */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/linux/security.h b/include/linux/security.h index ac05083..8d9e946 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -959,6 +959,17 @@ struct request_sock; * @perm describes the combination of permissions required of this key. * Return 1 if permission granted, 0 if permission denied and -ve it the * normal permissions model should be effected. + * @key_getsecurity: + * Get a textual representation of the security context attached to a key + * for the purposes of honouring KEYCTL_GETSECURITY. This function + * allocates the storage for the NUL-terminated string and the caller + * should free it. + * @key points to the key to be queried. + * @_buffer points to a pointer that should be set to point to the + * resulting string (if no label or an error occurs). + * Return the length of the string (including terminating NUL) or -ve if + * an error. + * May also return 0 (and a NULL buffer pointer) if there is no label. * * Security hooks affecting all System V IPC operations. 
* @@ -1437,7 +1448,7 @@ struct security_operations { int (*key_permission)(key_ref_t key_ref, struct task_struct *context, key_perm_t perm); - + int (*key_getsecurity)(struct key *key, char **_buffer); #endif /* CONFIG_KEYS */ }; @@ -2567,6 +2578,7 @@ int security_key_alloc(struct key *key, struct task_struct *tsk, unsigned long f void security_key_free(struct key *key); int security_key_permission(key_ref_t key_ref, struct task_struct *context, key_perm_t perm); +int security_key_getsecurity(struct key *key, char **_buffer); #else @@ -2588,6 +2600,12 @@ static inline int
[PATCH 05/27] Security: Change current-fs[ug]id to current_fs[ug]id() [try #2]
Change current-fs[ug]id to current_fs[ug]id() so that fsgid and fsuid can be separated from the task_struct. Signed-off-by: David Howells [EMAIL PROTECTED] --- arch/ia64/kernel/perfmon.c|4 ++-- arch/powerpc/platforms/cell/spufs/inode.c |4 ++-- drivers/isdn/capi/capifs.c|4 ++-- drivers/usb/core/inode.c |4 ++-- fs/9p/fid.c |2 +- fs/9p/vfs_inode.c |4 ++-- fs/9p/vfs_super.c |4 ++-- fs/affs/inode.c |4 ++-- fs/anon_inodes.c |4 ++-- fs/attr.c |4 ++-- fs/bfs/dir.c |4 ++-- fs/cifs/cifsproto.h |2 +- fs/cifs/dir.c | 12 ++-- fs/cifs/inode.c |8 fs/cifs/misc.c|4 ++-- fs/coda/cache.c |6 +++--- fs/coda/upcall.c |4 ++-- fs/devpts/inode.c |4 ++-- fs/dquot.c|2 +- fs/exec.c |4 ++-- fs/ext2/balloc.c |2 +- fs/ext2/ialloc.c |4 ++-- fs/ext2/ioctl.c |2 +- fs/ext3/balloc.c |2 +- fs/ext3/ialloc.c |4 ++-- fs/ext4/balloc.c |2 +- fs/ext4/ialloc.c |4 ++-- fs/fuse/dev.c |4 ++-- fs/gfs2/inode.c | 10 +- fs/hfs/inode.c|4 ++-- fs/hfsplus/inode.c|4 ++-- fs/hpfs/namei.c | 24 fs/hugetlbfs/inode.c | 16 fs/jffs2/fs.c |4 ++-- fs/jfs/jfs_inode.c|4 ++-- fs/locks.c|2 +- fs/minix/bitmap.c |4 ++-- fs/namei.c|8 fs/nfsd/vfs.c |4 ++-- fs/ocfs2/dlm/dlmfs.c |8 fs/ocfs2/namei.c |4 ++-- fs/pipe.c |4 ++-- fs/posix_acl.c|4 ++-- fs/ramfs/inode.c |4 ++-- fs/reiserfs/namei.c |4 ++-- fs/sysv/ialloc.c |4 ++-- fs/udf/ialloc.c |4 ++-- fs/udf/namei.c|2 +- fs/ufs/ialloc.c |4 ++-- fs/xfs/linux-2.6/xfs_linux.h |4 ++-- fs/xfs/xfs_acl.c |6 +++--- fs/xfs/xfs_attr.c |2 +- fs/xfs/xfs_inode.c|6 +++--- fs/xfs/xfs_vnodeops.c |8 include/linux/fs.h|2 +- include/linux/sched.h |3 +++ ipc/mqueue.c |4 ++-- kernel/cgroup.c |4 ++-- mm/shmem.c|8 net/9p/client.c |2 +- net/socket.c |4 ++-- net/sunrpc/auth.c |8 security/commoncap.c |4 ++-- security/keys/key.c |2 +- security/keys/keyctl.c|2 +- security/keys/request_key.c | 10 +- security/keys/request_key_auth.c |2 +- 67 files changed, 161 insertions(+), 158 deletions(-) diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c index 73e7c2e..ef383d9 100644 --- a/arch/ia64/kernel/perfmon.c 
+++ b/arch/ia64/kernel/perfmon.c @@ -2206,8 +2206,8 @@ pfm_alloc_fd(struct file **cfile) DPRINT((new inode ino=%ld @%p\n, inode-i_ino, inode)); inode-i_mode = S_IFCHR|S_IRUGO; - inode-i_uid = current-fsuid; - inode-i_gid = current-fsgid; + inode-i_uid = current_fsuid(); + inode-i_gid = current_fsgid(); sprintf(name, [%lu], inode-i_ino); this.name = name; diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c index c0e968a..4efe7bf 100644 --- a/arch/powerpc/platforms/cell/spufs/inode.c +++ b/arch/powerpc/platforms/cell/spufs/inode.c @@ -85,8 +85,8 @@ spufs_new_inode(struct super_block *sb, int mode) goto out; inode-i_mode = mode; - inode-i_uid = current-fsuid; - inode-i_gid = current-fsgid; + inode-i_uid = current_fsuid(); + inode-i_gid
[PATCH 08/27] Add a secctx_to_secid() LSM hook to go along with the existing [try #2]
secid_to_secctx() LSM hook. This patch also includes the SELinux implementation for this hook. Signed-off-by: Paul Moore [EMAIL PROTECTED] Acked-by: Stephen Smalley [EMAIL PROTECTED] --- include/linux/security.h | 13 + security/dummy.c |6 ++ security/security.c |6 ++ security/selinux/hooks.c |6 ++ 4 files changed, 31 insertions(+), 0 deletions(-) diff --git a/include/linux/security.h b/include/linux/security.h index b7ba073..e8f2f2d 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1200,6 +1200,10 @@ struct request_sock; * Convert secid to security context. * @secid contains the security ID. * @secdata contains the pointer that stores the converted security context. + * @secctx_to_secid: + * Convert security context to secid. + * @secid contains the pointer to the generated security ID. + * @secdata contains the security context. * * @release_secctx: * Release the security context. @@ -1389,6 +1393,7 @@ struct security_operations { int (*getprocattr)(struct task_struct *p, char *name, char **value); int (*setprocattr)(struct task_struct *p, char *name, void *value, size_t size); int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen); + int (*secctx_to_secid)(char *secdata, u32 seclen, u32 *secid); void (*release_secctx)(char *secdata, u32 seclen); #ifdef CONFIG_SECURITY_NETWORK @@ -1623,6 +1628,7 @@ int security_setprocattr(struct task_struct *p, char *name, void *value, size_t int security_netlink_send(struct sock *sk, struct sk_buff *skb); int security_netlink_recv(struct sk_buff *skb, int cap); int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen); +int security_secctx_to_secid(char *secdata, u32 seclen, u32 *secid); void security_release_secctx(char *secdata, u32 seclen); #else /* CONFIG_SECURITY */ @@ -2305,6 +2311,13 @@ static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *secle return -EOPNOTSUPP; } +static inline int security_secctx_to_secid(char *secdata, + u32 seclen, + u32 *secid) +{ + 
return -EOPNOTSUPP; +} + static inline void security_release_secctx(char *secdata, u32 seclen) { } diff --git a/security/dummy.c b/security/dummy.c index 6f97089..72f1666 100644 --- a/security/dummy.c +++ b/security/dummy.c @@ -943,6 +943,11 @@ static int dummy_secid_to_secctx(u32 secid, char **secdata, u32 *seclen) return -EOPNOTSUPP; } +static int dummy_secctx_to_secid(char *secdata, u32 seclen, u32 *secid) +{ + return -EOPNOTSUPP; +} + static void dummy_release_secctx(char *secdata, u32 seclen) { } @@ -1109,6 +1114,7 @@ void security_fixup_ops (struct security_operations *ops) set_to_dummy_if_null(ops, getprocattr); set_to_dummy_if_null(ops, setprocattr); set_to_dummy_if_null(ops, secid_to_secctx); + set_to_dummy_if_null(ops, secctx_to_secid); set_to_dummy_if_null(ops, release_secctx); #ifdef CONFIG_SECURITY_NETWORK set_to_dummy_if_null(ops, unix_stream_connect); diff --git a/security/security.c b/security/security.c index 92d66d6..1ef4908 100644 --- a/security/security.c +++ b/security/security.c @@ -821,6 +821,12 @@ int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen) } EXPORT_SYMBOL(security_secid_to_secctx); +int security_secctx_to_secid(char *secdata, u32 seclen, u32 *secid) +{ + return security_ops-secctx_to_secid(secdata, seclen, secid); +} +EXPORT_SYMBOL(security_secctx_to_secid); + void security_release_secctx(char *secdata, u32 seclen) { return security_ops-release_secctx(secdata, seclen); diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 20a6b55..1d3eab7 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4734,6 +4734,11 @@ static int selinux_secid_to_secctx(u32 secid, char **secdata, u32 *seclen) return security_sid_to_context(secid, secdata, seclen); } +static int selinux_secctx_to_secid(char *secdata, u32 seclen, u32 *secid) +{ + return security_context_to_sid(secdata, seclen, secid); +} + static void selinux_release_secctx(char *secdata, u32 seclen) { kfree(secdata); @@ -4937,6 +4942,7 @@ 
static struct security_operations selinux_ops = { .setprocattr = selinux_setprocattr, .secid_to_secctx = selinux_secid_to_secctx, + .secctx_to_secid = selinux_secctx_to_secid, .release_secctx = selinux_release_secctx, .unix_stream_connect = selinux_socket_unix_stream_connect, - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
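The way this patch wires a new hook into the ops table (a dummy that returns -EOPNOTSUPP, filled in when a module leaves the slot NULL) can be sketched in plain C.  The names below are illustrative only: struct sec_ops, fixup_ops() and the toy_* functions with their two made-up labels are assumptions of the sketch and are not the kernel's security_operations, security_fixup_ops() or SELinux.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced ops table holding just the two converse hooks. */
struct sec_ops {
	int (*secid_to_secctx)(uint32_t secid, const char **secdata,
			       uint32_t *seclen);
	int (*secctx_to_secid)(const char *secdata, uint32_t seclen,
			       uint32_t *secid);
};

/* Dummies used when a module does not implement a hook. */
static int dummy_secid_to_secctx(uint32_t secid, const char **secdata,
				 uint32_t *seclen)
{
	(void)secid; (void)secdata; (void)seclen;
	return -EOPNOTSUPP;
}

static int dummy_secctx_to_secid(const char *secdata, uint32_t seclen,
				 uint32_t *secid)
{
	(void)secdata; (void)seclen; (void)secid;
	return -EOPNOTSUPP;
}

/* Fill every empty slot so callers never need a NULL check. */
static void fixup_ops(struct sec_ops *ops)
{
	assert(ops != NULL);
	if (!ops->secid_to_secctx)
		ops->secid_to_secctx = dummy_secid_to_secctx;
	if (!ops->secctx_to_secid)
		ops->secctx_to_secid = dummy_secctx_to_secid;
}

/* Toy module: secid N maps to toy_labels[N - 1]. */
static const char *const toy_labels[] = {
	"system_u:object_r:toy_a",
	"system_u:object_r:toy_b",
};
#define TOY_NLABELS (sizeof(toy_labels) / sizeof(toy_labels[0]))

static int toy_secid_to_secctx(uint32_t secid, const char **secdata,
			       uint32_t *seclen)
{
	if (secid < 1 || secid > TOY_NLABELS)
		return -EINVAL;
	*secdata = toy_labels[secid - 1];
	*seclen = (uint32_t)strlen(toy_labels[secid - 1]);
	return 0;
}

static int toy_secctx_to_secid(const char *secdata, uint32_t seclen,
			       uint32_t *secid)
{
	for (uint32_t i = 0; i < TOY_NLABELS; i++) {
		if (strlen(toy_labels[i]) == seclen &&
		    memcmp(toy_labels[i], secdata, seclen) == 0) {
			*secid = i + 1;
			return 0;
		}
	}
	return -EINVAL;
}
```

A caller that round-trips a secid through both hooks should get the same secid back, and a table with neither hook set reports -EOPNOTSUPP from both after fixup.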
[PATCH 09/27] Security: Pre-add additional non-caching classes [try #2]
Pre-add additional non-caching classes that are in the SELinux upstream repository, but not in the upstream kernel so they don't get in the fscache class patch. Signed-off-by: David Howells [EMAIL PROTECTED] --- security/selinux/include/av_perm_to_string.h |5 + security/selinux/include/av_permissions.h|5 + security/selinux/include/class_to_string.h |7 +++ security/selinux/include/flask.h |1 + 4 files changed, 18 insertions(+), 0 deletions(-) diff --git a/security/selinux/include/av_perm_to_string.h b/security/selinux/include/av_perm_to_string.h index 049bf69..caa0634 100644 --- a/security/selinux/include/av_perm_to_string.h +++ b/security/selinux/include/av_perm_to_string.h @@ -37,6 +37,8 @@ S_(SECCLASS_NODE, NODE__ENFORCE_DEST, enforce_dest) S_(SECCLASS_NODE, NODE__DCCP_RECV, dccp_recv) S_(SECCLASS_NODE, NODE__DCCP_SEND, dccp_send) + S_(SECCLASS_NODE, NODE__RECVFROM, recvfrom) + S_(SECCLASS_NODE, NODE__SENDTO, sendto) S_(SECCLASS_NETIF, NETIF__TCP_RECV, tcp_recv) S_(SECCLASS_NETIF, NETIF__TCP_SEND, tcp_send) S_(SECCLASS_NETIF, NETIF__UDP_RECV, udp_recv) @@ -45,6 +47,8 @@ S_(SECCLASS_NETIF, NETIF__RAWIP_SEND, rawip_send) S_(SECCLASS_NETIF, NETIF__DCCP_RECV, dccp_recv) S_(SECCLASS_NETIF, NETIF__DCCP_SEND, dccp_send) + S_(SECCLASS_NETIF, NETIF__INGRESS, ingress) + S_(SECCLASS_NETIF, NETIF__EGRESS, egress) S_(SECCLASS_UNIX_STREAM_SOCKET, UNIX_STREAM_SOCKET__CONNECTTO, connectto) S_(SECCLASS_UNIX_STREAM_SOCKET, UNIX_STREAM_SOCKET__NEWCONN, newconn) S_(SECCLASS_UNIX_STREAM_SOCKET, UNIX_STREAM_SOCKET__ACCEPTFROM, acceptfrom) @@ -159,3 +163,4 @@ S_(SECCLASS_DCCP_SOCKET, DCCP_SOCKET__NODE_BIND, node_bind) S_(SECCLASS_DCCP_SOCKET, DCCP_SOCKET__NAME_CONNECT, name_connect) S_(SECCLASS_MEMPROTECT, MEMPROTECT__MMAP_ZERO, mmap_zero) + S_(SECCLASS_PEER, PEER__RECV, recv) diff --git a/security/selinux/include/av_permissions.h b/security/selinux/include/av_permissions.h index eda89a2..c2b5bb2 100644 --- a/security/selinux/include/av_permissions.h +++ 
b/security/selinux/include/av_permissions.h @@ -292,6 +292,8 @@ #define NODE__ENFORCE_DEST0x0040UL #define NODE__DCCP_RECV 0x0080UL #define NODE__DCCP_SEND 0x0100UL +#define NODE__RECVFROM0x0200UL +#define NODE__SENDTO 0x0400UL #define NETIF__TCP_RECV 0x0001UL #define NETIF__TCP_SEND 0x0002UL #define NETIF__UDP_RECV 0x0004UL @@ -300,6 +302,8 @@ #define NETIF__RAWIP_SEND 0x0020UL #define NETIF__DCCP_RECV 0x0040UL #define NETIF__DCCP_SEND 0x0080UL +#define NETIF__INGRESS0x0100UL +#define NETIF__EGRESS 0x0200UL #define NETLINK_SOCKET__IOCTL 0x0001UL #define NETLINK_SOCKET__READ 0x0002UL #define NETLINK_SOCKET__WRITE 0x0004UL @@ -824,3 +828,4 @@ #define DCCP_SOCKET__NODE_BIND0x0040UL #define DCCP_SOCKET__NAME_CONNECT 0x0080UL #define MEMPROTECT__MMAP_ZERO 0x0001UL +#define PEER__RECV0x0001UL diff --git a/security/selinux/include/class_to_string.h b/security/selinux/include/class_to_string.h index e77de0e..b1b0d1d 100644 --- a/security/selinux/include/class_to_string.h +++ b/security/selinux/include/class_to_string.h @@ -64,3 +64,10 @@ S_(NULL) S_(dccp_socket) S_(memprotect) +S_(NULL) +S_(NULL) +S_(NULL) +S_(NULL) +S_(NULL) +S_(NULL) +S_(peer) diff --git a/security/selinux/include/flask.h b/security/selinux/include/flask.h index a9c2b20..09e9dd2 100644 --- a/security/selinux/include/flask.h +++ b/security/selinux/include/flask.h @@ -50,6 +50,7 @@ #define SECCLASS_KEY 58 #define SECCLASS_DCCP_SOCKET 60 #define SECCLASS_MEMPROTECT 61 +#define SECCLASS_PEER68 /* * Security identifier indices for initial entities - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/27] Security: Allow kernel services to override LSM settings for task actions [try #2]
Allow kernel services to override LSM settings appropriate to the actions performed by a task by duplicating a security record, modifying it and then using task_struct::act_as to point to it when performing operations on behalf of a task. This is used, for example, by CacheFiles which has to transparently access the cache on behalf of a process that thinks it is doing, say, NFS accesses with a potentially inappropriate (with respect to accessing the cache) set of security data. This patch provides two LSM hooks for modifying a task security record: (*) security_kernel_act_as() which allows modification of the security datum with which a task acts on other objects (most notably files). (*) security_create_files_as() which allows modification of the security datum that is used to initialise the security data on a file that a task creates. Signed-off-by: David Howells [EMAIL PROTECTED] --- include/linux/cred.h| 23 +++ include/linux/security.h| 43 +- kernel/cred.c | 111 +++ security/dummy.c| 17 + security/security.c | 15 - security/selinux/hooks.c| 51 security/selinux/include/security.h |2 - security/selinux/ss/services.c |5 +- 8 files changed, 258 insertions(+), 9 deletions(-) create mode 100644 include/linux/cred.h diff --git a/include/linux/cred.h b/include/linux/cred.h new file mode 100644 index 000..497af5b --- /dev/null +++ b/include/linux/cred.h @@ -0,0 +1,23 @@ +/* Credential management + * + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. + * Written by David Howells ([EMAIL PROTECTED]) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. 
+ */ + +#ifndef _LINUX_CRED_H +#define _LINUX_CRED_H + +struct task_security; +struct inode; + +extern struct task_security *get_kernel_security(struct task_struct *); +extern int set_security_override(struct task_security *, u32); +extern int set_security_override_from_ctx(struct task_security *, const char *); +extern int change_create_files_as(struct task_security *, struct inode *); + +#endif /* _LINUX_CRED_H */ diff --git a/include/linux/security.h b/include/linux/security.h index e8f2f2d..e6be746 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -557,6 +557,19 @@ struct request_sock; * Duplicate and attach the security structure currently attached to the * p-security field. * Return 0 if operation was successful. + * @task_kernel_act_as: + * Set the credentials for a kernel service to act as (subjective context). + * @p points to the task that nominated @secid. + * @sec points to the task security record to be modified. + * @secid specifies the security ID to be set + * Return 0 if successful. + * @task_create_files_as: + * Set the file creation context in a task security record to be the same + * as the objective context of the specified inode. + * @p points to the task that nominated @inode. + * @sec points to the task security record to be modified. + * @inode points to the inode to use as a reference. + * Return 0 if successful. * @task_setuid: * Check permission before setting one or more of the user identity * attributes of the current process. 
The @flags parameter indicates @@ -1325,6 +1338,11 @@ struct security_operations { int (*task_alloc_security) (struct task_struct *p); void (*task_free_security) (struct task_security *p); int (*task_dup_security) (struct task_security *p); + int (*task_kernel_act_as)(struct task_struct *p, + struct task_security *sec, u32 secid); + int (*task_create_files_as)(struct task_struct *p, + struct task_security *sec, + struct inode *inode); int (*task_setuid) (uid_t id0, uid_t id1, uid_t id2, int flags); int (*task_post_setuid) (uid_t old_ruid /* or fsuid */ , uid_t old_euid, uid_t old_suid, int flags); @@ -1393,7 +1411,7 @@ struct security_operations { int (*getprocattr)(struct task_struct *p, char *name, char **value); int (*setprocattr)(struct task_struct *p, char *name, void *value, size_t size); int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen); - int (*secctx_to_secid)(char *secdata, u32 seclen, u32 *secid); + int (*secctx_to_secid)(const char *secdata, u32 seclen, u32 *secid); void (*release_secctx)(char *secdata, u32 seclen); #ifdef CONFIG_SECURITY_NETWORK @@ -1576,6 +1594,11 @@ int security_task_create(unsigned long clone_flags
[PATCH 10/27] Security: Add a kernel_service object class to SELinux [try #2]
Add a 'kernel_service' object class to SELinux and give this object class two
access vectors: 'use_as_override' and 'create_files_as'.

The first vector is used to grant a process the right to nominate an alternate
process security ID for the kernel to use as an override for the SELinux
subjective security when accessing stuff on behalf of another process.

For example, CacheFiles, when accessing the cache on behalf of a process
accessing an NFS file, needs to use a subjective security ID appropriate to the
cache rather than the one the calling process is using.  The cachefilesd daemon
will nominate the security ID to be used.

The second vector is used to grant a process the right to nominate a file
creation label for a kernel service to use.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/selinux/include/av_perm_to_string.h |    2 ++
 security/selinux/include/av_permissions.h    |    2 ++
 security/selinux/include/class_to_string.h   |    1 +
 security/selinux/include/flask.h             |    1 +
 4 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/security/selinux/include/av_perm_to_string.h b/security/selinux/include/av_perm_to_string.h
index caa0634..6ba8200 100644
--- a/security/selinux/include/av_perm_to_string.h
+++ b/security/selinux/include/av_perm_to_string.h
@@ -164,3 +164,5 @@
    S_(SECCLASS_DCCP_SOCKET, DCCP_SOCKET__NAME_CONNECT, "name_connect")
    S_(SECCLASS_MEMPROTECT, MEMPROTECT__MMAP_ZERO, "mmap_zero")
    S_(SECCLASS_PEER, PEER__RECV, "recv")
+   S_(SECCLASS_KERNEL_SERVICE, KERNEL_SERVICE__USE_AS_OVERRIDE, "use_as_override")
+   S_(SECCLASS_KERNEL_SERVICE, KERNEL_SERVICE__CREATE_FILES_AS, "create_files_as")
diff --git a/security/selinux/include/av_permissions.h b/security/selinux/include/av_permissions.h
index c2b5bb2..9500ba3 100644
--- a/security/selinux/include/av_permissions.h
+++ b/security/selinux/include/av_permissions.h
@@ -829,3 +829,5 @@
 #define DCCP_SOCKET__NAME_CONNECT         0x0080UL
 #define MEMPROTECT__MMAP_ZERO             0x0001UL
 #define PEER__RECV                        0x0001UL
+#define KERNEL_SERVICE__USE_AS_OVERRIDE   0x0001UL
+#define KERNEL_SERVICE__CREATE_FILES_AS   0x0002UL
diff --git a/security/selinux/include/class_to_string.h b/security/selinux/include/class_to_string.h
index b1b0d1d..efe9efa 100644
--- a/security/selinux/include/class_to_string.h
+++ b/security/selinux/include/class_to_string.h
@@ -71,3 +71,4 @@
 S_(NULL)
 S_(NULL)
 S_("peer")
+S_("kernel_service")
diff --git a/security/selinux/include/flask.h b/security/selinux/include/flask.h
index 09e9dd2..2bc251a 100644
--- a/security/selinux/include/flask.h
+++ b/security/selinux/include/flask.h
@@ -51,6 +51,7 @@
 #define SECCLASS_DCCP_SOCKET      60
 #define SECCLASS_MEMPROTECT       61
 #define SECCLASS_PEER             68
+#define SECCLASS_KERNEL_SERVICE   69
 
 /*
  * Security identifier indices for initial entities
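To make the two new access vectors a little more concrete, here is a minimal userspace sketch of testing a granted access-vector mask against them. The bit values are the ones this patch adds to av_permissions.h; kernel_service_allowed() and its `granted` argument are hypothetical names made up for illustration and are not the SELinux AVC interface.

```c
#include <stdbool.h>

/* Permission bits as added by this patch to av_permissions.h. */
#define KERNEL_SERVICE__USE_AS_OVERRIDE  0x0001UL
#define KERNEL_SERVICE__CREATE_FILES_AS  0x0002UL

/*
 * Hypothetical helper (not a kernel function): return true only if every
 * requested permission bit is present in the granted access vector.
 */
static bool kernel_service_allowed(unsigned long granted,
				   unsigned long requested)
{
	return (granted & requested) == requested;
}
```

A daemon granted only use_as_override would pass a check for that bit but fail a check for create_files_as, since the two permissions are independent bits.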
[PATCH 12/27] Security: Make NFSD work with detached security [try #2]
Make NFSD work with detached security, using the patches that excise the security information from task_struct to struct task_security as a base. Each time NFSD wants a new security descriptor (to do NFS4 recovery or just to do NFS operations), a task_security record is derived from NFSD's *objective* security, modified and then applied as the *subjective* security. This means (a) the changes are not visible to anyone looking at NFSD through /proc, (b) there is no leakage between two consecutive ops with different security configurations. Consideration should probably be given to caching the task_security record on the basis that there'll probably be several ops that will want to use any particular security configuration. Furthermore, nfs4recover.c perhaps ought to set an appropriate LSM context on the record pointed to by rec_security so that the disk is accessed appropriately (see set_security_override[_from_ctx]()). NOTE! This patch must be rolled in to one of the earlier security patches to make it compile fully. 
Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/nfsd/auth.c| 31 +--- fs/nfsd/nfs4recover.c | 64 +++-- 2 files changed, 62 insertions(+), 33 deletions(-) diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c index b2e19c8..32d8e34 100644 --- a/fs/nfsd/auth.c +++ b/fs/nfsd/auth.c @@ -6,6 +6,7 @@ #include linux/types.h #include linux/sched.h +#include linux/cred.h #include linux/sunrpc/svc.h #include linux/sunrpc/svcauth.h #include linux/nfsd/nfsd.h @@ -28,11 +29,17 @@ int nfsexp_flags(struct svc_rqst *rqstp, struct svc_export *exp) int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp) { + struct task_security *sec, *old; struct svc_cred cred = rqstp-rq_cred; int i; int flags = nfsexp_flags(rqstp, exp); int ret; + /* derive the new security record from nfsd's objective security */ + sec = get_kernel_security(current); + if (!sec) + return -ENOMEM; + if (flags NFSEXP_ALLSQUASH) { cred.cr_uid = exp-ex_anon_uid; cred.cr_gid = exp-ex_anon_gid; @@ -56,24 +63,30 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp) get_group_info(cred.cr_group_info); if (cred.cr_uid != (uid_t) -1) - current-act_as-fsuid = cred.cr_uid; + sec-fsuid = cred.cr_uid; else - current-act_as-fsuid = exp-ex_anon_uid; + sec-fsuid = exp-ex_anon_uid; if (cred.cr_gid != (gid_t) -1) - current-act_as-fsgid = cred.cr_gid; + sec-fsgid = cred.cr_gid; else - current-act_as-fsgid = exp-ex_anon_gid; + sec-fsgid = exp-ex_anon_gid; - if (!cred.cr_group_info) + if (!cred.cr_group_info) { + put_task_security(sec); return -ENOMEM; - ret = set_groups(current-act_as, cred.cr_group_info); + } + ret = set_groups(sec, cred.cr_group_info); put_group_info(cred.cr_group_info); if ((cred.cr_uid)) { - cap_t(current-act_as-cap_effective) = ~CAP_NFSD_MASK; + cap_t(sec-cap_effective) = ~CAP_NFSD_MASK; } else { - cap_t(current-act_as-cap_effective) |= - (CAP_NFSD_MASK current-act_as-cap_permitted); + cap_t(sec-cap_effective) |= CAP_NFSD_MASK sec-cap_permitted; } + + /* set the new security as nfsd's 
subjective security */ + old = current-act_as; + current-act_as = sec; + put_task_security(old); return ret; } diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index bf0217a..ae91262 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -46,27 +46,37 @@ #include linux/scatterlist.h #include linux/crypto.h #include linux/sched.h +#include linux/cred.h #define NFSDDBG_FACILITYNFSDDBG_PROC /* Globals */ static struct nameidata rec_dir; static int rec_dir_init = 0; +static struct task_security *rec_security; +/* + * switch the special recovery access security in on the current task's + * subjective security + */ static void -nfs4_save_user(uid_t *saveuid, gid_t *savegid) +nfs4_begin_secure(struct task_security **saved_sec) { - *saveuid = current-act_as-fsuid; - *savegid = current-act_as-fsgid; - current-act_as-fsuid = 0; - current-act_as-fsgid = 0; + *saved_sec = current-act_as; + current-act_as = get_task_security(rec_security); } +/* + * return the current task's subjective security to its former glory + */ static void -nfs4_reset_user(uid_t saveuid, gid_t savegid) +nfs4_end_secure(struct task_security *saved_sec) { - current-act_as-fsuid = saveuid; - current-act_as-fsgid = savegid; + struct task_security *discard; + + discard = current-act_as
[PATCH 13/27] FS-Cache: Release page-private after failed readahead [try #2]
The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails or the filler function fails.  This
permits pages with caching references associated with them to be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 mm/readahead.c |   39 +++++++++++++++++++++++++++++++++++++--
 1 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index c9c50ca..75aa6b6 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -44,6 +44,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);
 
 #define list_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ *   such as the NFS fs marking pages that are cached locally on disk, thus we
+ *   need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+					     struct page *page)
+{
+	if (PagePrivate(page)) {
+		if (TestSetPageLocked(page))
+			BUG();
+		page->mapping = mapping;
+		do_invalidatepage(page, 0);
+		page->mapping = NULL;
+		unlock_page(page);
+	}
+	page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+					      struct list_head *pages)
+{
+	struct page *victim;
+
+	while (!list_empty(pages)) {
+		victim = list_to_page(pages);
+		list_del(&victim->lru);
+		read_cache_pages_invalidate_page(mapping, victim);
+	}
+}
+
 /**
  * read_cache_pages - populate an address space with some pages & start reads against them
  * @mapping: the address_space
@@ -65,14 +100,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
 		list_del(&page->lru);
 		if (add_to_page_cache_lru(page, mapping,
 					page->index, GFP_KERNEL)) {
-			page_cache_release(page);
+			read_cache_pages_invalidate_page(mapping, page);
 			continue;
 		}
 		page_cache_release(page);
 
 		ret = filler(data, page);
 		if (unlikely(ret)) {
-			put_pages_list(pages);
+			read_cache_pages_invalidate_pages(mapping, pages);
 			break;
 		}
 		task_io_account_read(PAGE_CACHE_SIZE);
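The error path described in this patch can be modelled in userspace roughly as follows. This is only a sketch of the clean-up pattern, not the mm/readahead.c code: struct fake_page, invalidate_and_release(), read_pages_sketch() and failing_filler() are all hypothetical stand-ins, and an array replaces the kernel's page list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what the sketch needs. */
struct fake_page {
	bool has_private;	/* models PG_private being set */
	bool invalidated;	/* set when the "fs" was asked to clean up */
	bool released;		/* models page_cache_release() */
};

/* Models read_cache_pages_invalidate_page(): clean up, then drop the ref. */
static void invalidate_and_release(struct fake_page *page)
{
	if (page->has_private) {
		page->invalidated = true;	/* models do_invalidatepage() */
		page->has_private = false;
	}
	page->released = true;
}

/*
 * Models the filler-failure path: hand each page to the filler in turn; on
 * the first failure, invalidate and release every page that has not yet been
 * handed over, then stop.  Returns the filler's error code, or 0.
 */
static int read_pages_sketch(struct fake_page *pages, size_t n,
			     int (*filler)(struct fake_page *))
{
	for (size_t i = 0; i < n; i++) {
		int ret = filler(&pages[i]);

		if (ret) {
			for (size_t j = i + 1; j < n; j++)
				invalidate_and_release(&pages[j]);
			return ret;
		}
	}
	return 0;
}

/* Example filler that always fails, to exercise the clean-up path. */
static int failing_filler(struct fake_page *page)
{
	(void)page;
	return -5;
}
```

The point of the pattern is that pages still queued at the time of the failure get the same "tell the owner, then release" treatment, instead of being dropped with their private state still attached.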
[PATCH 14/27] FS-Cache: Recruit a couple of page flags for cache management [try #2]
Recruit a couple of page flags to aid in cache management. The following extra flags are defined: (1) PG_fscache (PG_private_2) The marked page is backed by a local cache and is pinning resources in the cache driver. (2) PG_fscache_write (PG_owner_priv_2) The marked page is being written to the local cache. The page may not be modified whilst this is in progress. If PG_fscache is set, then things that checked for PG_private will now also check for that. This includes things like truncation and page invalidation. The function page_has_private() had been added to make the checks for both PG_private and PG_private_2 at the same time. Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/splice.c|2 +- include/linux/page-flags.h | 39 +-- include/linux/pagemap.h| 11 +++ mm/filemap.c | 18 ++ mm/migrate.c |2 +- mm/page_alloc.c|3 +++ mm/readahead.c |9 + mm/swap.c |4 ++-- mm/swap_state.c|4 ++-- mm/truncate.c | 10 +- mm/vmscan.c|2 +- 11 files changed, 86 insertions(+), 18 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 6bdcb61..61edad7 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe, */ wait_on_page_writeback(page); - if (PagePrivate(page)) + if (page_has_private(page)) try_to_release_page(page, GFP_KERNEL); /* diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 209d3a4..f375e3b 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -77,25 +77,32 @@ #define PG_active 6 #define PG_slab 7 /* slab debug (Suparna wants this) */ -#define PG_owner_priv_1 8 /* Owner use. If pagecache, fs may use*/ +#define PG_owner_priv_1 8 /* Owner use. 
fs may use in pagecache */ #define PG_arch_1 9 #define PG_reserved10 #define PG_private 11 /* If pagecache, has fs-private data */ #define PG_writeback 12 /* Page is under writeback */ +#define PG_private_2 13 /* If pagecache, has fs aux data */ #define PG_compound14 /* Part of a compound page */ #define PG_swapcache 15 /* Swap page: swp_entry_t in private */ #define PG_mappedtodisk16 /* Has blocks allocated on-disk */ #define PG_reclaim 17 /* To be reclaimed asap */ +#define PG_owner_priv_218 /* Owner use. fs may use in pagecache */ #define PG_buddy 19 /* Page is free, on buddy lists */ /* PG_readahead is only used for file reads; PG_reclaim is only for writes */ #define PG_readahead PG_reclaim /* Reminder to do async read-ahead */ -/* PG_owner_priv_1 users should have descriptive aliases */ +/* PG_owner_priv_1/2 users should have descriptive aliases */ #define PG_checked PG_owner_priv_1 /* Used by some filesystems */ #define PG_pinned PG_owner_priv_1 /* Xen pinned pagetable */ +#define PG_fscache_write PG_owner_priv_2 /* Writing to local cache */ + +/* PG_private_2 causes releasepage() and co to be invoked */ +#define PG_fscache PG_private_2/* Backed by local cache */ + #if (BITS_PER_LONG 32) /* @@ -199,6 +206,23 @@ static inline void SetPageUptodate(struct page *page) #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback, \ (page)-flags) +#define PagePrivate2(page) test_bit(PG_private_2, (page)-flags) +#define SetPagePrivate2(page) set_bit(PG_private_2, (page)-flags) +#define ClearPagePrivate2(page)clear_bit(PG_private_2, (page)-flags) +#define TestSetPagePrivate2(page) test_and_set_bit(PG_private_2, (page)-flags) +#define TestClearPagePrivate2(page) test_and_clear_bit(PG_private_2, \ + (page)-flags) + +#define PageOwnerPriv2(page) test_bit(PG_owner_priv_2, \ +(page)-flags) +#define SetPageOwnerPriv2(page)set_bit(PG_owner_priv_2, (page)-flags) +#define ClearPageOwnerPriv2(page) clear_bit(PG_owner_priv_2, \ + (page)-flags) +#define 
TestSetPageOwnerPriv2(page)test_and_set_bit(PG_owner_priv_2, \ +(page)-flags) +#define TestClearPageOwnerPriv2(page) test_and_clear_bit(PG_owner_priv_2, \ + (page)-flags) + #define PageBuddy(page
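As a rough illustration of the combined check mentioned in the description, here is a userspace sketch of what page_has_private() is described as doing. The bit numbers are taken from the page-flags.h hunk above; struct fake_page and page_has_private_sketch() are hypothetical, and the body is an assumption based on the commit message rather than a copy of the pagemap.h change.

```c
#include <stdbool.h>

/* Bit numbers as given in the page-flags.h hunk of this patch. */
#define PG_private	11
#define PG_private_2	13
#define PG_fscache	PG_private_2	/* alias introduced by the patch */

/* Hypothetical stand-in for struct page: just the flags word. */
struct fake_page {
	unsigned long flags;
};

/* True if either private bit is set (sketch of page_has_private()). */
static bool page_has_private_sketch(const struct fake_page *page)
{
	const unsigned long mask = (1UL << PG_private) | (1UL << PG_private_2);

	return (page->flags & mask) != 0;
}
```

Testing both bits with one mask means callers such as truncation and invalidation do not need to know whether the page carries fs-private data, a cache reference, or both.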
[PATCH 15/27] FS-Cache: Provide an add_wait_queue_tail() function [try #2]
Provide an add_wait_queue_tail() function to add a waiter to the back of a
wait queue instead of the front.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/pagemap.h |    7 +++++--
 include/linux/wait.h    |    2 ++
 kernel/wait.c           |   18 ++++++++++++++++++
 mm/filemap.c            |    2 +-
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 1ab7f9a..00b108c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -211,8 +211,11 @@ static inline void wait_on_page_writeback(struct page *page)
 
 extern void end_page_writeback(struct page *page);
 
-/*
- * Wait for a PG_owner_priv_2 to become clear
+/**
+ * wait_on_page_owner_priv_2 - Wait for PG_owner_priv_2 to become clear
+ * @page: The page to monitor
+ *
+ * Wait for a PG_owner_priv_2 to become clear on the specified page.
  */
 static inline void wait_on_page_owner_priv_2(struct page *page)
 {
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..f1038d0 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -118,6 +118,8 @@ static inline int waitqueue_active(wait_queue_head_t *q)
 #define is_sync_wait(wait)	(!(wait) || ((wait)->private))
 
 extern void FASTCALL(add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(add_wait_queue_tail(wait_queue_head_t *q,
+					 wait_queue_t *wait));
 extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
 
diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..7acc9cc 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -29,6 +29,24 @@ void fastcall add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
 }
 EXPORT_SYMBOL(add_wait_queue);
 
+/**
+ * add_wait_queue_tail - Add a waiter to the back of a waitqueue
+ * @q: the wait queue to append the waiter to
+ * @wait: the waiter to be queued
+ *
+ * Add a waiter to the back of a waitqueue so that it gets woken up last.
+ */
+void fastcall add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait)
+{
+	unsigned long flags;
+
+	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue_tail(q, wait);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_tail);
+
 void fastcall add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
 {
 	unsigned long flags;
diff --git a/mm/filemap.c b/mm/filemap.c
index 5551410..90ccb10 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -572,7 +572,7 @@ void end_page_writeback(struct page *page)
 EXPORT_SYMBOL(end_page_writeback);
 
 /**
- * end_page_own - Clear PG_owner_priv_2 and wake up any waiters
+ * end_page_owner_priv_2 - Clear PG_owner_priv_2 and wake up any waiters
  * @page: the page
  *
  * Clear PG_owner_priv_2 and wake up any processes waiting for that event.
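The difference between queueing at the front and at the back can be seen in a small userspace model. This is a sketch only: struct waiter, struct wait_queue and the wq_*() helpers are hypothetical names, there is no locking, and "wake-up order" is simply list order from the head.

```c
#include <stddef.h>

struct waiter {
	int id;
	struct waiter *prev, *next;
};

struct wait_queue {
	struct waiter head;	/* sentinel; head.next is woken first */
};

static void wq_init(struct wait_queue *q)
{
	q->head.prev = q->head.next = &q->head;
}

/* Insert right after the sentinel: the new waiter is woken first. */
static void wq_add_head(struct wait_queue *q, struct waiter *w)
{
	w->next = q->head.next;
	w->prev = &q->head;
	q->head.next->prev = w;
	q->head.next = w;
}

/* Insert right before the sentinel: the new waiter is woken last. */
static void wq_add_tail(struct wait_queue *q, struct waiter *w)
{
	w->prev = q->head.prev;
	w->next = &q->head;
	q->head.prev->next = w;
	q->head.prev = w;
}

/* Record waiter ids in wake-up order; returns how many were written. */
static size_t wq_wake_order(const struct wait_queue *q, int *out, size_t max)
{
	size_t n = 0;

	for (const struct waiter *w = q->head.next;
	     w != &q->head && n < max; w = w->next)
		out[n++] = w->id;
	return n;
}
```

Adding waiters 1 and 2 at the head and then waiter 3 at the tail gives the order 2, 1, 3: the tail-queued waiter runs after everyone who was already waiting, which is the property the new function provides.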
[PATCH 18/27] CacheFiles: Be consistent about the use of mapping vs file->f_mapping in Ext3 [try #2]
Change all the usages of file->f_mapping in ext3_*write_end() functions to use
the mapping argument directly.  This has two consequences:

 (*) Consistency.  Without this patch sometimes one is used and sometimes the
     other is.

 (*) A NULL file pointer can be passed.  This feature is then made use of by
     the generic hook in the next patch, which is used by CacheFiles to write
     pages to a file without setting up a file struct.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/ext3/inode.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 9b162cd..bc918d3 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1227,7 +1227,7 @@ static int ext3_generic_write_end(struct file *file,
 				loff_t pos, unsigned len, unsigned copied,
 				struct page *page, void *fsdata)
 {
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = mapping->host;
 
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
 
@@ -1252,7 +1252,7 @@ static int ext3_ordered_write_end(struct file *file,
 				struct page *page, void *fsdata)
 {
 	handle_t *handle = ext3_journal_current_handle();
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = mapping->host;
 	unsigned from, to;
 	int ret = 0, ret2;
 
@@ -1293,7 +1293,7 @@ static int ext3_writeback_write_end(struct file *file,
 				struct page *page, void *fsdata)
 {
 	handle_t *handle = ext3_journal_current_handle();
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = mapping->host;
 	int ret = 0, ret2;
 	loff_t new_i_size;
 
[PATCH 17/27] CacheFiles: Add missing copy_page export for ia64 [try #2]
This one-line patch fixes the missing export of copy_page introduced by the
cachefile patches.  This patch is not yet upstream, but is required for
cachefile on ia64.  It will be pushed upstream when cachefile goes upstream.

Signed-off-by: Prarit Bhargava [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/ia64_ksyms.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index c3b4412..e64fd61 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -43,6 +43,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);
[PATCH 19/27] CacheFiles: Add a hook to write a single page of data to an inode [try #2]
Add an address space operation to write one single page of data to an inode at a page-aligned location (thus permitting the implementation to be highly optimised). The data source is a single page. This is used by CacheFiles to store the contents of netfs pages into their backing file pages. Supply a generic implementation for this that uses the write_begin() and write_end() address_space operations to bind a copy directly into the page cache. Hook the Ext2 and Ext3 operations to the generic implementation. Signed-off-by: David Howells [EMAIL PROTECTED] --- fs/ext2/inode.c|2 ++ fs/ext3/inode.c|3 +++ include/linux/fs.h |7 ++ mm/filemap.c | 61 4 files changed, 73 insertions(+), 0 deletions(-) diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index b1ab32a..cfa56e6 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -796,6 +796,7 @@ const struct address_space_operations ext2_aops = { .direct_IO = ext2_direct_IO, .writepages = ext2_writepages, .migratepage= buffer_migrate_page, + .write_one_page = generic_file_buffered_write_one_page, }; const struct address_space_operations ext2_aops_xip = { @@ -814,6 +815,7 @@ const struct address_space_operations ext2_nobh_aops = { .direct_IO = ext2_direct_IO, .writepages = ext2_writepages, .migratepage= buffer_migrate_page, + .write_one_page = generic_file_buffered_write_one_page, }; /* diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index bc918d3..435c684 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -1780,6 +1780,7 @@ static const struct address_space_operations ext3_ordered_aops = { .releasepage= ext3_releasepage, .direct_IO = ext3_direct_IO, .migratepage= buffer_migrate_page, + .write_one_page = generic_file_buffered_write_one_page, }; static const struct address_space_operations ext3_writeback_aops = { @@ -1794,6 +1795,7 @@ static const struct address_space_operations ext3_writeback_aops = { .releasepage= ext3_releasepage, .direct_IO = ext3_direct_IO, .migratepage= buffer_migrate_page, + .write_one_page = 
generic_file_buffered_write_one_page, }; static const struct address_space_operations ext3_journalled_aops = { @@ -1807,6 +1809,7 @@ static const struct address_space_operations ext3_journalled_aops = { .bmap = ext3_bmap, .invalidatepage = ext3_invalidatepage, .releasepage= ext3_releasepage, + .write_one_page = generic_file_buffered_write_one_page, }; void ext3_set_aops(struct inode *inode) diff --git a/include/linux/fs.h b/include/linux/fs.h index 850d3fc..a3c3369 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -479,6 +479,11 @@ struct address_space_operations { int (*migratepage) (struct address_space *, struct page *, struct page *); int (*launder_page) (struct page *); + /* write the contents of the source page over the page at the specified +* index in the target address space (the source page does not need to +* be related to the target address space) */ + int (*write_one_page)(struct address_space *, pgoff_t, struct page *); + }; /* @@ -1801,6 +1806,8 @@ extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *, unsigned long *, loff_t, loff_t *, size_t, size_t); extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *, unsigned long, loff_t, loff_t *, size_t, ssize_t); +extern int generic_file_buffered_write_one_page(struct address_space *, + pgoff_t, struct page *); extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos); extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos); extern void do_generic_mapping_read(struct address_space *mapping, diff --git a/mm/filemap.c b/mm/filemap.c index ba669a8..adfba8a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2337,6 +2337,67 @@ generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov, } EXPORT_SYMBOL(generic_file_buffered_write); +/** + * generic_file_buffered_write_one_page - Write a single page of data to an + * inode + * @mapping - The address space of the target 
inode + * @index - The target page in the target inode to fill + * @source - The data to write into the target page + * + * Write the data from the source page to the page in the nominated address + * space at the @index specified. Note that the file will not be extended if + * the page crosses the EOF marker, in which case only the first part of the + * page will be written. + * + * The @source page does not need to have any association
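The "will not be extended" rule in the kernel-doc comment above comes down to a small length calculation. The following userspace sketch shows one way to express it; write_one_page_len(), SKETCH_PAGE_SIZE and the choice to return 0 for a page lying wholly beyond EOF are assumptions for illustration, not taken from the patch.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the sketch */

/*
 * Return how many bytes of the page at @index may be written without
 * extending a file of @i_size bytes: 0 if the page starts at or beyond EOF
 * (an assumption of this sketch), a partial count if the page straddles EOF,
 * otherwise a full page.
 */
static size_t write_one_page_len(unsigned long long i_size,
				 unsigned long index)
{
	unsigned long long start = (unsigned long long)index * SKETCH_PAGE_SIZE;

	if (start >= i_size)
		return 0;
	if (i_size - start < SKETCH_PAGE_SIZE)
		return (size_t)(i_size - start);
	return SKETCH_PAGE_SIZE;
}
```

For a 10000-byte file, pages 0 and 1 are written in full, page 2 gets only the 1808 bytes up to EOF, and page 3 gets nothing.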
[PATCH 20/27] CacheFiles: Permit the page lock state to be monitored [try #2]
Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 include/linux/pagemap.h |    5 +++++
 mm/filemap.c            |   18 ++++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d534689..963b2a4 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -228,6 +228,11 @@ static inline void wait_on_page_owner_priv_2(struct page *page)
 extern void end_page_owner_priv_2(struct page *page);
 
 /*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/mm/filemap.c b/mm/filemap.c
index adfba8a..8f7fe10 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -533,6 +533,24 @@ void fastcall wait_on_page_bit(struct page *page, int bit_nr)
 EXPORT_SYMBOL(wait_on_page_bit);
 
 /**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page - Page defining the wait queue of interest
+ * @waiter - Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+	wait_queue_head_t *q = page_waitqueue(page);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue(q, waiter);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
  * unlock_page - unlock a locked page
  * @page: the page
  *
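The idea of "monitoring" a page lock by queueing a waiter with its own wake function can be sketched in userspace like this. Everything here is hypothetical (struct monitor, struct fake_page, add_page_monitor(), unlock_fake_page(), note_read_complete()); in particular the sketch gives each page a private waiter list, whereas the kernel looks the queue up with page_waitqueue().

```c
#include <stddef.h>

struct monitor {
	void (*func)(struct monitor *self);	/* called when the page unlocks */
	int fired;				/* how often func has run */
	struct monitor *next;
};

/* Hypothetical stand-in for a page with its own list of monitors. */
struct fake_page {
	int locked;
	struct monitor *waiters;
};

/* Models add_page_wait_queue(): attach a monitor to the page. */
static void add_page_monitor(struct fake_page *page, struct monitor *m)
{
	m->next = page->waiters;
	page->waiters = m;
}

/* Models unlock_page(): clear the lock, then run every monitor's callback. */
static void unlock_fake_page(struct fake_page *page)
{
	page->locked = 0;
	for (struct monitor *m = page->waiters; m; m = m->next)
		m->func(m);
}

/* Example callback: a cache would start copying data to the netfs page. */
static void note_read_complete(struct monitor *self)
{
	self->fired++;
}
```

Because the callback runs from the unlock path, the caller does not have to sleep waiting for the backing page; it is simply told when the read has finished.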
[PATCH 23/27] NFS: Fix memory leak [try #2]
Fix a memory leak whereby multiple clientaddr=xxx mount options just overwrite
the duplicated client_address option pointer, without freeing the old memory.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/super.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 0b0c72a..7f5e747 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -936,6 +936,7 @@ static int nfs_parse_mount_options(char *raw,
 			string = match_strdup(args);
 			if (string == NULL)
 				goto out_nomem;
+			kfree(mnt->client_address);
 			mnt->client_address = string;
 			break;
 		case Opt_mountaddr:
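The same free-before-overwrite pattern, shown as a userspace sketch with malloc()/free() standing in for match_strdup()/kfree(). struct mount_opts and set_client_address() are hypothetical names used only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

struct mount_opts {
	char *client_address;	/* owned; NULL until an option is parsed */
};

/*
 * Store a copy of @value, freeing any earlier copy first so that a repeated
 * option does not leak the previous string.  Returns 0 on success, or -1 on
 * allocation failure (in which case the old value is left untouched).
 */
static int set_client_address(struct mount_opts *opts, const char *value)
{
	size_t len = strlen(value) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, value, len);
	free(opts->client_address);	/* free(NULL) is a no-op */
	opts->client_address = copy;
	return 0;
}
```

Calling the setter twice leaves only the second string allocated; without the free() the first copy would be unreachable, which is the leak the one-line kfree() closes.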
[PATCH 21/27] CacheFiles: Export things for CacheFiles [try #2]
Export fsync_super() for CacheFiles's use.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/super.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index ceaf2e3..cd199ae 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -266,6 +266,7 @@ int fsync_super(struct super_block *sb)
 	__fsync_super(sb);
 	return sync_blockdev(sb->s_bdev);
 }
+EXPORT_SYMBOL_GPL(fsync_super);
 
 /**
  * generic_shutdown_super - common helper for ->kill_sb()
[PATCH 25/27] NFS: Configuration and mount option changes to enable local caching on NFS [try #2]
Changes to the kernel configuration definitions and to the NFS mount options to
allow the local caching support added by the previous patch to be enabled.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/Kconfig        |    8 ++++++++
 fs/nfs/client.c   |    2 ++
 fs/nfs/internal.h |    1 +
 fs/nfs/super.c    |   14 ++++++++++++++
 4 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index e95b11c..39b1981 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1650,6 +1650,14 @@ config NFS_V4
 
 	  If unsure, say N.
 
+config NFS_FSCACHE
+	bool "Provide NFS client caching support (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+	help
+	  Say Y here if you want NFS data to be cached locally on disc through
+	  the general filesystem cache manager
+
 config NFS_DIRECTIO
 	bool "Allow direct I/O on NFS files"
 	depends on NFS_FS
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index bcdc5d0..92f9b84 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -572,6 +572,7 @@ static int nfs_init_server(struct nfs_server *server,
 
 	/* Initialise the client representation from the mount data */
 	server->flags = data->flags & NFS_MOUNT_FLAGMASK;
+	server->options = data->options;
 
 	if (data->rsize)
 		server->rsize = nfs_block_size(data->rsize, NULL);
@@ -931,6 +932,7 @@ static int nfs4_init_server(struct nfs_server *server,
 	/* Initialise the client representation from the mount data */
 	server->flags = data->flags & NFS_MOUNT_FLAGMASK;
 	server->caps |= NFS_CAP_ATOMIC_OPEN;
+	server->options = data->options;
 
 	if (data->rsize)
 		server->rsize = nfs_block_size(data->rsize, NULL);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f3acf48..ef09e00 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -35,6 +35,7 @@ struct nfs_parsed_mount_data {
 	int			acregmin, acregmax,
 				acdirmin, acdirmax;
 	int			namlen;
+	unsigned int		options;
 	unsigned int		bsize;
 	unsigned int		auth_flavor_len;
 	rpc_authflavor_t	auth_flavors[1];
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 6dd628f..0542550 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -74,6 +74,7 @@ enum {
 	Opt_acl, Opt_noacl,
 	Opt_rdirplus, Opt_nordirplus,
 	Opt_sharecache, Opt_nosharecache,
+	Opt_fscache, Opt_nofscache,
 
 	/* Mount options that take integer arguments */
 	Opt_port,
@@ -123,6 +124,8 @@ static match_table_t nfs_mount_option_tokens = {
 	{ Opt_nordirplus, "nordirplus" },
 	{ Opt_sharecache, "sharecache" },
 	{ Opt_nosharecache, "nosharecache" },
+	{ Opt_fscache, "fsc" },
+	{ Opt_nofscache, "nofsc" },
 
 	{ Opt_port, "port=%u" },
 	{ Opt_rsize, "rsize=%u" },
@@ -459,6 +462,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
 	seq_printf(m, ",timeo=%lu", 10U * clp->retrans_timeo / HZ);
 	seq_printf(m, ",retrans=%u", clp->retrans_count);
 	seq_printf(m, ",sec=%s", nfs_pseudoflavour_to_name(nfss->client->cl_auth->au_flavor));
+	if (nfss->options & NFS_OPTION_FSCACHE)
+		seq_printf(m, ",fsc");
 }
 
 /*
@@ -697,6 +702,15 @@ static int nfs_parse_mount_options(char *raw,
 			break;
 		case Opt_nosharecache:
 			mnt->flags |= NFS_MOUNT_UNSHARED;
+			mnt->options &= ~NFS_OPTION_FSCACHE;
+			break;
+		case Opt_fscache:
+			/* sharing is mandatory with fscache */
+			mnt->options |= NFS_OPTION_FSCACHE;
+			mnt->flags &= ~NFS_MOUNT_UNSHARED;
+			break;
+		case Opt_nofscache:
+			mnt->options &= ~NFS_OPTION_FSCACHE;
 			break;
 
 		case Opt_port:
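The interaction between the fsc, nofsc and nosharecache options in the parser hunk above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: struct sketch_mount, sketch_apply_option() and the two SKETCH_* bit values are hypothetical and chosen for illustration; they are not the kernel's NFS_MOUNT_UNSHARED or NFS_OPTION_FSCACHE definitions.

```c
#include <assert.h>
#include <string.h>

/* Illustrative bit values only; the real constants live in the kernel. */
#define SKETCH_MOUNT_UNSHARED	0x1u
#define SKETCH_OPTION_FSCACHE	0x1u

struct sketch_mount {
	unsigned int flags;	/* sharing-related mount flags */
	unsigned int options;	/* caching-related options */
};

/*
 * Apply one boolean option token in the order it appears on the command
 * line.  Returns 0 if the token was recognised, -1 otherwise.
 */
static int sketch_apply_option(struct sketch_mount *mnt, const char *token)
{
	assert(mnt != NULL && token != NULL);

	if (strcmp(token, "nosharecache") == 0) {
		/* an unshared superblock gives up caching */
		mnt->flags |= SKETCH_MOUNT_UNSHARED;
		mnt->options &= ~SKETCH_OPTION_FSCACHE;
	} else if (strcmp(token, "fsc") == 0) {
		/* sharing is mandatory with fscache */
		mnt->options |= SKETCH_OPTION_FSCACHE;
		mnt->flags &= ~SKETCH_MOUNT_UNSHARED;
	} else if (strcmp(token, "nofsc") == 0) {
		mnt->options &= ~SKETCH_OPTION_FSCACHE;
	} else {
		return -1;
	}
	return 0;
}
```

Because each token overwrites the effect of the previous one, "nosharecache,fsc" ends up cached and shared, while "fsc,nosharecache" ends up unshared and uncached.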
[PATCH 26/27] NFS: Display local caching state [try #2]
Display the local caching state in /proc/fs/nfsfs/volumes.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/client.c  |    7 ++++---
 fs/nfs/fscache.h |   15 +++++++++++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 92f9b84..68d3124 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1335,7 +1335,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 
 	/* display header on line 1 */
 	if (v == &nfs_volume_list) {
-		seq_puts(m, "NV SERVER   PORT DEV     FSID\n");
+		seq_puts(m, "NV SERVER   PORT DEV     FSID              FSC\n");
 		return 0;
 	}
 	/* display one transport per line on subsequent lines */
@@ -1349,12 +1349,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 		 (unsigned long long) server->fsid.major,
 		 (unsigned long long) server->fsid.minor);
 
-	seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+	seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
 		   clp->cl_nfsversion,
 		   NIPQUAD(clp->cl_addr.sin_addr),
 		   ntohs(clp->cl_addr.sin_port),
 		   dev,
-		   fsid);
+		   fsid,
+		   nfs_server_fscache_state(server));
 
 	return 0;
 }
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 144fb58..9a735fc 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -53,6 +53,17 @@ extern void __nfs_fscache_invalidate_page(struct page *, struct inode *);
 extern int nfs_fscache_release_page(struct page *, gfp_t);
 
 /*
+ * indicate the client caching state as readable text
+ */
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+	if (server->nfs_client->fscache &&
+	    (server->options & NFS_OPTION_FSCACHE))
+		return "yes";
+	return "no ";
+}
+
+/*
  * release the caching state associated with a page if undergoing complete page
  * invalidation
  */
@@ -109,6 +120,10 @@ static inline void nfs4_fscache_get_client_cookie(struct nfs_client *clp) {}
 static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
 static inline void nfs_fscache_show_stats(struct seq_file *m,
 					  struct nfs_server *nfss) {}
+static inline const char *nfs_server_fscache_state(struct nfs_server *server)
+{
+	return "no ";
+}
 
 static inline void nfs_fscache_init_fh_cookie(struct inode *inode) {}
 static inline void nfs_fscache_enable_fh_cookie(struct inode *inode) {}
[PATCH 24/27] NFS: Use local caching [try #2]
The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required.  This can be
obtained from:

	http://people.redhat.com/steved/fscache/util-linux/

To mount an NFS filesystem to use caching, add an "fsc" option to the mount:

	mount warthog:/ /a -o fsc

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/Makefile           |    1
 fs/nfs/client.c           |    5
 fs/nfs/file.c             |   37
 fs/nfs/fscache-def.c      |  289
 fs/nfs/fscache.c          |  391
 fs/nfs/fscache.h          |  148
 fs/nfs/inode.c            |   47
 fs/nfs/read.c             |   28
 fs/nfs/super.c            |    3
 fs/nfs/sysctl.c           |    1
 include/linux/nfs_fs.h    |    9
 include/linux/nfs_fs_sb.h |   18
 12 files changed, 968 insertions(+), 9 deletions(-)
 create mode 100644 fs/nfs/fscache-def.c
 create mode 100644 fs/nfs/fscache.c
 create mode 100644 fs/nfs/fscache.h

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index df0f41e..073d04c 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,3 +16,4 @@ nfs-$(CONFIG_NFS_V4)	+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
 			   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-def.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a6f6254..bcdc5d0 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -43,6 +43,7 @@
 #include "delegation.h"
 #include "iostat.h"
 #include "internal.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_CLIENT
 
@@ -139,6 +140,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
 	clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
 
+	nfs_fscache_get_client_cookie(clp);
+
 	return clp;
 
 error_3:
@@ -170,6 +173,8 @@ static void nfs_free_client(struct nfs_client *clp)
 
 	nfs4_shutdown_client(clp);
 
+	nfs_fscache_release_client_cookie(clp);
+
 	/* -EIO all pending I/O */
 	if (!IS_ERR(clp->cl_rpcclient))
 		rpc_shutdown_client(clp->cl_rpcclient);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index b3bb89f..d492cd7 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -35,6 +35,7 @@
 #include "delegation.h"
 #include "internal.h"
 #include "iostat.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_FILE
 
@@ -352,22 +353,48 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
 	return status < 0 ? status : copied;
 }
 
+/*
+ * Partially or wholly invalidate a page
+ * - Release the private state associated with a page if undergoing complete
+ *   page invalidation
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
 {
 	if (offset != 0)
 		return;
 	/* Cancel any unstarted writes on this page */
 	nfs_wb_page_cancel(page->mapping->host, page);
+
+	nfs_fscache_invalidate_page(page, page->mapping->host);
 }
 
+/*
+ * Release the private state associated with a page
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return true (may release) or false (may not)
+ */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
 	/* If PagePrivate() is set, then the page is not freeable */
-	return 0;
+	if (PagePrivate(page))
+		return 0;
+	return nfs_fscache_release_page(page, gfp);
 }
 
+/*
+ * Attempt to clear the private state associated with a page when an error
+ * occurs that requires the cached contents of an inode to be written back or
+ * destroyed
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return 0 if successful, -error otherwise
+ */
 static int nfs_launder_page(struct page *page)
 {
+	wait_on_page_fscache_write(page);
 	return nfs_wb_page(page->mapping->host, page);
 }
 
@@ -387,6 +414,11 @@ const struct address_space_operations nfs_file_aops = {
 	.launder_page = nfs_launder_page,
 };
 
+/*
+ * Notification that a PTE pointing to an NFS page is about to be made
+ * writable, implying that someone is about to modify the page through a
+ * shared-writable mapping
+ */
 static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page)
 {
 	struct file *filp = vma->vm_file;
@@ -396,6 +428,9 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page)
 	struct address_space *mapping;
 	loff_t offset;
 
+	/* make sure the cache has finished storing the page */
+	wait_on_page_fscache_write(page);
+
 	lock_page(page);
 	mapping = page->mapping
[PATCH 27/27] NFS: Separate caching by superblock, explicitly if necessary [try #2]
Separate caching by superblock, explicitly if necessary.  This means mounts of
the same remote data with different parameters do not share cache objects for
common files.

The administrator may also provide a uniquifier to further enhance the
uniqueness.

Where it is otherwise impossible to distinguish superblocks because all the
parameters are identical, but the 'nosharecache' option is supplied, a
uniquifying string must be supplied, else only the first mount will be
permitted to use the cache.

If there's a key collision, then the second mount will disable caching and give
a warning into the kernel log.

There are three variant NFS mount options that can be added to a mount command
to control caching for a mount.  Only the last one specified takes effect:

 (*) Adding fsc will request caching.

 (*) Adding fsc=string will request caching and also specify a uniquifier.

 (*) Adding nofsc will disable caching.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/fscache-def.c      |   33
 fs/nfs/fscache.c          |  122
 fs/nfs/fscache.h          |   46
 fs/nfs/internal.h         |    3
 fs/nfs/super.c            |   24
 include/linux/nfs_fs_sb.h |    3
 6 files changed, 220 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/fscache-def.c b/fs/nfs/fscache-def.c
index bc20b7d..1d10b4e 100644
--- a/fs/nfs/fscache-def.c
+++ b/fs/nfs/fscache-def.c
@@ -117,6 +117,39 @@ const struct fscache_cookie_def nfs_cache_server_index_def = {
 };
 
 /*
+ * Generate a key to describe a superblock key in the main NFS index
+ */
+static uint16_t nfs_super_get_key(const void *cookie_netfs_data,
+				  void *buffer, uint16_t bufmax)
+{
+	const struct nfs_fscache_key *key;
+	const struct nfs_server *nfss = cookie_netfs_data;
+	uint16_t len;
+
+	key = nfss->fscache_key;
+	len = sizeof(key->key) + key->key.uniq_len;
+	if (len > bufmax) {
+		len = 0;
+	} else {
+		memcpy(buffer, &key->key, sizeof(key->key));
+		memcpy(buffer + sizeof(key->key),
+		       key->key.uniquifier, key->key.uniq_len);
+	}
+
+	return len;
+}
+
+/*
+ * The superblock index for the filesystem is defined by all the NFS parameters
+ * that might cause a separate superblock
+ */
+const struct fscache_cookie_def nfs_cache_super_index_def = {
+	.name		= "NFS.supers",
+	.type		= FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key	= nfs_super_get_key,
+};
+
+/*
  * Generate a key to describe an NFS inode in an NFS server's index
  */
 static uint16_t nfs_fh_get_key(const void *cookie_netfs_data,
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 465f961..af9c65c 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -23,6 +23,9 @@
 
 #define NFSDBG_FACILITY		NFSDBG_FSCACHE
 
+static struct rb_root nfs_fscache_keys = RB_ROOT;
+static DEFINE_SPINLOCK(nfs_fscache_keys_lock);
+
 /*
  * Get the per-client index cookie for an NFS client if the appropriate mount
  * flag was set
@@ -52,6 +55,118 @@ void nfs_fscache_release_client_cookie(struct nfs_client *clp)
 }
 
 /*
+ * get a cookie for a superblock
+ */
+void nfs_fscache_get_super_cookie(struct super_block *sb,
+				  struct nfs_parsed_mount_data *data)
+{
+	struct nfs_fscache_key *key, *xkey;
+	struct nfs_server *nfss = NFS_SB(sb);
+	struct rb_node **p, *parent;
+	const char *uniq = data->fscache_uniq ?: "";
+	int diff, ulen;
+
+	ulen = strlen(uniq);
+	key = kzalloc(sizeof(*key) + ulen, GFP_KERNEL);
+	if (!key)
+		return;
+
+	key->nfs_client = nfss->nfs_client;
+	key->key.super.s_flags = sb->s_flags & NFS_MS_MASK;
+	key->key.nfs_server.flags = nfss->flags;
+	key->key.nfs_server.rsize = nfss->rsize;
+	key->key.nfs_server.wsize = nfss->wsize;
+	key->key.nfs_server.acregmin = nfss->acregmin;
+	key->key.nfs_server.acregmax = nfss->acregmax;
+	key->key.nfs_server.acdirmin = nfss->acdirmin;
+	key->key.nfs_server.acdirmax = nfss->acdirmax;
+	key->key.nfs_server.fsid = nfss->fsid;
+	key->key.rpc_auth.au_flavor = nfss->client->cl_auth->au_flavor;
+
+	key->key.uniq_len = ulen;
+	memcpy(key->key.uniquifier, uniq, ulen);
+
+	spin_lock(&nfs_fscache_keys_lock);
+	p = &nfs_fscache_keys.rb_node;
+	parent = NULL;
+	while (*p) {
+		parent = *p;
+		xkey = rb_entry(parent, struct nfs_fscache_key, node);
+
+		if (key->nfs_client < xkey->nfs_client)
+			goto go_left;
+		if (key->nfs_client > xkey->nfs_client)
+			goto go_right;
+
+		diff = memcmp(&key->key, &xkey->key, sizeof(key->key));
+		if (diff < 0)
+			goto go_left;
+		if (diff > 0)
+			goto go_right
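The key layout used above (a fixed block of mount parameters followed by a variable-length uniquifier) can be tried out in userspace. The following is a much-reduced sketch, not the patch's code: struct sketch_key and the sketch_key_*() helpers are hypothetical, only two parameters are recorded, and the way the uniquifier is compared after the fixed part is an assumption, since the diff quoted here is cut off before that step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduced key; the real one records many more parameters. */
struct sketch_key {
	uint32_t rsize;
	uint32_t wsize;
	uint32_t uniq_len;
	char uniquifier[];	/* uniq_len bytes, not NUL-terminated */
};

/* Allocate a zeroed key with room for the uniquifier appended. */
static struct sketch_key *sketch_key_new(uint32_t rsize, uint32_t wsize,
					 const char *uniq)
{
	size_t ulen = strlen(uniq);
	struct sketch_key *key = calloc(1, sizeof(*key) + ulen);

	if (key == NULL)
		return NULL;
	key->rsize = rsize;
	key->wsize = wsize;
	key->uniq_len = (uint32_t)ulen;
	memcpy(key->uniquifier, uniq, ulen);
	return key;
}

/*
 * Serialise the key into a caller-supplied buffer: fixed part first, then
 * the uniquifier.  Returns the length written, or 0 if it does not fit.
 */
static uint16_t sketch_key_get(const struct sketch_key *key,
			       void *buffer, uint16_t bufmax)
{
	size_t len = sizeof(*key) + key->uniq_len;

	assert(key != NULL && buffer != NULL);
	if (len > bufmax)
		return 0;
	memcpy(buffer, key, sizeof(*key));
	memcpy((char *)buffer + sizeof(*key), key->uniquifier, key->uniq_len);
	return (uint16_t)len;
}

/* Order two keys: fixed part (which includes uniq_len), then uniquifier. */
static int sketch_key_cmp(const struct sketch_key *a,
			  const struct sketch_key *b)
{
	int diff = memcmp(a, b, sizeof(*a));

	if (diff != 0)
		return diff;
	/* uniq_len is equal here, so both uniquifiers have this length */
	return memcmp(a->uniquifier, b->uniquifier, a->uniq_len);
}
```

Two mounts with identical parameters and no uniquifier compare equal, which is the collision case the commit message describes; giving one of them a uniquifier makes the keys differ.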
Re: [PATCH] procfs: constify function pointer tables
FRV looks okay.

Acked-By: David Howells [EMAIL PROTECTED]
[PATCH 00/27] Permit filesystem local caching
These patches add local caching for network filesystems such as NFS.

The patches can roughly be broken down into a number of sets:

  (*) 01-keys-inc-payload.diff
  (*) 02-keys-search-keyring.diff
  (*) 03-keys-callout-blob.diff

      Three patches to the keyring code made to help the CIFS people.
      Included because of patches 05-08.

  (*) 04-keys-get-label.diff

      A patch to allow the security label of a key to be retrieved.
      Included because of patches 05-08.

  (*) 05-security-current-fsugid.diff
  (*) 06-security-separate-task-bits.diff
  (*) 07-security-subjective.diff
  (*) 08-security-secctx2secid.diff
  (*) 09-security-additional-classes.diff
  (*) 10-security-kernel_service-class.diff
  (*) 11-security-kernel-service.diff
  (*) 12-security-nfsd.diff

      Patches to permit the subjective security of a task to be overridden.
      All the security details in task_struct are decanted into a new struct
      that task_struct then has two pointers to: one that defines the
      objective security of that task (how other tasks may affect it) and one
      that defines the subjective security (how it may affect other objects).

      Note that I have dropped the idea of struct cred for the moment.  With
      the amount of stuff that was excluded from it, it wasn't actually any
      use to me.  However, it can be added later.

      Required for cachefiles.

  (*) 13-release-page.diff
  (*) 14-fscache-page-flags.diff
  (*) 15-add_wait_queue_tail.diff
  (*) 16-fscache.diff

      Patches to provide a local caching facility for network filesystems.

  (*) 17-cachefiles-ia64.diff
  (*) 18-cachefiles-ext3-f_mapping.diff
  (*) 19-cachefiles-write.diff
  (*) 20-cachefiles-monitor.diff
  (*) 21-cachefiles-export.diff
  (*) 22-cachefiles.diff

      Patches to provide a local cache in a directory of an already mounted
      filesystem.

  (*) 23-nfs-memleak.diff
  (*) 24-fscache-nfs.diff
  (*) 25-fscache-nfs-mount.diff
  (*) 26-fscache-nfs-display.diff
  (*) 27-fscache-nfs-persb.diff

      Patches to provide NFS with local caching.  The fifth of these patches
      makes caching configurable per superblock.

I've updated the patches to compile on as many arches as I can get compilers
for and can get to compile.  However, for patch 06, the sparc and alpha arches
need some asm work as they access security information from asm code, using
asm-offsets to calculate the offset.

The SELinux base code will also need updating to have the security class, lest
the following error appear in dmesg:

	context_struct_compute_av:  unrecognized class 69

I've provided a patch to make NFSd use task_security and current->act_as to
change its security settings.

I've also renamed the accessors for the PG_fscache and PG_fscache_write bits in
page-flags.h, pagemap.h and filemap.c (they subclass PG_private_2 and
PG_owner_priv_2 so these are the accessors in the main headers).  I've then
wrapped them in fscache.h.

--
A tarball of the patches is available at:

	http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-27.tar.bz2

To use this version of CacheFiles, the cachefilesd-0.9 is also required.  It
is available as an SRPM:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm

Or as individual bits:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
	http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
	http://people.redhat.com/~dhowells/fscache/cachefilesd.if
	http://people.redhat.com/~dhowells/fscache/cachefilesd.te
	http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

The .fc, .if and .te files are for manipulating SELinux.

David
[PATCH 01/27] KEYS: Increase the payload size when instantiating a key
Increase the size of a payload that can be used to instantiate a key in
add_key() and keyctl_instantiate_key().  This permits huge CIFS SPNEGO blobs to
be passed around.  The limit is raised to 1MB.  If kmalloc() can't allocate a
buffer of sufficient size, vmalloc() will be tried instead.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyctl.c |   38 ++++++++++++++++++++++++++++++--------
 1 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index d9ca15c..8ec8432 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -19,6 +19,7 @@
 #include <linux/capability.h>
 #include <linux/string.h>
 #include <linux/err.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -62,9 +63,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 	char type[32], *description;
 	void *payload;
 	long ret;
+	bool vm;
 
 	ret = -EINVAL;
-	if (plen > 32767)
+	if (plen > 1024 * 1024 - 1)
 		goto error;
 
 	/* draw all the data into kernel space */
@@ -81,11 +83,18 @@ asmlinkage long sys_add_key(const char __user *_type,
 	/* pull the payload in if one was supplied */
 	payload = NULL;
 
+	vm = false;
 	if (_payload) {
 		ret = -ENOMEM;
 		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload)
-			goto error2;
+		if (!payload) {
+			if (plen <= PAGE_SIZE)
+				goto error2;
+			vm = true;
+			payload = vmalloc(plen);
+			if (!payload)
+				goto error2;
+		}
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -113,7 +122,10 @@ asmlinkage long sys_add_key(const char __user *_type,
 
 	key_ref_put(keyring_ref);
  error3:
-	kfree(payload);
+	if (!vm)
+		kfree(payload);
+	else
+		vfree(payload);
  error2:
 	kfree(description);
  error:
@@ -821,9 +833,10 @@ long keyctl_instantiate_key(key_serial_t id,
 	key_ref_t keyring_ref;
 	void *payload;
 	long ret;
+	bool vm = false;
 
 	ret = -EINVAL;
-	if (plen > 32767)
+	if (plen > 1024 * 1024 - 1)
 		goto error;
 
 	/* the appropriate instantiation authorisation key must have been
@@ -843,8 +856,14 @@ long keyctl_instantiate_key(key_serial_t id,
 	if (_payload) {
 		ret = -ENOMEM;
 		payload = kmalloc(plen, GFP_KERNEL);
-		if (!payload)
-			goto error;
+		if (!payload) {
+			if (plen <= PAGE_SIZE)
+				goto error;
+			vm = true;
+			payload = vmalloc(plen);
+			if (!payload)
+				goto error;
+		}
 
 		ret = -EFAULT;
 		if (copy_from_user(payload, _payload, plen) != 0)
@@ -877,7 +896,10 @@ long keyctl_instantiate_key(key_serial_t id,
 	}
 
 error2:
-	kfree(payload);
+	if (!vm)
+		kfree(payload);
+	else
+		vfree(payload);
 error:
 	return ret;
 
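The allocate-with-fallback pattern in this patch has one easy-to-miss requirement: the buffer must be released by the allocator that produced it, which is what the vm flag tracks. Below is a userspace model of that logic only. Everything in it is hypothetical: the two sketch allocators wrap malloc() and merely pretend to be kmalloc() and vmalloc(), the "small" one is made to fail above two pages so that the fallback path is exercised, and the SKETCH_* limits are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_MAX_PAYLOAD	(1024u * 1024u - 1u)

/* Live-allocation counters, so mismatched frees show up in a test. */
static int sketch_small_live, sketch_big_live;

/* Pretend kmalloc(): refuses anything over two pages. */
static void *sketch_small_alloc(size_t n)
{
	void *p;

	if (n > 2 * SKETCH_PAGE_SIZE)
		return NULL;
	p = malloc(n);
	if (p != NULL)
		sketch_small_live++;
	return p;
}

static void sketch_small_free(void *p)
{
	if (p != NULL) {
		sketch_small_live--;
		free(p);
	}
}

/* Pretend vmalloc(): accepts large requests. */
static void *sketch_big_alloc(size_t n)
{
	void *p = malloc(n);

	if (p != NULL)
		sketch_big_live++;
	return p;
}

static void sketch_big_free(void *p)
{
	if (p != NULL) {
		sketch_big_live--;
		free(p);
	}
}

struct sketch_payload {
	void *data;
	bool vm;	/* true if data came from the fallback allocator */
};

/* Returns 0 on success or a negative errno value. */
static int sketch_payload_alloc(struct sketch_payload *pl, size_t plen)
{
	assert(pl != NULL);
	pl->data = NULL;
	pl->vm = false;

	if (plen > SKETCH_MAX_PAYLOAD)
		return -EINVAL;

	pl->data = sketch_small_alloc(plen);
	if (pl->data == NULL) {
		/* a request this small gains nothing from the fallback */
		if (plen <= SKETCH_PAGE_SIZE)
			return -ENOMEM;
		pl->vm = true;
		pl->data = sketch_big_alloc(plen);
		if (pl->data == NULL)
			return -ENOMEM;
	}
	return 0;
}

/* Release with whichever allocator the flag says was used. */
static void sketch_payload_free(struct sketch_payload *pl)
{
	if (!pl->vm)
		sketch_small_free(pl->data);
	else
		sketch_big_free(pl->data);
	pl->data = NULL;
}
```

A 100-byte request is served by the small allocator, a 64KB request falls through to the large one, and both counters return to zero after sketch_payload_free().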
[PATCH 02/27] KEYS: Check starting keyring as part of search
Check the starting keyring as part of the search to (a) see if that is what
we're searching for, and (b) to check it is still valid for searching.

The scenario: User in process A does things that cause things to be created in
its process session keyring.  The user then does an su to another user and
starts a new process, B.  The two processes now share the same process session
keyring.

Process B does an NFS access which results in an upcall to gssd.  When gssd
attempts to instantiate the context key (to be linked into the process session
keyring), it is denied access even though it has an authorization key.

The order of calls is:

   keyctl_instantiate_key()
      lookup_user_key()				    (the default: case)
         search_process_keyrings(current)
	    search_process_keyrings(rka->context)   (recursive call)
	       keyring_search_aux()

keyring_search_aux() verifies the keys and keyrings underneath the top-level
keyring it is given, but that top-level keyring is neither fully validated nor
checked to see if it is the thing being searched for.

This patch changes keyring_search_aux() to:
1) do more validation on the top keyring it is given and
2) check whether that top-level keyring is the thing being searched for

Signed-off-by: Kevin Coffman [EMAIL PROTECTED]
Signed-off-by: David Howells [EMAIL PROTECTED]
---

 security/keys/keyring.c |   35 +++++++++++++++++++++++++++++++----
 1 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 88292e3..76b89b2 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -292,7 +292,7 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 
 	struct keyring_list *keylist;
 	struct timespec now;
-	unsigned long possessed;
+	unsigned long possessed, kflags;
 	struct key *keyring, *key;
 	key_ref_t key_ref;
 	long err;
@@ -318,6 +318,32 @@ key_ref_t keyring_search_aux(key_ref_t keyring_ref,
 	now = current_kernel_time();
 	err = -EAGAIN;
 	sp = 0;
+
+	/* firstly we should check to see if this top-level keyring is what we
+	 * are looking for */
+	key_ref = ERR_PTR(-EAGAIN);
+	kflags = keyring->flags;
+	if (keyring->type == type && match(keyring, description)) {
+		key = keyring;
+
+		/* check it isn't negative and hasn't expired or been
+		 * revoked */
+		if (kflags & (1 << KEY_FLAG_REVOKED))
+			goto error_2;
+		if (key->expiry && now.tv_sec >= key->expiry)
+			goto error_2;
+		key_ref = ERR_PTR(-ENOKEY);
+		if (kflags & (1 << KEY_FLAG_NEGATIVE))
+			goto error_2;
+		goto found;
+	}
+
+	/* otherwise, the top keyring must not be revoked, expired, or
+	 * negatively instantiated if we are to search it */
+	key_ref = ERR_PTR(-EAGAIN);
+	if (kflags & ((1 << KEY_FLAG_REVOKED) | (1 << KEY_FLAG_NEGATIVE)) ||
+	    (keyring->expiry && now.tv_sec >= keyring->expiry))
+		goto error_2;
 
 	/* start processing a new keyring */
 descend:
@@ -331,13 +357,14 @@ descend:
 	/* iterate through the keys in this keyring first */
 	for (kix = 0; kix < keylist->nkeys; kix++) {
 		key = keylist->keys[kix];
+		kflags = key->flags;
 
 		/* ignore keys not of this type */
 		if (key->type != type)
 			continue;
 
 		/* skip revoked keys and expired keys */
-		if (test_bit(KEY_FLAG_REVOKED, &key->flags))
+		if (kflags & (1 << KEY_FLAG_REVOKED))
 			continue;
 
 		if (key->expiry && now.tv_sec >= key->expiry)
@@ -352,8 +379,8 @@ descend:
 					context, KEY_SEARCH) < 0)
 			continue;
 
-		/* we set a different error code if we find a negative key */
-		if (test_bit(KEY_FLAG_NEGATIVE, &key->flags)) {
+		/* we set a different error code if we pass a negative key */
+		if (kflags & (1 << KEY_FLAG_NEGATIVE)) {
 			err = -ENOKEY;
 			continue;
 		}
[PATCH 03/27] KEYS: Allow the callout data to be passed as a blob rather than a string
Allow the callout data to be passed as a blob rather than a string for
internal kernel services that call any request_key_*() interface other than
request_key().  request_key() itself still takes a NUL-terminated string.

The functions that change are:

	request_key_with_auxdata()
	request_key_async()
	request_key_async_with_auxdata()

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 Documentation/keys-request-key.txt |   11
 Documentation/keys.txt             |   14
 include/linux/key.h                |    9
 security/keys/internal.h           |    9
 security/keys/keyctl.c             |    7
 security/keys/request_key.c        |   49
 security/keys/request_key_auth.c   |   12
 7 files changed, 70 insertions(+), 41 deletions(-)

diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt
index 266955d..09b55e4 100644
--- a/Documentation/keys-request-key.txt
+++ b/Documentation/keys-request-key.txt
@@ -11,26 +11,29 @@ request_key*():
 
 	struct key *request_key(const struct key_type *type,
 				const char *description,
-				const char *callout_string);
+				const char *callout_info);
 
 or:
 
 	struct key *request_key_with_auxdata(const struct key_type *type,
 					     const char *description,
-					     const char *callout_string,
+					     const char *callout_info,
+					     size_t callout_len,
 					     void *aux);
 
 or:
 
 	struct key *request_key_async(const struct key_type *type,
 				      const char *description,
-				      const char *callout_string);
+				      const char *callout_info,
+				      size_t callout_len);
 
 or:
 
 	struct key *request_key_async_with_auxdata(const struct key_type *type,
 						   const char *description,
-						   const char *callout_string,
+						   const char *callout_info,
+						   size_t callout_len,
 						   void *aux);
 
 Or by userspace invoking the request_key system call:
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 51652d3..b82d38d 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -771,7 +771,7 @@ payload contents for more information.
 
 	struct key *request_key(const struct key_type *type,
 				const char *description,
-				const char *callout_string);
+				const char *callout_info);
 
     This is used to request a key or keyring with a description that matches
     the description specified according to the key type's match function. This
@@ -793,24 +793,28 @@ payload contents for more information.
 
 	struct key *request_key_with_auxdata(const struct key_type *type,
 					     const char *description,
-					     const char *callout_string,
+					     const void *callout_info,
+					     size_t callout_len,
 					     void *aux);
 
     This is identical to request_key(), except that the auxiliary data is
-    passed to the key_type->request_key() op if it exists.
+    passed to the key_type->request_key() op if it exists, and the callout_info
+    is a blob of length callout_len, if given (the length may be 0).
 
 
 (*) A key can be requested asynchronously by calling one of:
 
 	struct key *request_key_async(const struct key_type *type,
 				      const char *description,
-				      const char *callout_string);
+				      const void *callout_info,
+				      size_t callout_len);
 
     or:
 
 	struct key *request_key_async_with_auxdata(const struct key_type *type,
 						   const char *description,
-						   const char *callout_string,
+						   const char *callout_info,
+						   size_t callout_len,
 						   void *aux);
 
     which are asynchronous equivalents of request_key() and
diff --git a/include/linux/key.h b/include/linux/key.h
index a70b8a8..163f864 100644
--- a/include/linux/key.h
+++ b/include/linux
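The practical difference between a callout string and a callout blob is that a blob carries its own length and may therefore contain NUL bytes. A minimal userspace sketch of the two calling conventions follows; struct sketch_request, sketch_request_blob(), sketch_request_string() and SKETCH_CALLOUT_MAX are hypothetical and do not correspond to the keys API. The string wrapper deriving its length with strlen() is an assumption about how a string-taking entry point could sit on top of a blob-taking one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CALLOUT_MAX 64	/* illustrative limit, not from the patch */

struct sketch_request {
	unsigned char callout[SKETCH_CALLOUT_MAX];
	size_t callout_len;
};

/*
 * Blob form: the caller passes a pointer and an explicit length, so the data
 * may contain embedded NUL bytes.  A length of 0 is allowed.
 * Returns 0 on success, -1 if the blob is too large for the sketch buffer.
 */
static int sketch_request_blob(struct sketch_request *rq,
			       const void *callout_info, size_t callout_len)
{
	assert(rq != NULL);
	assert(callout_info != NULL || callout_len == 0);

	if (callout_len > SKETCH_CALLOUT_MAX)
		return -1;
	if (callout_len > 0)
		memcpy(rq->callout, callout_info, callout_len);
	rq->callout_len = callout_len;
	return 0;
}

/* String form: the length is taken from the terminating NUL. */
static int sketch_request_string(struct sketch_request *rq,
				 const char *callout_info)
{
	size_t len = callout_info != NULL ? strlen(callout_info) : 0;

	return sketch_request_blob(rq, callout_info, len);
}
```

With the blob form a four-byte payload such as { 0x60, 0x00, 0x06, 0x00 } is stored whole, whereas passing the same bytes as a string would stop at the first zero.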
[PATCH 05/27] Security: Change current->fs[ug]id to current_fs[ug]id()
Change current->fs[ug]id to current_fs[ug]id() so that fsgid and fsuid can be
separated from the task_struct.

Signed-off-by: David Howells [EMAIL PROTECTED]
---

 arch/ia64/kernel/perfmon.c                |    4 ++--
 arch/powerpc/platforms/cell/spufs/inode.c |    4 ++--
 drivers/isdn/capi/capifs.c                |    4 ++--
 drivers/usb/core/inode.c                  |    4 ++--
 fs/9p/fid.c                               |    2 +-
 fs/9p/vfs_inode.c                         |    4 ++--
 fs/9p/vfs_super.c                         |    4 ++--
 fs/affs/inode.c                           |    4 ++--
 fs/anon_inodes.c                          |    4 ++--
 fs/attr.c                                 |    4 ++--
 fs/bfs/dir.c                              |    4 ++--
 fs/cifs/cifsproto.h                       |    2 +-
 fs/cifs/dir.c                             |   12 ++++++------
 fs/cifs/inode.c                           |    8 ++++----
 fs/cifs/misc.c                            |    4 ++--
 fs/coda/cache.c                           |    6 +++---
 fs/coda/upcall.c                          |    4 ++--
 fs/devpts/inode.c                         |    4 ++--
 fs/dquot.c                                |    2 +-
 fs/exec.c                                 |    4 ++--
 fs/ext2/balloc.c                          |    2 +-
 fs/ext2/ialloc.c                          |    4 ++--
 fs/ext2/ioctl.c                           |    2 +-
 fs/ext3/balloc.c                          |    2 +-
 fs/ext3/ialloc.c                          |    4 ++--
 fs/ext4/balloc.c                          |    2 +-
 fs/ext4/ialloc.c                          |    4 ++--
 fs/fuse/dev.c                             |    4 ++--
 fs/gfs2/inode.c                           |   10 +++++-----
 fs/hfs/inode.c                            |    4 ++--
 fs/hfsplus/inode.c                        |    4 ++--
 fs/hpfs/namei.c                           |   24 ++++++++++++------------
 fs/hugetlbfs/inode.c                      |   16 ++++++++--------
 fs/jffs2/fs.c                             |    4 ++--
 fs/jfs/jfs_inode.c                        |    4 ++--
 fs/locks.c                                |    2 +-
 fs/minix/bitmap.c                         |    4 ++--
 fs/namei.c                                |    8 ++++----
 fs/nfsd/vfs.c                             |    4 ++--
 fs/ocfs2/dlm/dlmfs.c                      |    8 ++++----
 fs/ocfs2/namei.c                          |    4 ++--
 fs/pipe.c                                 |    4 ++--
 fs/posix_acl.c                            |    4 ++--
 fs/ramfs/inode.c                          |    4 ++--
 fs/reiserfs/namei.c                       |    4 ++--
 fs/sysv/ialloc.c                          |    4 ++--
 fs/udf/ialloc.c                           |    4 ++--
 fs/udf/namei.c                            |    2 +-
 fs/ufs/ialloc.c                           |    4 ++--
 fs/xfs/linux-2.6/xfs_linux.h              |    4 ++--
 fs/xfs/xfs_acl.c                          |    6 +++---
 fs/xfs/xfs_attr.c                         |    2 +-
 fs/xfs/xfs_inode.c                        |    6 +++---
 fs/xfs/xfs_vnodeops.c                     |    8 ++++----
 include/linux/fs.h                        |    2 +-
 include/linux/sched.h                     |    3 +++
 ipc/mqueue.c                              |    4 ++--
 kernel/cgroup.c                           |    4 ++--
 mm/shmem.c                                |    8 ++++----
 net/9p/client.c                           |    2 +-
 net/socket.c                              |    4 ++--
 net/sunrpc/auth.c                         |    8 ++++----
 security/commoncap.c                      |    8 ++++----
 security/keys/key.c                       |    2 +-
 security/keys/keyctl.c                    |    2 +-
 security/keys/request_key.c               |   10 +++++-----
 security/keys/request_key_auth.c          |    2 +-
 67 files changed, 163 insertions(+), 160 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 73e7c2e..ef383d9 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2206,8 +2206,8 @@ pfm_alloc_fd(struct file **cfile)
 	DPRINT(("new inode ino=%ld @%p\n", inode->i_ino, inode));
 
 	inode->i_mode = S_IFCHR|S_IRUGO;
-	inode->i_uid  = current->fsuid;
-	inode->i_gid  = current->fsgid;
+	inode->i_uid  = current_fsuid();
+	inode->i_gid  = current_fsgid();
 
 	sprintf(name, "[%lu]", inode->i_ino);
 	this.name = name;
diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c
index c0e968a..4efe7bf 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -85,8 +85,8 @@ spufs_new_inode(struct super_block *sb, int mode)
 		goto out;
 
 	inode->i_mode = mode;
-	inode->i_uid = current->fsuid;
-	inode->i_gid = current->fsgid;
+	inode->i_uid = current_fsuid();
+	inode->i_gid
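The point of routing every reader through an accessor is that the storage behind it can later move without touching the callers again. The userspace sketch below illustrates that idea only. All names in it are hypothetical, and it assumes the credentials have already been moved into a separate structure reached through a pointer, which is what the later patches in the series describe rather than what this patch itself does.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int sketch_uid_t;
typedef unsigned int sketch_gid_t;

/* Hypothetical: security details decanted out of the task structure. */
struct sketch_task_security {
	sketch_uid_t fsuid;
	sketch_gid_t fsgid;
};

struct sketch_task {
	struct sketch_task_security *sec;
};

struct sketch_inode {
	sketch_uid_t i_uid;
	sketch_gid_t i_gid;
};

/* Stands in for the kernel's 'current' task pointer. */
static struct sketch_task *sketch_current;

/* Accessors: the only code that knows where the IDs are stored. */
static sketch_uid_t sketch_current_fsuid(void)
{
	assert(sketch_current != NULL && sketch_current->sec != NULL);
	return sketch_current->sec->fsuid;
}

static sketch_gid_t sketch_current_fsgid(void)
{
	assert(sketch_current != NULL && sketch_current->sec != NULL);
	return sketch_current->sec->fsgid;
}

/* A caller in the style of the converted filesystems. */
static void sketch_init_owner(struct sketch_inode *inode)
{
	inode->i_uid = sketch_current_fsuid();
	inode->i_gid = sketch_current_fsgid();
}
```

sketch_init_owner() reads the same whether the IDs live directly in the task structure or behind a pointer, which is why the conversion can be done mechanically across all 67 files first.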