On Sun, Apr 30, 2000 at 04:46:37AM -0400, Erez Zadok wrote:
> Background: my stacking code for linux is minimal. I only stack on things I
> absolutely have to. By "stack on" I mean that I save a link/pointer to a
> lower-level object in the private data field of an upper-level object. I do
> so for struct file, inode, dentry, etc. But I do NOT stack on pages. Doing
> so would complicate stacking considerably. So far I was able to avoid this
> b/c every function that deals with pages also passes a struct file/dentry to
> it so I can find the correct lower page.
You shouldn't need to "stack on" pages anyway, I wouldn't have thought.
For each page you can reference mapping->host, which should point to the
hosting structure (at the moment always an inode, but this may change).
> The new method, sync_page() is only passed a struct page. So I cannot stack
> on it! If I have to stack on it, I'll have to either
>
> (1) complicate my stacking code considerably by stacking on pages. This is
> impossible for my stackable compression file system, b/c the mapping of
> upper and lower pages is not 1-to-1.
Why can your sync_page implementation not grab the inode from mapping->host
and then call sync_page on the underlying fs' page(s) that hold the data?
> (2) change the kernel so that every instance of sync_page is passed the
> corresponding struct file. This isn't pretty either.
I'd like to the see the address_space methods /lose/ the struct file /
struct dentry pointer, but it may be there are situations which require
it.
> Luckily, sync_page isn't used too much. Only nfs seems to use it at the
> moment. All other file systems which define ->sync_page use
> block_sync_page() which is defined as:
>
> int block_sync_page(struct page *page)
> {
> run_task_queue(&tq_disk);
> return 0;
> }
>
> This is confusing. Why would block_sync_page ignore the page argument and
> call something else. The name "block_sync_page" might be misleading. The
> only thing I can think of is that block_sync_page is a placeholder for for a
> time when it would actually do something with the page.
No, this looks OK to me. sync_page seems to be designed to ensure that any
async I/O on a page is complete. For normal "block-mapped" filesystems (which
is what the block_* helpers in buffer.c are for), pages with "in flight" I/O
will have requests on the appropriate block device's queue. Running tq_disk
should ensure these are completed. If tq_disk is ever split up (I believe Jens
Axboe is working on this), block_sync_page will need to look at
page->mapping->host to determine the device, I assume.
Non block-mapped filesystems (e.g. nfs) will need to do something different.
> Anyway, since sync_page appears to be an optional function, I've tried my
> stacking without defining my own ->sync_page. Preliminary results show it
> seems to work. However, if at any point I'd have to define ->sync_page page
> and have to call the lower file system's ->sync_page, I'd urge a change in
> the prototype of this method that would make it possible for me to stack
> this operation.
>
> Also, I don't understand what's ->sync_page for in the first place. The
> name of the fxn implies it might be something like a commit_write.
I presume it's been added to replace explicit tq_disk runs, which will allow
finer-grained syncing as described above, and also to ensure correctness in
nfs and other similar filesystems.