Re: VFS not completely factored, and more

Michael W Zappe Mon, 26 Jun 2000 15:30:05 -0700

I tried mailing this once to linux-fsdev allready, but it never showed back up on my 
machine, so sorry if it's a repeat.

        Mike

----------------------------------------------
Michael Zappe            <[EMAIL PROTECTED]>

Chief Architect, Filesystems
Interlan Communications 
111 Corning Drive
Cary, NC 27511

Thanks for writing back!  It's good to meet you also, Daniel.  So what projects are 
you working on outside of the VFS if you don't mind me asking?  My main project, and 
hence my interest in the VFS is yet another journaling filesystem, CXFS.  (originally 
named Charon, but we discovered two companies warring over the trademark, and didn't 
want to touch that with a 40 foot pole... ;-)  I plan on open sourcing it soon, but 
right now i'm keeping it within the company just to remove the management headaches of 
a distributed project, and so i have the freedom to tweak it at will without making 
peoples lives difficult.  ;-)

But in terms of the VFS problems, I have to say that your idea is a good one, since it 
keeps the data local to the inode!  The only issue i'd have with it is the way you do 
the define.  I think a much cleaner solution would be to have each fs define a 
structure

struct ext2_inode
{
        struct inode ei_inode;
        struct ext2_inode_info ei_ext2_info;
};

And just clean up all of the other filesystems to use a more generic mechanism to cast 
it, such as:

#define EXT2_INODE_INFO(inode) (((struct ext2_inode *)(inode))->ei_ext2_info)

This would also mean that if anything changed, we have an inexpensive (non-existent at 
runtime) level of abstraction which would allow for a simple change.  I also use a 
similar mechanism, and it works great for portability.  I'd definately be willing to 
help clean up the FS code to use the new interface.

One issue that this brings up, and as Viro pointed out, is finding a good eviction 
policy.  Since inodes can now have different allocation sizes, we should try and make 
sure that that doesn't lead to unfairness.  I don't think it will be too big a deal, 
since the other resources associated with an inode are typically MUCH more expensive 
than the inode itself, which will probably almost always be < 4K.  Which is the 
current size of the inode anyway, so even without changing the eviction policy we will 
probably still be ahead.  But we may be able to add optimizations later along this 
line.

I also wouldn't mind adding changing the iget code to a new iget2 which takes an 
opaque, fs dependent parameter (void *).  We would also split the functions into 

void read_inode(struct inode *, void *);
void get_inode(struct inode *, void *);

Read inode is called only on an uninitialized inode structure, and get_inode whenever 
iget pulls an inode out of the cache.  This way, if the FS isn't concerned with 
additional reads after the first, the pointer can just be NULL and we optimize out the 
call.

Also, i wouldn't mind changing i_ino to be a __u64.  Anyone see any major problems 
with this?  We'd have to change the hashing code, (which wouldn't be a bad idea 
anyway) but i think most everything else will convert fairly cleanly.  I have to look 
at this change a little more closely, but it seems to be a good idea.

So, any ideas on how to divy up the work/where to start?

        Mike

On Mon, Jun 26, 2000 at 12:12:50PM +0200, Daniel Phillips wrote:
> 
> 
> ----------  Forwarded Message  ----------
> Subject: Re: VFS not completely factored
> Date: Tue, 15 Feb 2000 22:23:00 +0100
> From: Daniel Phillips <[EMAIL PROTECTED]>
> 
> 
> On Tue, 15 Feb 2000, James Manning wrote:
> > [ Monday, February 14, 2000 ] [EMAIL PROTECTED] wrote:
> > > On a related issue, if there was a char generic_data[] member in the inode
> > > union that had a defined size (probably the size of the largest structure),
> > > people (like me) writing filesystems that aren't ready to be put into the
> > > sources yet could put data right in the union and know that it will fit
> > > without having to resort to using generic_ip and allocating an info block
> > > to store inode specific info.  (Narly sentence huh?)
> > 
> > That would seem to place a restriction on the size of your inode data,
> > something generic_ip frees you from (at not a high cost imho). I'd
> > go ahead and advocate everyone having to go through the indirection
> > and rip out the entire union for a void *inode_info first (helping a
> > good bit towards freeing vfs from specific fs's).  Then do the same
> > for the super_block struct and you seem to be well on your way to the
> > factored vfs, as you'll be rid of fs-specific includes in linux/fs.h
> > (attached).  Whether the indirection and sacrificed cache line is worth
> > it is the call of another.
> 
> The problem comes in the first place from trying to avoid the extra
> de-reference to get to the filesystem-specific data.  This was successfully
> done, but at great expense in increased source code complexity and also
> by sacrificing the idea that the vfs should be properly separated from the
> filesystems it supports.  I'd like to suggest a relatively small change
> that would still save the extra dereference, but without tangling the fs 
> headers in knots.
> 
> What I was actually thinking of is adding a couple of extra integer fields 
> onto the end of the file_system_type struct, giving the size of the 
> filesystem-specific data required for the superblocks and inodes:
> 
>       struct file_system_type {
>               const char *name;
>               int fs_flags;
>               struct super_block *(*read_super) (struct super_block *, void *, int);
>               struct file_system_type * next;
> +             int fs_specific_size_super;
> +             int fs_specific_size_inode;
>       };
> 
> For ext2, the file_system_type initializor would be:
> 
>       static struct file_system_type ext2_fs_type = {
>               "ext2", 
>               FS_REQUIRES_DEV /* | FS_IBASKET */,     /* ibaskets have unresolved 
>bugs */
>               ext2_read_super, 
>               NULL,
>               sizeof(ext2_sb_info),
>               sizeof(ext2_inode_info)
>       };
> 
> Macro symbols would be substituted for the unions in fs.h:
> 
>       struct super_block {
>               struct list_head        s_list;
>               kdev_t                  s_dev;
>               ...
>               struct list_head        s_dirty;        /* dirty inodes */
>               struct semaphore s_vfs_rename_sem;      /* Kludge */
>               FS_SPECIFIC_SUPERBLOCK_INFO
>       };
> 
> To compile ext2 fs you use the following defines before including fs.h:
> 
>       #define FS_SPECIFIC_SUPER_INFO union {struct ext2_sb_info ext2_sb;} u;
>       #define FS_SPECIFIC_INODE_INFO union {struct ext2_inode_info ext2_i;} u;
> 
> All the changes required are trivial, both in the vfs code and in the specific
> filesystems.  In fact, it would be easy to avoid changing the specific 
> filesystems at all if that were desired, though I don't see why that would be 
> good.
> 
> This strategy would allow all the file systems to compile properly, with only 
> trivial changes, and it would allow the file system headers to be untangled a 
> great deal.  Furthermore, each filesystem maintainer the choice of continuing 
> to use the (awkward) sb->u.fs_specific_struct.actual_field syntax, or getting 
> rid of it completely in the interest of improving the readability of the code.
> 
> Not only readability, but generality as well is being held back by the current
> arrangement.
> 
> -- 
> Daniel
> -------------------------------------------------------
> 
> -- 
> Daniel
----------------------------------------------
Michael Zappe            <[EMAIL PROTECTED]>

Chief Architect, Filesystems
Interlan Communications 
111 Corning Drive
Cary, NC 27511

Re: VFS not completely factored, and more

Reply via email to