Re: [Charon-dev] Re: VFS not completely factored, and more

Michael W Zappe Mon, 17 Jul 2000 13:16:59 -0700
On Sat, Jul 01, 2000 at 02:14:45PM -0400, Alexander Viro wrote:
> 
> 
> [reformatted]
> On Mon, 26 Jun 2000, Michael W Zappe wrote:
> 
> [snip]
> > filesystem, CXFS.  (originally named Charon, but we discovered two
> > companies warring over the trademark, and didn't want to touch that with
> > a 40 foot pole... ;-)
> 
> Heh. XFS folks mentioned clustered variant of their puppy. Yup, also
> CXFS...
> 

Damnit!  I had somehow missed the announcements when searching for it.  Argh, time for 
yet another name change. ;-)

> > But in terms of the VFS problems, I have to say that your idea is a good
> > one, since it keeps the data local to the inode!  The only issue i'd
> >have with it is the way you do the define.  I think a much cleaner
> >solution would be to have each fs define a structure
> > 
> > struct ext2_inode
> > {
> >     struct inode ei_inode;
> >     struct ext2_inode_info ei_ext2_info;
> > };
> > 
> > And just clean up all of the other filesystems to use a more generic
> >mechanism to cast it, such as:
> > 
> > #define EXT2_INODE_INFO(inode) (((struct ext2_inode *)(inode))->ei_ext2_info)
> 
> Umm... I don't like how it sounds. Basically, it means that iget() goes
> to hell - that way allocation has to be done in fs. May be a good thing,
> but we are not there yet.
>

What do you mean by "we are not there yet?"  I thought the point of this exercise was 
to get us there! ;-)
 
> > This would also mean that if anything changed, we have an inexpensive
> > (non-existent at runtime) level of abstraction which would allow for a
> > simple change.  I also use a similar mechanism, and it works great for
> > portability.  I'd definately be willing to help clean up the FS code to
> > use the new interface.
> 
> Could you describe that new interface in some details? I don't see how you
> deal with knfsd and with allocation/freeing policy. Another thing to watch
> for: init_once()-type animals.
> 

Allocation and freeing can be done in the read_inode, put_inode pairs.  The extra data 
that a filesystem needs is persistent for the lifetime of the inode in the icache, 
unless a filesystem wished to do otherwise.

What do you mean exactly by an init_once type animal?  If anything, read_inode() 
certainly seems to fall under that category.

> > I also wouldn't mind adding changing the iget code to a new iget2 which
> > takes an opaque, fs dependent parameter (void *).  We would also split
> 
> Not funny. First of all, you have to deal with callers in knfsd. If we
> kill them (fh<->dentry patches) there is no reason to keep iget() and
> ->read_inode() at all. Otherwise... Where the heck will knfsd get your
> void * cookie?
> 

Good point about knfsd.  I never really looked at it or compatability issues with my 
FS.  What a horrible idea from the point of view of keeping a good layer of 
abstraction.  The easiest answer is that knfsd passes NULL, and if a filesystem 
requires the additional cookie, then it doesn't work with knfsd.  We should probably 
have a flag (FS_REQUIRES_COOKIE -- the cookie monster flag? ;-) in the fs_flags so it 
doesn't try to export it.  Wow, talk about spagetti code...

So i guess the goal of the changes was to make NFS work super-fast, at the expense of 
extensibility?  (Not meant sarcastically, it's a real question.)

> > the functions into 
> > 
> > void read_inode(struct inode *, void *);
> > void get_inode(struct inode *, void *);
> > 
> > Read inode is called only on an uninitialized inode structure, and
> > get_inode whenever iget pulls an inode out of the cache.  This way, if
> > the FS isn't concerned with additional reads after the first, the
> > pointer can just be NULL and we optimize out the call.
> 
> ??? That's something new - which filesystems need to do reread upon the
> icache hit?
> 

Not neccicarily a re-read, but possibly a transaction context, or to do a revalidation 
if the filesystem is acting as a cache, and generally let the fs know that it has one 
more refferer.

> > Also, i wouldn't mind changing i_ino to be a __u64.  Anyone see any
> > major problems with this?  We'd have to change the hashing code, (which
> > wouldn't be a bad idea anyway) but i think most everything else will
> > convert fairly cleanly.  I have to look at this change a little more
> > closely, but it seems to be a good idea.
> 
> No problem, except that you are getting an overhead on comparisons in the
> cache. My gut feeling is that it's not going to be critical, but there's
> only one way to figure out... Keep in mind that x86 is register-starved,
> so anything 64-bit is a strain on compiler. I suspect that it may turn
> into worse overhead than anything coming from separate allocation of
> per-fs part.

Not really.  Yes, the x86 is register starved, but my experience with using 64 bits in 
the kernel is that the overhead isn't too bad as long as you stick to adds and shifts, 
which is all most fileystems need anyway.  Is it also the case that someone has gotten 
MMX working in the kernel (i believe in the RAID checksumming/parity code)?  Using 
that, we could get '1/2 cycle' 64 bit math + 8 extra registers on the Pentium MMX and 
on.  Not to mention, the benifits of having truly 64-bit filesystems supported without 
big kludges would be nice for moving linux into the HA/Data Center area.  The only 
problem is that the code would probably have __add_u64(a, b) rather than a+b, and have 
to add prefix and suffix code around alot of the icache functions.  (Or beware the 
great no emms penalty...)  Is it also generally assumend by the compiler that any 
function call can trash FPU state, yes?  (The only problem with this that i can 
see...) 

The dissapearance of the BITS_PER_LONG < 64 ifdef's in my code would be nice also... 
;-)

        Mike


----------------------------------------------
Michael Zappe            <[EMAIL PROTECTED]>

Chief Architect, Filesystems
Interlan Communications 
111 Corning Drive
Cary, NC 27511
Re: [Charon-dev] Re: VFS not completely factored, and more

Reply via email to