Hi,

I see the discussion here quickly went beyond the scope of my first
posting :-) Anyway, the filesystem we're implementing is a variant of
a classic log-structured filesystem which is quite similar to Unix
filesystems in many aspects (inodes and so on), and we will have 0-1
(sort of) transactions, so as far as this issue is concerned our case
is probably very similar to ext3 delayed allocation.

On 4/19/05, Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> The idea is to "reserve" a block at the prepare/commit write instead
> of allocating the block. Do the actual allocation in writepage().

Exactly.

> Here are the issues:
> ====================
> 
> 1) Currently none of the generic helper routines can handle this.
> We need to add support to do these, but still somehow make the
> routines generic enough for every ones use.

I'm quite happy with most of them. I can't see how we could use any
generic form of writepage(s), as we write stuff in quite a different
way from almost anybody else, but all the others except
block_prepare_write do pretty much exactly what we need (if I have
not missed something).

> 2) There is no easy way to find out if we "reserved" a block or
> not in writepage() correctly. There are 2 paths to writepage().
> 
>         sys_write() -> prepare/commit()
>                 and later sync() ----> writepage()
> 
>         mmap() -> touch a page()
>                 and later --> writepage()
> 
> In order to do the correct accounting, we need to mark a page
> to indicate if we reserved a block or not. One way to do this,
> to use page->private to indicate this. But then, all the generic
> routines will fail - since they assume that page->private represents
> bufferheads. So we need a better way to do this.

I wasn't hoping for a special bit in struct page, so I wanted to
simply fake the page/buffer mapping somehow. We don't really care
whether a page is mapped or reserved, as long as it is at least one of
these when we actually write it (we write stuff to different places
from where we read it). So PG_mappedtodisk is fine for us, as long as
no other kernel code thinks that having it set means we also have
buffers which point to meaningful positions on the device, because we
don't. Is that the case?

Of course, having a PG_RESERVED flag would be a nice and clean thing
to use and we would be more than happy to do so.
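
To make the accounting concrete, here is a minimal userspace sketch of
what a page-level reserved bit would give us. Everything in it is
hypothetical: the model_* names are not kernel API, MODEL_PG_RESERVED
only stands in for the PG_RESERVED flag wished for above, and a
rewritten block is assumed to be freed immediately (no cleaner). Both
paths into writepage(), the write() one and the mmap one, would call
the same reserve helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
enum model_page_flags {
    MODEL_PG_MAPPED   = 1u << 0,  /* page has a real on-disk location */
    MODEL_PG_RESERVED = 1u << 1,  /* space promised, not yet allocated */
    MODEL_PG_DIRTY    = 1u << 2,
};

struct model_page {
    unsigned int flags;
    long block;                   /* valid only with MODEL_PG_MAPPED */
};

struct model_fs {
    long free_blocks;             /* neither reserved nor allocated */
    long reserved_blocks;         /* promised to dirty pages */
    long next_block;              /* log head: next block to write */
};

/* Called from both the write() path and the mmap fault path. */
static int model_reserve(struct model_fs *fs, struct model_page *pg)
{
    /* A page that is already mapped or reserved needs nothing more. */
    if (!(pg->flags & (MODEL_PG_MAPPED | MODEL_PG_RESERVED))) {
        if (fs->free_blocks == 0)
            return -1;            /* the kernel would return -ENOSPC */
        fs->free_blocks--;
        fs->reserved_blocks++;
        pg->flags |= MODEL_PG_RESERVED;
    }
    pg->flags |= MODEL_PG_DIRTY;
    return 0;
}

/* The real allocation happens only here, at the log head. */
static int model_writepage(struct model_fs *fs, struct model_page *pg)
{
    /* Invariant from the text: mapped or reserved, at least one. */
    if (!(pg->flags & (MODEL_PG_MAPPED | MODEL_PG_RESERVED)))
        return -1;

    if (pg->flags & MODEL_PG_RESERVED) {
        /* Turn the reservation into an allocation. */
        assert(fs->reserved_blocks > 0);
        fs->reserved_blocks--;
        pg->flags &= ~MODEL_PG_RESERVED;
    }
    /* Otherwise this rewrites a mapped page: the old block is assumed
     * to be freed at once, so the counters stay unchanged. */

    pg->block = fs->next_block++;
    pg->flags |= MODEL_PG_MAPPED;
    pg->flags &= ~MODEL_PG_DIRTY;
    return 0;
}
```

The point of the flag is that model_writepage() can tell from the page
alone whether a reservation has to be released, regardless of which
path dirtied the page.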

> 3) We need add hooks into filesystem specific calls from these
> generic routines to handle "journaling mode" requirements

Our fs is basically one big journal so we don't need any of these. Or
at least I don't see any need for it at the moment.

> So, what are your requirements ?  I am looking for a common
> way to combine all the requirements and come out with a
> saner "generic" routines to handle these.

I'm happy with most generic functions. We need to implement
writepage(s) ourselves no matter what; the only problem is
block_prepare_write, and I can currently see only two options for us:

1) Implement it ourselves and use a flag in the struct page to mark it reserved.

2) Use block_prepare_write but enable the get_block function to mark
an individual buffer as reserved, so that it is treated as mapped (can
be dirty and stuff) but no code assumes it is located somewhere on the
disk (for example, block_prepare_write would not call
unmap_underlying_metadata).

I think we'll go for the first method, but the second would make life
easier for filesystems which can have pages consisting of both mapped
and reserved blocks.
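
For comparison, the second method could look roughly like the
userspace sketch below. It is only a model under loud assumptions: the
model_* names, the three-state buffer and the example get_block
callback are all made up for illustration, and every buffer that
get_block maps is treated as newly allocated. The real
block_prepare_write does considerably more than this loop.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BUFS_PER_PAGE 4     /* e.g. 4 KiB page, 1 KiB blocks */

/* Hypothetical per-buffer state; not the kernel's buffer_head bits. */
enum model_buf_state {
    MODEL_BUF_UNMAPPED = 0,
    MODEL_BUF_MAPPED,             /* has a real block number */
    MODEL_BUF_RESERVED,           /* space promised, no block yet */
};

struct model_buf {
    enum model_buf_state state;
    long blocknr;                 /* meaningful only when MAPPED */
    int dirty;
};

typedef int (*model_get_block_t)(long file_block, struct model_buf *bh);

/* Counts how often stale aliases of a disk block were invalidated. */
static int model_invalidated;

/* Stand-in for unmap_underlying_metadata(); needs a real block. */
static void model_unmap_underlying(long blocknr)
{
    (void)blocknr;
    model_invalidated++;
}

/* Made-up filesystem callback: even file blocks get a disk block
 * right away, odd ones are only reserved. */
static int model_get_block_example(long file_block, struct model_buf *bh)
{
    if (file_block % 2 == 0) {
        bh->state = MODEL_BUF_MAPPED;
        bh->blocknr = 500 + file_block;
    } else {
        bh->state = MODEL_BUF_RESERVED;
    }
    return 0;
}

/* Prepare buffers [from, to) of one page for a write. */
static int model_prepare_write(struct model_buf *bufs, long first_file_block,
                               size_t from, size_t to,
                               model_get_block_t get_block)
{
    for (size_t i = from; i < to && i < MODEL_BUFS_PER_PAGE; i++) {
        struct model_buf *bh = &bufs[i];

        if (bh->state != MODEL_BUF_UNMAPPED)
            continue;             /* mapped or reserved: good enough */

        int err = get_block(first_file_block + (long)i, bh);
        if (err)
            return err;
        assert(bh->state != MODEL_BUF_UNMAPPED);

        /* Only a buffer with a real location has anything to unmap. */
        if (bh->state == MODEL_BUF_MAPPED)
            model_unmap_underlying(bh->blocknr);
    }
    return 0;
}

/* A reserved buffer may be dirtied exactly like a mapped one. */
static void model_commit_write(struct model_buf *bufs, size_t from, size_t to)
{
    for (size_t i = from; i < to && i < MODEL_BUFS_PER_PAGE; i++) {
        assert(bufs[i].state != MODEL_BUF_UNMAPPED);
        bufs[i].dirty = 1;
    }
}
```

With per-buffer state, a single page can hold mapped and reserved
blocks side by side, which a single page-level flag cannot express.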

Thank you very much for your reply, the whole thread has been well
worth reading.

Martin Jambor
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html