On Wed, Feb 5, 2014 at 2:31 AM, Glauber Costa
<[email protected]> wrote:

> Hi
>
> I've been recently trying to devise some mechanism to reuse the ARC
> buffers directly into a file back mapping (created by mmap, for instance).
> My main goal is not to have duplication between what is in the ARC and what
> is in the page cache (and to be honest, the OS I am working on does not
> have a page cache, so my real goal is to keep it this way).
>
> It seems like Solaris and BSD never did that, but I could not find any
> indication about the why.
> That's the kind of thing I am pretty sure was thought about before, so I
> wonder if the lack of an implementation like that is due to a major
> showstopper found by you guys.
>
> So before I dive too deeply into this, can anybody advise me on this?
>
>
As I recall, the main reason we kept the ZFS cache separate from the page
cache was to avoid complexity related to the different locking models.  If
you are designing mmap from scratch, I imagine you could avoid that.

Read-only mmap should be relatively straightforward.  When the page is
faulted in you can just keep the dbuf (dmu_buf_impl_t) held, so that it
stays in memory, and then find its page_t and map it into the process's
address space.
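The read-only path described above might be sketched roughly as follows. This is pseudocode, not real ZFS code: `dmu_buf_hold()`/`dmu_buf_rele()` are the real DMU hold interfaces, but `vp_to_page()` and `map_page_into_as()` are hypothetical, OS-specific helpers standing in for whatever the target VM system provides.

```c
/*
 * Pseudocode sketch of a read-only mmap fault handler, per the scheme
 * above. dmu_buf_hold() is a real DMU interface; vp_to_page() and
 * map_page_into_as() are hypothetical OS hooks.
 */
static int
zfs_readonly_fault(objset_t *os, uint64_t object, uint64_t off,
    struct as *as, caddr_t vaddr)
{
	dmu_buf_t *db;
	int err;

	/* Hold the dbuf so its data stays resident while mapped. */
	err = dmu_buf_hold(os, object, off, FTAG, &db, 0);
	if (err != 0)
		return (err);

	/*
	 * Hypothetical: find the page_t backing db->db_data and map it
	 * read-only into the faulting process's address space.
	 */
	page_t *pp = vp_to_page(db->db_data);
	map_page_into_as(as, vaddr, pp, PROT_READ);

	/* The hold is released at unmap time, not here. */
	return (0);
}
```

The key point is that the hold, not a copy, keeps the data alive: the process's mapping and the ARC share the same physical page.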

Write-back mmap (i.e. PROT_WRITE + MAP_SHARED) will be trickier, because
you can't modify the page while the dbuf is being written (due to
checksums, raid-z, etc).  Nor can you have transactions of indefinite
length (e.g. create transaction when page is first stored to, commit it
when pageout gets around to flushing it).  I guess you could do something
like mark the dbuf dirty and then when syncing context (dbuf_sync_leaf())
gets around to writing it, copy the data from the page to a new arc_buf
that's just only while writing it out.

--matt
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer