On Wed, Feb 5, 2014 at 12:04 PM, Glauber Costa <[email protected]> wrote:

>
> On Wed, Feb 5, 2014 at 10:03 PM, Matthew Ahrens <[email protected]> wrote:
>
>> On Wed, Feb 5, 2014 at 2:31 AM, Glauber Costa <[email protected]> wrote:
>>
>>> Hi
>>>
>>> I've recently been trying to devise a mechanism to reuse ARC buffers
>>> directly in file-backed mappings (created by mmap, for instance). My main
>>> goal is to avoid duplication between what is in the ARC and what is in
>>> the page cache (and, to be honest, the OS I am working on does not have a
>>> page cache, so my real goal is to keep it that way).
>>>
>>> It seems that Solaris and BSD never did this, but I could not find any
>>> indication of why. This is the kind of thing I am fairly sure was
>>> considered before, so I wonder whether the lack of such an implementation
>>> is due to a major showstopper you found.
>>>
>>> So before I dive too deeply into this, can anybody advise me on this?
>>>
>>>
>>> Thanks
>
>
>> As I recall, the main reason we kept the ZFS cache separate from the page
>> cache was to avoid complexity related to the different locking models.  If
>> you are designing mmap from scratch, I imagine you could avoid that.
>>
>
> Most definitely. This does not exist yet outside of a sheet of paper, but
> as I develop it, one of the main problems I foresee is that every ARC
> access is mediated by a read or write operation that can call arc_access()
> at a very well-defined point. If the buffer is used outside of the ARC,
> those calls won't happen. For mmap especially, that information is held in
> the processor's accessed/dirty bits (the dirty bit for the case of moving
> a buffer to the anonymous state), and periodically synchronizing all of
> memory can be prohibitive, although I am talking about tens of GBs of
> memory here; hundreds is a bit out of our scope.
>
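To make the cost of that periodic synchronization concrete, here is a minimal user-space C sketch, assuming a hypothetical `struct mapped_page` that mirrors a page-table entry's accessed/dirty bits and a hypothetical `arc_access_hook()` standing in for arc_access(). It is a model, not kernel code: it only shows that every pass must visit every mapped page, touched or not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page-table entry for a mapped ARC buffer. */
struct mapped_page {
    bool accessed;      /* set by the MMU on any load/store */
    bool dirty;         /* set by the MMU on a store */
    unsigned arc_hits;  /* stand-in for the ARC's per-buffer access state */
};

/* Hypothetical stand-in for arc_access(): here it only counts hits. */
static void arc_access_hook(struct mapped_page *pg)
{
    pg->arc_hits++;
}

/*
 * Walk every mapped page, harvest and clear its accessed bit, and report
 * each access to the cache. The loop is O(number of mapped pages) on every
 * pass. Returns how many pages were accessed since the previous pass.
 */
static size_t scan_accessed_bits(struct mapped_page *pages, size_t npages)
{
    size_t touched = 0;
    for (size_t i = 0; i < npages; i++) {
        if (pages[i].accessed) {
            pages[i].accessed = false;  /* a real kernel must also flush the TLB */
            arc_access_hook(&pages[i]);
            touched++;
        }
    }
    return touched;
}
```

With 4 KiB pages, 32 GiB of mappings means 2^23 = 8,388,608 entries to visit on every pass, whether or not anything was touched.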

I was imagining that the dbuf is held as long as the page is mapped, and
thus the arc_buf is not evictable.  So you wouldn't need to worry about the
accessed bits.  Maybe just call arc_access() when it is unmapped.  Handling
the dirty bit is more complicated.
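A minimal sketch of that "hold while mapped" idea, assuming a hypothetical `struct sim_dbuf` that stands in for a dbuf: one hold is taken when the first mapping appears, and the last unmap records a single access (in place of arc_access()) before dropping the hold. The names and the one-hold-per-buffer choice are illustrative assumptions, not ZFS code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dbuf with a hold count and ARC state. */
struct sim_dbuf {
    int holds;          /* while > 0 the backing buffer is not evictable */
    int map_count;      /* number of live mappings of this buffer */
    unsigned accesses;  /* how many times the cache was told about a use */
};

static bool sim_dbuf_evictable(const struct sim_dbuf *db)
{
    return db->holds == 0;
}

/* On fault/mmap: the first mapping takes a hold so the page stays resident. */
static void map_page(struct sim_dbuf *db)
{
    if (db->map_count++ == 0)
        db->holds++;
}

/* On munmap: when the last mapping goes away, record one access
 * (standing in for arc_access()) and drop the hold. */
static void unmap_page(struct sim_dbuf *db)
{
    assert(db->map_count > 0);
    if (--db->map_count == 0) {
        db->accesses++;
        db->holds--;
    }
}
```

Because the buffer cannot be evicted while mapped, the accessed bits never need to be consulted; the cost is that a long-lived mapping pins its pages in memory.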


>
> Have you given this any thought before?
>
> My proposed solution so far is, every time a new page is inserted into the
> cache, to check the bits of the page it would displace and update the ARC
> state accordingly if needed. The same goes for eviction, especially
> eviction to the ghost list.
>
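The lazy check proposed above can be sketched as a second-chance test on the eviction candidate only. This assumes a hypothetical `struct cached_page` and a simplified three-list ARC (MRU, MFU, ghost); treating a harvested accessed bit as a reason to promote MRU to MFU is an illustrative simplification of ARC behavior, not its exact rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state combining MMU bits with an ARC list tag. */
enum arc_list { LIST_MRU, LIST_MFU, LIST_GHOST };

struct cached_page {
    bool accessed;       /* MMU accessed bit, harvested lazily */
    bool dirty;          /* MMU dirty bit */
    enum arc_list list;  /* which ARC list the page currently sits on */
};

enum victim_action { VICTIM_EVICT, VICTIM_PROMOTE, VICTIM_WRITEBACK };

/*
 * Called when inserting a new page would displace `victim`. Only the
 * candidate's bits are read, instead of scanning all of memory:
 *  - dirty: it must be written back before it can leave memory;
 *  - accessed: the cache never saw this use, so count it now (MRU -> MFU)
 *    and give the page another chance;
 *  - otherwise: drop the data and keep only a ghost entry.
 */
static enum victim_action check_victim(struct cached_page *victim)
{
    if (victim->dirty)
        return VICTIM_WRITEBACK;
    if (victim->accessed) {
        victim->accessed = false;
        victim->list = LIST_MFU;
        return VICTIM_PROMOTE;
    }
    victim->list = LIST_GHOST;
    return VICTIM_EVICT;
}
```

The cost is now proportional to the number of eviction candidates examined rather than to the size of all mappings.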
>> Read-only mmap should be relatively straightforward.  When the page is
>> faulted in you can just keep the dbuf (dmu_buf_impl_t) held, so that it
>> stays in memory, and then find its page_t and map it into the process's
>> address space.
>>
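A minimal model of the read-only fault path described above, assuming a hypothetical fixed-size `file_cache` (one held buffer per file page) and a single-level `page_table`. The `holds++` plays the role of keeping the dbuf held; in a real kernel the lookup would go through the DMU and the mapping through the platform's page-table code.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096u
#define SIM_NPAGES 8u

/* Hypothetical stand-ins: a held cache buffer and a page-table entry. */
struct sim_buf {
    int holds;
    unsigned char data[SIM_PAGE_SIZE];
};

struct sim_pte {
    struct sim_buf *buf;  /* physical page backing this virtual page */
    int present;
    int writable;
};

static struct sim_buf file_cache[SIM_NPAGES];
static struct sim_pte page_table[SIM_NPAGES];

/*
 * Read-only fault handler: hold the buffer so it stays resident, then map
 * the cache's own page into the process read-only. No copy is made.
 * Returns 0 on success, -1 if the offset is past the end of the file.
 */
static int ro_fault(uint64_t file_off)
{
    uint64_t idx = file_off / SIM_PAGE_SIZE;
    if (idx >= SIM_NPAGES)
        return -1;
    struct sim_pte *pte = &page_table[idx];
    if (pte->present)
        return 0;                /* already mapped; hold already taken */
    file_cache[idx].holds++;     /* plays the role of the dbuf hold */
    pte->buf = &file_cache[idx];
    pte->writable = 0;           /* a later store faults again */
    pte->present = 1;
    return 0;
}
```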
>
> Actually, I don't want it to stay in memory. For me, one of the big wins
> of doing this is being able to reuse ZFS's paging policy instead of
> implementing our own. We only ever want to page for file-backed shared
> mappings, so we have no other kind of paging. What I intend is to have ZFS
> tell the OS when a page is about to be evicted from the cache, so that I
> can clear its present bit. That still seems doable given what you said, as
> long as I do all those updates with the buffer still held (which doesn't
> seem to be a problem).
>
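The eviction notification described above could look roughly like this, assuming each cached buffer keeps a hypothetical reverse map (`mappers`) of the page-table entries that point at it, and a placeholder `flush_tlbs()` for the TLB shootdown. All names here are illustrative; neither ZFS nor any particular OS provides this hook as written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MAPPERS 4

/* Hypothetical model of a page-table entry and a reverse-mapped buffer. */
struct sim_pte {
    bool present;
};

struct sim_arc_buf {
    struct sim_pte *mappers[MAX_MAPPERS];
    size_t nmappers;
    bool freed;
};

/* Record a new mapping of `buf` (called from the fault path). */
static bool add_mapper(struct sim_arc_buf *buf, struct sim_pte *pte)
{
    if (buf->nmappers == MAX_MAPPERS)
        return false;
    buf->mappers[buf->nmappers++] = pte;
    pte->present = true;
    return true;
}

/* Placeholder for a TLB shootdown on every CPU that may cache the PTEs. */
static void flush_tlbs(void) { }

/*
 * Hypothetical eviction hook, run while the buffer is still held: every
 * mapping loses its present bit first, so the next access faults and goes
 * back through the cache. Only then is the memory released.
 */
static void on_arc_evict(struct sim_arc_buf *buf)
{
    for (size_t i = 0; i < buf->nmappers; i++)
        buf->mappers[i]->present = false;
    buf->nmappers = 0;
    flush_tlbs();
    buf->freed = true;   /* safe now: no mapping can reach the old page */
}
```

The ordering is the point of the sketch: unmapping and flushing must finish before the buffer is released, which is why the work is done while the buffer is still held.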
>
>>
>> Write-back mmap (i.e. PROT_WRITE + MAP_SHARED) will be trickier, because
>> you can't modify the page while the dbuf is being written (due to
>> checksums, raid-z, etc.).  Nor can you have transactions of indefinite
>> length (e.g. create a transaction when the page is first stored to and
>> commit it when pageout gets around to flushing it).  I guess you could do
>> something like mark the dbuf dirty and then, when syncing context
>> (dbuf_sync_leaf()) gets around to writing it, copy the data from the page
>> to a new arc_buf that's used only while writing it out.
>>
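One way to respect the "no indefinite transactions" constraint is to dirty the buffer inside a short transaction on the first write fault and commit before returning to user space. The sketch below models that with hypothetical `struct sim_txg_state` and `struct sim_dbuf`; the comments point loosely at dmu_tx_create()/dmu_tx_assign(), dmu_buf_will_dirty() and dmu_tx_commit(), but the code does not call or reproduce those APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for transaction-group state and a dbuf. */
struct sim_txg_state {
    uint64_t open_txg;   /* txg currently accepting changes */
    int open_txs;        /* transactions assigned but not yet committed */
};

struct sim_dbuf {
    bool dirty;
    uint64_t dirty_txg;  /* txg whose sync will write this buffer out */
};

/*
 * Write fault on a clean shared page: record the dirtiness in a short
 * transaction that is committed before the faulting thread resumes, so no
 * transaction stays open while the process keeps storing to the page.
 */
static void write_fault(struct sim_txg_state *st, struct sim_dbuf *db)
{
    st->open_txs++;                /* loosely: dmu_tx_create() + dmu_tx_assign() */
    if (!db->dirty) {
        db->dirty = true;          /* loosely: dmu_buf_will_dirty() */
        db->dirty_txg = st->open_txg;
    }
    st->open_txs--;                /* loosely: dmu_tx_commit() */
    /* ...then the PTE would be made writable and the thread resumed. */
}
```

After that txg syncs, the page would have to be write-protected again so the next store faults and dirties the buffer in a later txg.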
> As a last resort, I can copy on write and implement a writeback mechanism
> that lets me simplify things by keeping only clean pages in the mapping
> (re-reading them after writeback completes). But that would be less
> desirable.
>
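The copy-on-write fallback can be sketched as follows, assuming a hypothetical `struct sim_mapping` whose PTE normally points at a clean cache page. A store diverts the mapping to a private copy; writeback pushes the copy through the ordinary write path (simulated here by a `memcpy` into the cache page) and then remaps the clean page. This is an illustration of the idea, not a proposed implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096

/* Hypothetical model: the mapping points at a clean cache page until a
 * store diverts it to a private copy. */
struct sim_mapping {
    unsigned char *cache_page;    /* clean page owned by the cache */
    unsigned char *private_copy;  /* non-NULL while the page is dirty */
    unsigned char *mapped;        /* what the PTE currently points at */
};

/* Write fault: copy the clean page once and map the copy writable. */
static int cow_write_fault(struct sim_mapping *m)
{
    if (m->private_copy == NULL) {
        m->private_copy = malloc(SIM_PAGE_SIZE);
        if (m->private_copy == NULL)
            return -1;
        memcpy(m->private_copy, m->cache_page, SIM_PAGE_SIZE);
    }
    m->mapped = m->private_copy;
    return 0;
}

/* Writeback: write the copy through the normal path (simulated by memcpy),
 * free it, and map the clean cache page again (read-only). */
static void cow_writeback(struct sim_mapping *m)
{
    if (m->private_copy == NULL)
        return;
    memcpy(m->cache_page, m->private_copy, SIM_PAGE_SIZE);
    free(m->private_copy);
    m->private_copy = NULL;
    m->mapped = m->cache_page;   /* the next store faults again */
}
```

The cost is an extra page of memory per dirty page and a copy on every first store, which is why it is the less desirable option.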
> Does the restriction on modifying the dbuf apply only while the I/O is in
> progress? That also seems possible to overcome, although we would already
> be out of trivial territory.
>

I think you would want to do something like: in dbuf_sync_leaf(), mark the
page clean, then copy the data from the page to a new arc_buf, and send
that arc_buf down to zio_write().  When the i/o is done you can free the
arc_buf.  This could be several seconds, but you should be able to continue
modifying the page while this is happening (because we made the copy).
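A user-space model of that sync step, assuming a hypothetical `struct sim_page` and `struct sim_write`: `sync_page()` plays the role the reply assigns to dbuf_sync_leaf() (mark clean, then copy), and `write_done()` plays the I/O completion that frees the copy. The dirty state is cleared before copying so that a store racing with the copy re-dirties the page for the next txg rather than being lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096

/* Hypothetical model of a mapped, writable page and an in-flight write. */
struct sim_page {
    unsigned char data[SIM_PAGE_SIZE];
    bool dirty;                /* set again by the next store after sync */
};

struct sim_write {
    unsigned char *snapshot;   /* private copy handed to the I/O pipeline */
};

/*
 * Sync step: clear the dirty state first, then snapshot the page. The
 * snapshot, not the live page, is what would be checksummed and written.
 * Returns 0 on success, -1 if the copy cannot be allocated.
 */
static int sync_page(struct sim_page *pg, struct sim_write *w)
{
    w->snapshot = malloc(SIM_PAGE_SIZE);
    if (w->snapshot == NULL)
        return -1;
    pg->dirty = false;   /* in a kernel: clear dirty bit / write-protect + TLB flush */
    memcpy(w->snapshot, pg->data, SIM_PAGE_SIZE);
    return 0;            /* a real system would now issue the write */
}

/* I/O completion: the copy is no longer needed. */
static void write_done(struct sim_write *w)
{
    free(w->snapshot);
    w->snapshot = NULL;
}
```

While the write is in flight the page can keep changing, as the reply says; those changes only affect `pg`, never the snapshot being written.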

--matt
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
