On Tue, Mar 21, 2017 at 02:30:38AM +0000, Rick Macklem wrote:
> Konstantin Belousov wrote:
> [stuff snipped]
> > Yes, I have to somewhat retract my claims, but then I have another set
> > of surprises.
> > I realized (remembered) that nfs has its own VOP_PUTPAGES() method.
> > Implementation seems to directly initiate write RPC request using the
> > pages as the source buffer. I do not see anything in the code which
> > would mark the buffers, which possibly contain the pages, as clean,
> > or mark the buffer range as undirty.
> The only place I know of that the code does this is in the "struct buf's"
> hanging off of v_bufobj.bo_dirty.
> I imagine there would be a race between the write-back to the NFS server
> and further changes to the page by the process. For the most part, the
> VOP_PUTPAGES() is likely to happen after the process is done modifying
> the pages (often exited). For cases where it happens sooner, I would expect
> the page(s) to be written multiple times, but the last write should bring
> the file "up to date" on the server.
> > At very least, this might cause unnecessary double-write of the same
> > data. I am not sure if it could cause coherency issues between data
> > written using mappings and write(2). Also, both vm_object_page_clean()
> > and vfs_busy_pages() only ensure the shared-busy state on the pages,
> > so write(2) can occur while pageout sends data to server, causing
> > 'impossible' content transmitted over the wire.
> I'm not sure what you mean by impossible content, but for NFS the only
> time the file on the NFS server should be "up to date" will be after a file
> doing write(2) writing has closed the fd (and only then if options like
> "nocto" has not been used) or after an fsync(2) done by the process
> doing the writing.
By 'impossible' I mean some arbitrary combination of bytes which were
written by many means to the file at arbitrary moments. In other words,
the file content, or even a single page/block content is not atomic
WRT the client updates.
> For mmap'd writing, I think msync(2) is about the only
> thing the process can do to ensure the data is written back to the server.
> (There was a patch to the NFS VOP_CLOSE() that does a vm_object_page_clean()
> but without the OBJPC_SYNC flag which tries to make sure the pages get
> shortly after the file is closed. Of course, an mmap'd file can still be
> modified by the
> process after close(2), so it is "just an attempt to make the common case
> I don't recall, but I don't think I was the author of this patch.)
> I also wouldn't be surprised that multiple writes of the same page(s) occurs
> under certain situations. (NFS has no rules w.r.t. write ordering. Each RPC is
> separate and simply writes N bytes at file offset S.) It definitely happens
> there are multiple write(2)s of partial buffers, depending on when a sync()
> > Could you, please, explain the reasons for such implementation of
> > ncl_putpage() ?
> Well, I wasn't the author (it was just cribbed from the old NFS client and I
> know who wrote it), so I'm afraid I don't know. (It's code I avoid changing
> because I don't
> really understand it.)
> I suspect that the author assumed that processes would either mmap the file
> or use write(2) and wouldn't ever try and mix them to-gether.
Or, what seems more likely to me, the code was written on a system where
buffer cache and page queues are not coherent.
Anyway, my position is that nfs VOP_PUTPAGES() should do write through
buffer cache, not issuing the direct rpc call with the pages as source.
Then your patch would need an update with the mentioned call to ncl_flush().
email@example.com mailing list
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"