On Wed, Sep 14, 2016 at 10:29:33PM -0700, Darrick J. Wong wrote:
> I'm a little confused about xfs_file_iomap_begin() and IOMAP_WRITE --
> prior to this patchset, it was only called via page_mkwrite and what
> used to be write_begin, and all it did was create a delalloc reservation
> to back the write (or actually allocate blocks if extsize was set).

No, it was called from write itself in addition, of course.

> Thinking ahead to integrating reflink with DAX (and presumably
> directio) in both cases _file_iomap_begin has to return a real extent.
> If the range we're writing is shared, then that means that I'd have to
> ensure the range is reserved in the COW fork (the reflink patches already do
> this).  Next, if the caller requires a firm mapping (i.e. dax or dio)
> I'd have to allocate the range in the COW fork and have that mapping
> returned to the caller.

Yes.  No different than the existing buffered write case with an
extent size hint, really.

> So I guess it would look something like this:
> 
> /* reserve reflink cow range like the reflink patches already do */
> if (flags & (IOMAP_WRITE | IOMAP_ZERO) && xfs_is_reflink_inode(ip))
>       xfs_reflink_reserve_cow_range(ip, offset, length);

For zeroing we don't need to reserve anything IFF we are finding a hole,
so this is too early.

> /* do the cow thing if need be */
> if ((flags & IOMAP_WRITE) &&
>     xfs_is_reflink_inode(ip) &&
>     (IS_DAX(inode) || (flags & IOMAP_DIO)) {
>       xfs_reflink_allocate_cow_range(ip, offset, length);
> 
>       if (xfs_reflink_is_cow_pending(ip, offset))
>               xfs_reflink_find_cow_mapping(ip, offset, &imap, &need_alloc);
>       else
>               xfs_bmapi_read(offset, &imap)
> } else
>       xfs_bmapi_read(offset, &imap) /* non-cow io */


We really have three write block allocation variants, of which the first
two look basically the same except for I/O completion handling:

 1) For DAX we need to allocate real extents and zero them before use.
 2) For buffered I/O with an extent size hint or direct I/O we need to
    allocate unwritten extents (and convert after the I/O succeeded for
    the range that succeeded)
 3) for buffered I/O without and extent size hint we need to created
    a delayed allocation

That logic exists both in the old __xfs_get_blocks and and the new
iomap code (although we don't check for the dio case yet).

So I don't think we should need to change anything in terms of COW
in relation to DAX or direct I/O here.  I do however need to fully
audit the usage of the reflink calls for the cases 1 and 2.  For one
it doesn't really seem nessecary to split the reserve/allocate case
ehre, and I hope that a lot of the bmap btree looks could be saved,
similar to my rewrite of the delalloc case.
    

> Does that look plausible?  If _iomap_begin will have one type of caller
> that only needs a delalloc reservation and another type that actually
> needs a firm mapping, should we make a new IOMAP flag to indicate that
> the caller really needs a firm mapping?  IS_DAX is fine for detecting
> the DAX case as this patchset shows, but what about directio?
> 
> (At this point Dave suggests on IRC that we just make a dio specific
> iomap_begin.  That eliminates the flag, though it'd be a fairly similar
> function.)

As explained above, DIO is exaxtly the same as buffered I/O with
an extent size hint.  We already handle it fine.  DAX is almost
the same, keyed off a DAX check in lower level code to convert unwritten
extents and zero the blocks.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Reply via email to