If you squeeze out every byte, won't you still have a short
write? The written data wouldn't be cut off at the bad
place, but it would have a weird hole or discontinuity there.

-Mike

On Wed, Sep 14, 2016 at 5:34 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>         Right now writev() with a 3-iovec array that has an unmapped address
> in the second element and a total length less than PAGE_SIZE will write the
> first segment and stop at that.  Among other things, it guarantees the
> short copy, and I would rather have it yield a 0-byte write (and -EFAULT as
> the return value).
>
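The 3-iovec case described above can be reproduced from userspace. What
follows is only a sketch: writev_with_hole() and its parameters are
hypothetical names, not anything from the kernel tree, and the value it
returns depends on the kernel version and filesystem. With the behaviour
described above it would return seg_len; a kernel that refuses the short
copy would return -1 with errno set to EFAULT.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue a 3-segment writev() whose middle segment
 * points into an unmapped page.  Each segment is seg_len bytes long and
 * the total must stay below the page size, as in the case discussed.
 * Returns whatever writev() returned (errno preserved on failure),
 * or -2 if the setup itself failed.
 */
ssize_t writev_with_hole(const char *path, size_t seg_len)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0 && seg_len > 0 && 3 * seg_len < (size_t)page);

    /* Map three pages, then punch out the middle one. */
    char *p = mmap(NULL, 3 * (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return -2;
    if (munmap(p + page, (size_t)page) != 0) {
        munmap(p, 3 * (size_t)page);
        return -2;
    }

    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    ssize_t ret = -2;
    int saved = errno;

    if (fd >= 0) {
        struct iovec iov[3] = {
            { p,            seg_len },  /* mapped */
            { p + page,     seg_len },  /* unmapped: the copy faults here */
            { p + 2 * page, seg_len },  /* mapped, but after the fault */
        };

        ret = writev(fd, iov, 3);
        saved = errno;
        close(fd);
    }

    /* Unmap the two pages that are still mapped. */
    munmap(p, (size_t)page);
    munmap(p + 2 * page, (size_t)page);
    errno = saved;
    return ret;
}
```

Calling writev_with_hole("/tmp/foo", 100) and checking both the return
value and the resulting file size shows which of the two outcomes a given
kernel produces.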
>         All POSIX has to say about that is this (in 2.3 Error Numbers):
>
> [EFAULT]
>     Bad address. The system detected an invalid address in attempting to use
> an argument of a call. The reliable detection of this error cannot be
> guaranteed, and when not detected may result in the generation of a signal,
> indicating an address violation, which is sent to the process.
>
> Note that an unmapped page in the middle of a range covered already can lead
> to the same kind of short write, i.e. if we have
>         p = mmap(0, 3*4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>         munmap(p + 4096, 4096);
>         fd = open("/tmp/foo", O_CREAT|O_TRUNC|O_RDWR, 0777);
>         write(fd, p + 2048, 8192);
>
> write() will yield -EFAULT rather than storing 2KB.  The same will happen with
>         writev(fd, &(struct iovec){p + 2048, 8192}, 1);
> BTW, adding lseek(fd, 2049, SEEK_SET); before that write (or writev) will
> result in 2047 bytes being written by the latter.
>
> IOW, we do not try to squeeze every byte that can be squeezed out of the
> buffer; generally, an unmapped address anywhere in a PAGE_SIZE worth of data
> that would go into the same page-aligned chunk of the destination can result
> in a short write cut at the beginning of that chunk.  iovec boundaries act
> as barriers to short writes, mostly by accident.
>
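The arithmetic behind the two numbers quoted above (-EFAULT at offset 0,
2047 bytes after seeking to 2049) can be checked with a small model. This
is a simplified sketch of the rule as stated, not kernel code:
model_short_write() and MODEL_PAGE_SIZE are made-up names, a single
contiguous buffer is assumed, and a 4096-byte page is assumed to match the
example.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096L   /* assumption: 4 KiB pages, as in the example */

/*
 * Hypothetical model of the rule described above.  The destination is
 * walked in page-aligned chunks; a chunk whose source bytes include an
 * unmapped address contributes nothing, and the write stops there.
 *
 *   file_off  starting file offset
 *   len       requested length
 *   bad_off   offset within the user buffer of the first unmapped byte
 *             (bad_off >= len means the whole buffer is mapped)
 *
 * Returns the number of bytes written, or -1 (standing in for -EFAULT)
 * when nothing at all could be written.
 */
long model_short_write(long file_off, long len, long bad_off)
{
    long done = 0;

    assert(file_off >= 0 && len >= 0 && bad_off >= 0);

    while (done < len) {
        long pos = file_off + done;
        /* Bytes left until the next page boundary in the destination. */
        long chunk = MODEL_PAGE_SIZE - pos % MODEL_PAGE_SIZE;

        if (chunk > len - done)
            chunk = len - done;
        if (bad_off < done + chunk)     /* fault lands inside this chunk */
            break;
        done += chunk;
    }
    return done > 0 ? done : -1;
}
```

For the example above the buffer starts at p + 2048 and the hole starts at
p + 4096, so bad_off is 2048. model_short_write(0, 8192, 2048) gives -1,
and model_short_write(2049, 8192, 2048) gives 2047, since the first
destination chunk then needs only 2047 source bytes, all of them mapped.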
> Do we need to preserve that special treatment of iovec boundaries?  I would
> really like to get rid of that - the current behaviour is an easy and reliable
> way to trigger a short copy case in ->write_end() and those are fairly
> brittle.  Sure, we still need to cope with them, and I think I've got all
> instances in the current mainline fixed, but they are often suboptimal.
>
> Objections?