>     This is a request for a review.  This patch fixes a bug in specfs
>     relating to dealing with the EOF condition of a block device.
> 
>     If the EOF occurs in the middle of a block, specfs was not
>     properly calculating the truncation for the I/O.

I didn't have time to review this properly when you first asked.

It is a bit over-commented, and returns wrong error codes for EOF.

> Index: miscfs/specfs/spec_vnops.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/miscfs/specfs/spec_vnops.c,v
> retrieving revision 1.108
> diff -u -r1.108 spec_vnops.c
> --- spec_vnops.c      1999/09/17 06:10:26     1.108
> +++ spec_vnops.c      1999/09/20 17:50:48
> @@ -311,19 +311,37 @@
>               do {
>                       bn = btodb(uio->uio_offset) & ~(bscale - 1);
>                       on = uio->uio_offset % bsize;
> -                     n = min((unsigned)(bsize - on), uio->uio_resid);
>                       if (seqcount > 1) {
>                               nextbn = bn + bscale;
>                               error = breadn(vp, bn, (int)bsize, &nextbn,
>                                       (int *)&bsize, 1, NOCRED, &bp);
>                       } else {
>                               error = bread(vp, bn, (int)bsize, NOCRED, &bp);
> +                     }
> +
> +                     /*
> +                      * Figure out how much of the buffer is valid relative
> +                      * to our offset into the buffer, which may be negative
> +                      * if we are beyond the EOF.
> +                      *
> +                      * The valid size of the buffer is based on 
> +                      * bp->b_bcount (which may have been truncated by
> +                      * dscheck or the device) minus bp->b_resid, which
> +                      * may be indicative of an I/O error if non-zero.
> +                      */

The short read must have been caused by EOF unless error != 0.  Note that
filesystems depend on this, i.e., the following handling in VOP_STRATEGY():

    Case 1: do some i/o (possibly null), then hit an i/o error:
        Return the amount not done in b_resid, and set B_ERROR to indicate
        that i/o was truncated because of an error.
    Case 2: do some i/o (possibly null), then hit EOF:
        Return the amount not done in b_resid, and don't touch B_ERROR.
        Some drivers change b_bcount, but this is dubious, especially
        for writes where the data beyond b_bcount is still valid and
        something may be using it.

ffs handles these cases as follows:

    Case 1:
        Throw away the partial block.  Depend on VOP_STRATEGY() setting
        B_ERROR so that bread()/bwrite()/etc. return nonzero in this
        case.
    Case 2:
        Assume that this case can't happen for metadata (a reasonable
        assumption, since filesystems don't use blocks beyond EOF),
        and don't check for it.  Similarly for VOP_WRITE().  Handle it
        in VOP_READ(), although I think it can't happen there either.

specfs (for bdevs) currently handle these cases like ffs.  Short counts
are only handled for Case 2.

physio() handles the cases like VOP_STRATEGY().  It passes the distinction
bewtween EOF and error to its caller by returning nonzero only if there
was an error.  (Distinguishing the cases is especially important for
write-once devices.)  Control eventually reaches the level dofileread(),
etc., where the distinction is screwed up :-( (`error' is only reset to
0 for nonzero short counts if it is ERESTART, EINTR or EWOULDBLOCK).

> +                     n = bp->b_bcount - on;
> +                     if (n < 0) {
> +                             error = EINVAL;

`error' shouldn't be changed here if it is already nonzero.

EINVAL is a wrong value to return for EOF here (but neccessary to give
bug for bug compatibility here).  See the comment in my version.

> +                     } else {
> +                             n -= bp->b_resid;
> +                             if (n < 0)
> +                                     error = EIO;

EIO is a wronger value to return for EOF.

Here are my fixes (relative to the old version) for spec_read().  They are
over-commented to temporarily document the buggy EOF return code.

---
diff -c2 spec_vnops.c~ spec_vnops.c
*** spec_vnops.c~       Sat Sep 11 15:54:39 1999
--- spec_vnops.c        Wed Sep 22 19:47:52 1999
***************
*** 309,313 ****
                        bn = btodb(uio->uio_offset) & ~(bscale - 1);
                        on = uio->uio_offset % bsize;
-                       n = min((unsigned)(bsize - on), uio->uio_resid);
                        if (vp->v_lastr + bscale == bn) {
                                nextbn = bn + bscale;
--- 311,314 ----
***************
*** 317,325 ****
                                error = bread(vp, bn, (int)bsize, NOCRED, &bp);
                        vp->v_lastr = bn;
-                       n = min(n, bsize - bp->b_resid);
                        if (error) {
                                brelse(bp);
                                return (error);
                        }
                        error = uiomove((char *)bp->b_data + on, n, uio);
                        brelse(bp);
--- 318,343 ----
                                error = bread(vp, bn, (int)bsize, NOCRED, &bp);
                        vp->v_lastr = bn;
                        if (error) {
                                brelse(bp);
                                return (error);
                        }
+ 
+                       /*
+                        * The amount of valid data in the buffer is
+                        * bp->b_bcount - bp->b_resid.  If this is smaller
+                        * than `on', return EINVAL to give bug for bug
+                        * compatibility with physio() and with ourself
+                        * (most strategy routines erroneously return EINVAL
+                        * for i/o completely beyond EOF).  POSIX and POLA
+                        * specify returning 0 for reads from all offsets
+                        * beyond EOF.
+                        */
+                       n = bp->b_bcount - bp->b_resid - on;
+                       if (n < 0) {
+                               brelse(bp);
+                               return (EINVAL);
+                       }
+ 
+                       n = imin(n, uio->uio_resid);
                        error = uiomove((char *)bp->b_data + on, n, uio);
                        brelse(bp);
---

I was also concerned about what happens when buffers with a short
b_bcount or a nonzero b_resid are left in the cache.  It seems that
nothing fatal happens, but they just get in the way.  I observed
the following farcial handling for rereading block 4 on /dev/vn0a
where /dev/vn0a has size 6:

    spec_read(): call bread()
      bread(): call getblk()
        getblk(): call gbincore()
          gb_incore(): returns the buffer successfully
        getblk(): The buffer has the wrong size, so it is thrown away.
                  Do a VOP_WRITE() as part of throwing it away, although
                  the buffer is not dirty! :-(
                  Return an empty buffer.
      bread(): do a VOP_STRATEGY() to read the buffer again.

Bruce



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message

Reply via email to