On 9/30/15 6:47 AM, Brian Foster wrote:
> On Tue, Sep 29, 2015 at 08:03:31PM -0500, Eric Sandeen wrote:
>> This tests bfoster's
>> [PATCH 1/2] xfs: always drain dio before extending aio write submission
>> patch; it launches four adjacent 1k IOs past EOF, then reads back
>> to see if we have 4k worth of the data we wrote, or something else -
>> possibly zeros from sub-block zeroing and eof racing.
>>
>> Signed-off-by: Eric Sandeen <[email protected]>
>> ---
>>
>
> Thanks for the test! A few high level comments...
>
>>
>> diff --git a/src/aio-dio-regress/aio-dio-eof-race.c
>> b/src/aio-dio-regress/aio-dio-eof-race.c
>> new file mode 100644
>> index 0000000..c1ce695
>> --- /dev/null
>> +++ b/src/aio-dio-regress/aio-dio-eof-race.c
>> @@ -0,0 +1,170 @@
>> +/*
>> + * Launch 4 file-extending extending sub-block AIOs and ensure that we
>> + * don't see data corruption when they're all complete.
>> + *
>
> It might be good to point out that this stresses EOF zeroing, which also
> means a key point is not just file-extending I/O, but file-extending I/O
> beyond current EOF (e.g., I/O that creates holes between the current EOF
> and start of the new I/O).
yeah, ok, good to be specific.
>> + * Copyright (C) 2015 Red Hat, Inc. All Rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
>> USA.
>> + */
>> +#include <sys/stat.h>
>> +#include <sys/types.h>
>> +#include <errno.h>
>> +#include <fcntl.h>
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <unistd.h>
>> +#include <ctype.h>
>> +
>> +#include <libaio.h>
>> +
>> +#define BUF_SIZE 4096
>
> Have you considered making the buffer size variable? As is currently
> written, I don't think this will reproduce the problem on 512b or 1024b
> fsb filesystems because at least some of the file-extending I/Os must
> not be block aligned for it to trigger.
yeah, I meant to make it smaller, then got distracted, tbh ;
But I think launching 4x512 IOs should always do the trick, right?
And it's impossible to repro on a 512b fs block I think, because
we can't have any sub-block/unaligned IOs?
So I think that perhaps switching it to 2048 & doing 4x512 IOs
would suffice, right?
>> +#define IO_PATTERN 0xab
>> +
>> +void
>> +dump_buffer(
>> + void *buf,
>> + off64_t offset,
>> + ssize_t len)
>> +{
>> + int i, j;
>> + char *p;
>> + int new;
>> +
>> + for (i = 0, p = (char *)buf; i < len; i += 16) {
>> + char *s = p;
>> +
>> + if (i && !memcmp(p, p - 16, 16)) {
>> + new = 0;
>> + } else {
>> + if (i)
>> + printf("*\n");
>> + new = 1;
>> + }
>> +
>> + if (!new) {
>> + p += 16;
>> + continue;
>> + }
>> +
>> + printf("%08llx ", (unsigned long long)offset + i);
>> + for (j = 0; j < 16 && i + j < len; j++, p++)
>> + printf("%02x ", *p);
>> + printf(" ");
>> + for (j = 0; j < 16 && i + j < len; j++, s++) {
>> + if (isalnum((int)*s))
>> + printf("%c", *s);
>> + else
>> + printf(".");
>> + }
>> + printf("\n");
>> +
>> + }
>> + printf("%08llx\n", (unsigned long long)offset + i);
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> + struct io_context *ctx = NULL;
>> + struct io_event evs[4];
>> + struct iocb iocb1, iocb2, iocb3, iocb4;
>> + struct iocb *iocbs[] = { &iocb1, &iocb2, &iocb3, &iocb4 };
>> + void *buf;
>> + struct stat statbuf;
>> + char cmp_buf[BUF_SIZE];
>> + int fd, err = 0;
>> + off_t eof;
>> +
>> + fd = open(argv[1], O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600);
>> + if (fd == -1) {
>> + perror("open");
>> + return 1;
>> + }
>> +
>> + err = posix_memalign(&buf, BUF_SIZE, BUF_SIZE);
>
> The alignment should technically be PAGE_SIZE (even though BUF_SIZE ==
> 4096), right?
hm, tbh this just came along from the other AIO tests I copied from ;)
Sure, I can do it page-sized, that'd be better.
>> + if (err) {
>> + fprintf(stderr, "error %s during %s\n",
>> + strerror(err),
>> + "posix_memalign");
>> + return 1;
>> + }
>> + memset(cmp_buf, IO_PATTERN, BUF_SIZE);
>> +
>> + err = io_setup(4, &ctx);
>> + if (err) {
>> + fprintf(stderr, "error %s during %s\n",
>> + strerror(err),
>> + "io_setup");
>> + return 1;
>> + }
>> +
>> + eof = 0;
>> +
>> + /* Keep extending until 8MB */
>> + while (eof < 8 * 1024 * 1024) {
>> + memset(buf, IO_PATTERN, BUF_SIZE);
>> +
>> + fstat(fd, &statbuf);
>> + eof = statbuf.st_size;
>> +
>> + /*
>> + * 4 ios, racing to extend EOF, combined they write full
>> BUF_SIZE
>> + */
>
> I would expand on this comment with something like the following:
>
> "Split the buffer into multiple I/Os to send a mix of block
> aligned/unaligned writes as well as writes that start beyond the current
> EOF. This stresses things like inode size management and stale block
> zeroing for races and can lead to data corruption when not handled
> properly."
>
> ... but feel free to reword that as necessary. I just wanted to point
> out that 1.) unaligned I/O is required and 2.) write offsets beyond EOF
> are required.
ok
>> + io_prep_pwrite(&iocb1, fd, buf, BUF_SIZE/4, eof + 0 *
>> BUF_SIZE/4);
>> + io_prep_pwrite(&iocb2, fd, buf, BUF_SIZE/4, eof + 1 *
>> BUF_SIZE/4);
>> + io_prep_pwrite(&iocb3, fd, buf, BUF_SIZE/4, eof + 2 *
>> BUF_SIZE/4);
>> + io_prep_pwrite(&iocb4, fd, buf, BUF_SIZE/4, eof + 3 *
>> BUF_SIZE/4);
>> +
>> + err = io_submit(ctx, 4, iocbs);
>> + if (err != 4) {
>> + fprintf(stderr, "error %s during %s\n",
>> + strerror(err),
>> + "io_submit");
>> + return 1;
>> + }
>> +
>> + err = io_getevents(ctx, 4, 4, evs, NULL);
>> + if (err != 4) {
>> + fprintf(stderr, "error %s during %s\n",
>> + strerror(err),
>> + "io_getevents");
>> + return 1;
>> + }
>> +
>> + /*
>> + * And then read it back.
>> + *
>> + * Using pread to keep it simple, but AIO has the same effect.
>> + *
>> + * eof is the old eof, we just wrote BUF_SIZE more
>> + */
>> + if (pread(fd, buf, BUF_SIZE, eof) != BUF_SIZE) {
>> + perror("pread");
>> + return 1;
>> + }
>> +
>> + /*
>> + * We launched 4 AIOs which, stiched together, should write
>
> Nit: stitched
doh ;)
>> + * a seamless BUF_SIZE worth of IO_PATTERN to the last block
>> + */
>> + if (memcmp(buf, cmp_buf, BUF_SIZE)) {
>> + printf("corruption while extending from %ld\n", eof);
>> + dump_buffer(buf, 0, BUF_SIZE);
>> + return 1;
>> + }
>> + }
>> +
>> + printf("Success, all done.\n");
>> + return 0;
>> +}
>> diff --git a/tests/generic/326 b/tests/generic/326
>> new file mode 100755
>> index 0000000..7db04ac
>> --- /dev/null
>> +++ b/tests/generic/326
>> @@ -0,0 +1,64 @@
>> +#! /bin/bash
>> +# FS QA Test No. 326
>> +#
>> +# Test races while extending past EOF via sub-block AIO writes
>> +#
>> +#-----------------------------------------------------------------------
>> +# Copyright (c) 2015 Red Hat, Inc. All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
>> +#-----------------------------------------------------------------------
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1 # failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> + cd /
>> + rm -f $TEST_DIR/tst-aio-dio-eof-race
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +_supported_fs generic
>> +_supported_os Linux
>> +
>> +_require_test
>> +_require_sparse_files
>> +_require_aiodio aio-dio-eof-race
>> +
>> +# Test does 512 byte DIO, so make sure that'll work
>> +logical_block_size=`_min_dio_alignment $TEST_DEV`
>> +
>> +if [ "$logical_block_size" -gt "512" ]; then
>> + _notrun "device block size: $logical_block_size greater than 512"
>> +fi
>> +
>
> Technically the test splits a 4k buffer into four parts and so sends 1k
> dio. Not that it really matters here, but I guess I'd update the
> comments. :)
>
>> +# If 512 isn't a sub-block IO, the test should still pass, so
>> +# let that go.
>> +
>
> Ok, so this at least points out the limitation to the test. I think it
> would be slightly better if the test were configurable as mentioned in
> the first comment above. For example, the test program could take a
> block size parameter and "do the right thing" based on that.
> Alternatively, we could specify buffer size and iocb count as parameters
> and let the xfstests script send the right params based on the test fs.
> I guess the latter would complicate things slightly because the program
> probably wants to ensure full buffers are written in each io_submit()
> instance for verification purposes.
>
> Just some thoughts... we could leave it as is too. In that case I would
> suggest to expand the comment above to be specific about which block
> sizes (512b, 1k) do not result in unaligned I/O.
ok, I'll make it a bit more universal one way or the other.
Thanks for the review,
-Eric
> Brian
>
>> +# This test does several extending loops internally
>> +$AIO_TEST $TEST_DIR/tst-aio-dio-eof-race
>> +
>> +status=$?
>> +exit
>> diff --git a/tests/generic/326.out b/tests/generic/326.out
>> new file mode 100644
>> index 0000000..22a3e78
>> --- /dev/null
>> +++ b/tests/generic/326.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 326
>> +Success, all done.
>> diff --git a/tests/generic/group b/tests/generic/group
>> index 4ae256f..a5f3008 100644
>> --- a/tests/generic/group
>> +++ b/tests/generic/group
>> @@ -207,3 +207,4 @@
>> 323 auto aio stress
>> 324 auto fsr quick
>> 325 auto quick data log
>> +326 auto quick aio
>>
>>
>> _______________________________________________
>> xfs mailing list
>> [email protected]
>> http://oss.sgi.com/mailman/listinfo/xfs
>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html