Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the 
following link:
https://bugzilla.lustre.org/show_bug.cgi?id=12326



Sanity test 78 now fails silently for me and fills up an OST, causing later
failures.  The likely trigger is that I recently increased the amount of memory
available in my test environment, so the test now tries to write a much larger
file than before.

== test 78: handle large O_DIRECT writes correctly ============= 13:37:25 (1177349845)
directio on /mnt/lustre/f78 for 125x1048576 bytes
Write error Success (rc = 89128960, len = 131072000)
lustre.fail_loc = 0
sanity.sh: FAIL: test_78 exit with rc=1
Debug log: 24266 lines, 24266 kept, 0 dropped.
lustre.fail_loc = 0
PASS (5s)

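Here we can see the undetected failure.  The write came up short: rc = 89128960
of len = 131072000 bytes, i.e. 85 MB of the 125 MB requested.  The message reads
"Write error Success", presumably because a short write does not set errno.
sanity.sh even logs a FAIL line, yet the test is still reported as "PASS".
Later, other tests fail, and lfs df shows why: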

# lfs df /mnt/lustre
UUID                 1K-blocks      Used Available  Use% Mounted on
lustre-MDT0000_UUID      34984      8344     26640   23% /mnt/lustre[MDT:0]
lustre-OST0000_UUID      46856     45936       920   98% /mnt/lustre[OST:0]
lustre-OST0001_UUID      46856     46856         0  100% /mnt/lustre[OST:1]
lustre-OST0002_UUID      46856     44480      2376   94% /mnt/lustre[OST:2]

filesystem summary:     140568    137272      3296   97% /mnt/lustre

One OST (OST0001) is 100% full, which is why the write in test 78 came up short.
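
For illustration, here is a minimal Python sketch of how a test could find the
tightest OST from output like the above.  The function name is made up for this
example, and it assumes the column layout shown in the lfs df output here (the
fourth column is the available space in 1K blocks, and OST rows are tagged
"[OST:n]" in the last column).  It is not how sanity.sh does anything today.

```python
def min_ost_available_kb(lfs_df_output):
    """Return (uuid, available_kb) for the OST with the least free space.

    Hypothetical helper: assumes rows of the form
    UUID 1K-blocks Used Available Use% Mounted-on, as in the output above.
    Returns None if no OST rows are found.
    """
    tightest = None
    for line in lfs_df_output.splitlines():
        fields = line.split()
        # Skip the header, MDT rows, blank lines and the summary line.
        if len(fields) < 6 or "[OST:" not in fields[5]:
            continue
        uuid, available_kb = fields[0], int(fields[3])
        if tightest is None or available_kb < tightest[1]:
            tightest = (uuid, available_kb)
    return tightest


# The lfs df output quoted in this report.
SAMPLE = """\
UUID                 1K-blocks      Used Available  Use% Mounted on
lustre-MDT0000_UUID      34984      8344     26640   23% /mnt/lustre[MDT:0]
lustre-OST0000_UUID      46856     45936       920   98% /mnt/lustre[OST:0]
lustre-OST0001_UUID      46856     46856         0  100% /mnt/lustre[OST:1]
lustre-OST0002_UUID      46856     44480      2376   94% /mnt/lustre[OST:2]

filesystem summary:     140568    137272      3296   97% /mnt/lustre
"""

if __name__ == "__main__":
    print(min_ost_available_kb(SAMPLE))
```

On the output quoted above this picks lustre-OST0001_UUID, which has 0 KB left.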

To fix this, I suggest:

1. Detecting write failures and failing the test.
2. Writing a file small enough not to fill up any OST.
3. Unlinking the file at the end of the test.
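
The three points above could look roughly like the following Python sketch.
This is only an illustration under stated assumptions, not a patch: all
function names are invented for the example, it uses ordinary buffered writes
rather than O_DIRECT, and the "use at most half of the free space" margin is an
arbitrary choice of mine.  The real fix would go into sanity.sh and the
directio helper.

```python
import os
import tempfile

MB = 1048576


def safe_write_size(min_ost_available_kb, wanted_bytes):
    """Fix 2: cap the file size so it cannot fill the tightest OST.

    Assumption: using at most half of the free space is a safe margin.
    """
    return min(wanted_bytes, (min_ost_available_kb * 1024) // 2)


def write_checked(path, total_bytes, chunk_bytes=MB):
    """Fix 1: write total_bytes of zeros and fail loudly on any short write."""
    buf = bytes(chunk_bytes)
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            want = min(chunk_bytes, total_bytes - written)
            # os.write raises OSError on a hard error (e.g. ENOSPC) and
            # returns the byte count otherwise, which may be less than asked.
            got = os.write(fd, buf[:want])
            if got != want:
                raise OSError(
                    f"short write: {got} of {want} bytes at offset {written}")
            written += got
    finally:
        os.close(fd)
    return written


def run_write_test(directory, total_bytes):
    """Run the checked write, then unlink the file whether or not it passed."""
    path = os.path.join(directory, "f78")
    try:
        return write_checked(path, total_bytes)
    finally:
        # Fix 3: never leave the test file behind to fill up an OST.
        if os.path.exists(path):
            os.unlink(path)


if __name__ == "__main__":
    # A temporary directory stands in for /mnt/lustre in this sketch.
    with tempfile.TemporaryDirectory() as scratch:
        size = safe_write_size(920, 125 * MB)
        print(run_write_test(scratch, size))
```

The key point is that a short write is treated as a failure even though no
error code is set, and that the cleanup runs on both the pass and fail paths.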

I may find time to work on this issue this week, but I'm not promising anything,
so I'll leave it unassigned.

_______________________________________________
Lustre-devel mailing list
Lustre-devel@clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
