We were able to get our LFS back up using the fix in LU-13189 and have been
stable since. But I'd still appreciate some help backing out of this.
* Is the "lfs setquota -p 1" the likely cause of our crash?
* If so:
* Why would it take 1 week to show up?
* What is the best way to reverse any ill effects the "lfs setquota -p 1" command may have caused?
* Should there be some protection in the lustre source for this?
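For what it's worth, the only back-out I can think of on our side is to reset the quota limits and clear any project IDs that got attached to files. This is a sketch based on the lfs man page, not something we've verified on the recovered filesystem, and the directory path is just a placeholder:

```shell
# Sketch only -- untested on our recovered FS.
# Setting all limits to 0 should remove the limits for project 1:
lfs setquota -p 1 -b 0 -B 0 -i 0 -I 0 /nobackup

# Recursively clear the project ID / inherit flag from any files that
# picked one up (-C clears, -r recurses; path is a placeholder):
lfs project -C -r /nobackup/some/dir
```

Whether that also quiesces whatever state tripped the assertion is exactly what I'm hoping someone can confirm.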
-----Original Message-----
From: lustre-discuss <[email protected]> on behalf of
"Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss"
<[email protected]>
Reply-To: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]"
<[email protected]>
Date: Thursday, September 30, 2021 at 11:41 AM
To: Colin Faber via lustre-discuss <[email protected]>
Subject: [EXTERNAL] [lustre-discuss] ASSERTION( obj->oo_with_projid ) failed
Hello everyone,
We've run into a pretty nasty LBUG that took our LFS down. We're not
exactly sure of the cause and could use some help. It's pretty much identical to
this:
https://jira.whamcloud.com/browse/LU-13189
One of our OSSes started crashing repeatedly last night. We are configured
with HA and tried failing over to its pair just to have that OSS crash in the
same way. We are in the process of doing the same thing mentioned in the above
LU to get back up and running, but we'd like to try to fix this without the
#undef ZFS_PROJINHERIT if possible. A couple of months ago we updated our
servers to 2.14 – stock, no modifications – and we'd like to get back to stock
2.14 again if possible. Up until last night, our experience with 2.14 was
great – very stable compared to what we were running previously (very old 2.10)
and better performing. Our specific stack trace from the crash dump is below
if that helps. Our servers are running 3.10.0-1160.31.1.el7.x86_64. MDT and
OST's are both using ZFS (version 2.0).
There are two things that could have contributed to the crash.
First, about 1 week ago, we tried to use project quotas for the first time.
Without reading the lustre manual, I just tried to set a project quota like this:
lfs setquota -p 1 -b 307200 -B 309200 -i 10000 -I 11000 .
But it was pretty obvious that didn't work.
# lfs quota -p 1 /nobackup/
Unexpected quotactl error: Operation not supported
Disk quotas for prj 1 (pid 1):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
     /nobackup/     [0]     [0]     [0]       -     [0]     [0]     [0]       -
Some errors happened when getting quota info. Some devices may be not
working or deactivated. The data in "[]" is inaccurate.
#
Then, after reading section 25.2 in the lustre manual
(https://doc.lustre.org/lustre_manual.xhtml#enabling_disk_quotas), I saw that
zfs version >=0.8 with kernel version < 4.5 requires a patched kernel. So I
just moved on figuring project quotas would not work since we are using the
stock kernel. But now it appears this might be the cause of our problem.
As of right now, I see this in the zfs properties for our metadata filesystem.
[root@hpfs-fsl-mds0 ~]# zpool get all mds0-0-new | grep proj
mds0-0-new  feature@project_quota  active  local
[root@hpfs-fsl-mds0 ~]#
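In case anyone wants to compare notes, these are the two places I know to check the state on a given target; the lctl parameter name is what I see on our 2.14 servers and may differ on other versions:

```shell
# Feature state on the pool (enabled vs. active):
zpool get feature@project_quota mds0-0-new

# What the Lustre quota slave reports per target
# (parameter path as on our 2.14 servers; may vary by version):
lctl get_param osd-zfs.*.quota_slave.info
```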
Several questions come to mind.
* Is this the likely cause of our crash?
* Why would it take 1 week to show up?
* What is the best way to reverse any ill effects the "lfs setquota -p 1" command may have caused?
The second possible contributor is related to some maintenance we just
finished on the metadata server yesterday morning. After the update to 2.14
(and zfs update from 0.7 to 2.0), we got this message from "zpool status" on
our mdt pool:
pool: mds0-0
state: ONLINE
status: One or more devices are configured to use a non-native block size.
Expect reduced performance.
action: Replace affected devices with devices that support the
configured block size, or migrate data to a properly configured
pool.
scan: scrub repaired 0B in 1 days 17:49:23 with 0 errors on Fri Jul  9 21:03:24 2021
config:
        NAME          STATE     READ WRITE CKSUM
        mds0-0        ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            mpathm    ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathn    ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-1    ONLINE       0     0     0
            mpatho    ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathp    ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-2    ONLINE       0     0     0
            mpathq    ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathr    ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-3    ONLINE       0     0     0
            mpaths    ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpatht    ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-4    ONLINE       0     0     0
            mpathu    ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathv    ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-5    ONLINE       0     0     0
            mpathw    ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathx    ONLINE       0     0     0  block size: 512B configured, 4096B native
This is related to the SSDs we are using for the MDT. The physical block
size is 4k (ashift=12) but the logical block size is 0.5k (ashift=9).
Apparently, the old version of zfs (under which the original pool was built)
picked ashift=9 but after the update zfs 2.0 was telling us we should be using
the larger block size to match the physical block size of these drives.
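For reference, this is how we read back the ashift actually in use on the pool (standard zdb; the exact output format may vary between zfs versions):

```shell
# Dump the cached pool config and pull out the ashift values:
zdb -C mds0-0 | grep ashift
```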
Despite this mismatch, our mdtest results (via io500) were greatly improved
with the lustre 2.14 update. But it's still something we wanted to fix, which
was the purpose of our maintenance outage yesterday. So we backed up the
mds0-0/meta-fsl file system to a separate pool, destroyed the old pool, rebuilt
it (now with zfs choosing shift=12 for the block size) and copied the data back
to the newly created pool. However, this process failed. Our old metadata
file system (512B block size) was using about 490 GB of our 2.2 TB pool. Due
to the increase in block size, the data take up more space in the file system -
potentially 8x more if each entry is less than 512 B to begin with. We filled
up the new ashift=12 pool, so we had to revert to an ashift=9 pool. We
are going to have to buy more or bigger SSDs (or use raidz instead of raid10) if
we want to go to a bigger ashift.
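Back-of-the-envelope numbers for the blow-up we hit, assuming the worst case where every record fit in a single 512 B block (a deliberate simplification; real ZFS allocation, compression, and metadata overhead make the true number smaller):

```python
# Rough worst-case estimate of pool usage after moving from 512 B
# (ashift=9) to 4096 B (ashift=12) blocks.  Ignores compression and
# raid layout -- only meant to show why a 490 GB dataset can overflow
# a 2.2 TB pool.

def worst_case_usage_gb(used_gb: float, old_block: int, new_block: int) -> float:
    """If every record occupied exactly one old block, each now costs
    one new block, so usage scales by new_block / old_block."""
    return used_gb * (new_block / old_block)

if __name__ == "__main__":
    # 490 GB used at ashift=9 -> up to 8x at ashift=12:
    print(worst_case_usage_gb(490.0, 512, 4096))  # prints 3920.0, i.e. ~3.9 TB
```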
So this could be related too. Theoretically, nothing should have changed
as far as lustre was concerned. But it's hard to ignore that we put the file
system back in service yesterday morning and about 10 hours later we ran into
this problem.
If anyone has ideas, please let us know. We're happy to post details here
or to an LU.
Thanks,
Darby Vicker
[  138.597710] LustreError: 2476:0:(tgt_grant.c:803:tgt_grant_check()) hpfs-fsl-OST0005: cli cd0fda1d-691d-bb4f-1548-c45f8c2e578d is replaying OST_WRITE while one rnb hasn't OBD_BRW_FROM_GRANT set (0x8)
[  138.699120] LustreError: 2476:0:(osd_object.c:1353:osd_attr_set()) ASSERTION( obj->oo_with_projid ) failed:
[  138.699155] LustreError: 2476:0:(osd_object.c:1353:osd_attr_set()) LBUG
[  138.699176] Pid: 2476, comm: tgt_recover_5 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021
[  138.699177] Call Trace:
[  138.699184] [<ffffffffc104167c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[  138.699194] [<ffffffffc104199c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[  138.699199] [<ffffffffc17a62db>] osd_attr_set+0xdeb/0xe60 [osd_zfs]
[  138.699207] [<ffffffffc18cf50e>] ofd_write_attr_set+0x87e/0xd20 [ofd]
[  138.699213] [<ffffffffc18cfc03>] ofd_commitrw_write+0x253/0x1510 [ofd]
[  138.699218] [<ffffffffc18d484d>] ofd_commitrw+0x2ad/0x9a0 [ofd]
[  138.699223] [<ffffffffc15b85d1>] tgt_brw_write+0xe51/0x1a10 [ptlrpc]
[  138.699273] [<ffffffffc15bca5a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
[  138.699299] [<ffffffffc150a096>] handle_recovery_req+0x96/0x290 [ptlrpc]
[  138.699317] [<ffffffffc151406b>] replay_request_or_update.isra.25+0x2fb/0x930 [ptlrpc]
[  138.699336] [<ffffffffc1514dbd>] target_recovery_thread+0x71d/0x11d0 [ptlrpc]
[  138.699354] [<ffffffffba6c5e31>] kthread+0xd1/0xe0
[  138.699357] [<ffffffffbad95df7>] ret_from_fork_nospec_end+0x0/0x39
[  138.699360] [<ffffffffffffffff>] 0xffffffffffffffff
[  138.699380] Kernel panic - not syncing: LBUG
[  138.699395] CPU: 1 PID: 2476 Comm: tgt_recover_5 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.31.1.el7.x86_64 #1
[  138.699429] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
[  138.699449] Call Trace:
[  138.699460] [<ffffffffbad835a9>] dump_stack+0x19/0x1b
[  138.699477] [<ffffffffbad7d2b1>] panic+0xe8/0x21f
[  138.699496] [<ffffffffc10419eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[  138.699519] [<ffffffffc17a62db>] osd_attr_set+0xdeb/0xe60 [osd_zfs]
[  138.699543] [<ffffffffc18ca5cd>] ? ofd_attr_handle_id+0x12d/0x410 [ofd]
[  138.699566] [<ffffffffc18cf50e>] ofd_write_attr_set+0x87e/0xd20 [ofd]
[  138.699588] [<ffffffffba7de42d>] ? kzfree+0x2d/0x70
[  138.699607] [<ffffffffc18cfc03>] ofd_commitrw_write+0x253/0x1510 [ofd]
[  138.699628] [<ffffffffba7c7675>] ? __free_pages+0x25/0x30
[  138.699649] [<ffffffffc18d484d>] ofd_commitrw+0x2ad/0x9a0 [ofd]
[  138.699693] [<ffffffffc15b85d1>] tgt_brw_write+0xe51/0x1a10 [ptlrpc]
[  138.699738] [<ffffffffc15bca5a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
[  138.699761] [<ffffffffba6aee98>] ? add_timer+0x18/0x20
[  138.699779] [<ffffffffba6bc13b>] ? __queue_delayed_work+0x8b/0x1a0
[  138.699822] [<ffffffffc15bc270>] ? tgt_hpreq_handler+0x2c0/0x2c0 [ptlrpc]
[  138.699861] [<ffffffffc150a096>] handle_recovery_req+0x96/0x290 [ptlrpc]
[  138.699899] [<ffffffffc151406b>] replay_request_or_update.isra.25+0x2fb/0x930 [ptlrpc]
[  138.699940] [<ffffffffc1514dbd>] target_recovery_thread+0x71d/0x11d0 [ptlrpc]
[  138.699963] [<ffffffffbad88e60>] ? __schedule+0x320/0x680
[  138.699998] [<ffffffffc15146a0>] ? replay_request_or_update.isra.25+0x930/0x930 [ptlrpc]
[  138.700023] [<ffffffffba6c5e31>] kthread+0xd1/0xe0
[  138.700039] [<ffffffffba6c5d60>] ? insert_kthread_work+0x40/0x40
[  138.700059] [<ffffffffbad95df7>] ret_from_fork_nospec_begin+0x21/0x21
[  138.700079] [<ffffffffba6c5d60>] ? insert_kthread_work+0x40/0x40
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org