Hello everyone,

We've run into a pretty nasty LBUG that took our LFS down.  We're not exactly
sure of the cause and could use some help.  It's pretty much identical to this:

https://jira.whamcloud.com/browse/LU-13189

One of our OSSes started crashing repeatedly last night.  We are configured
with HA and tried failing over to its pair, only to have that OSS crash in the
same way.  We are in the process of doing the same thing mentioned in the above
LU to get back up and running, but we'd like to fix this without the #undef
ZFS_PROJINHERIT if possible.  A couple of months ago we updated our servers to
2.14 (stock, no modifications) and we'd like to get back to stock 2.14 again if
possible.  Up until last night, our experience with 2.14 was great: very stable
compared to what we were running previously (a very old 2.10) and better
performing.  Our specific stack trace from the crash dump is below in case it
helps.  Our servers are running 3.10.0-1160.31.1.el7.x86_64.  The MDT and OSTs
both use ZFS (version 2.0).

There are two things that could have contributed to the crash.  

First, about 1 week ago, we tried to use project quotas for the first time.
Without reading the Lustre manual, I just tried to set a project quota like this:


        lfs setquota -p 1 -b 307200 -B 309200 -i 10000 -I 11000 .


But it was pretty obvious that didn't work.


        # lfs quota -p 1 /nobackup/
        Unexpected quotactl error: Operation not supported
        Disk quotas for prj 1 (pid 1):
             Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
            /nobackup/     [0]     [0]     [0]       -     [0]     [0]     [0]       -
        Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
        #


Then, after reading section 25.2 in the Lustre manual
(https://doc.lustre.org/lustre_manual.xhtml#enabling_disk_quotas), I saw that
ZFS version >= 0.8 with a kernel version < 4.5 requires a patched kernel.  So I
just moved on, figuring project quotas would not work since we are using the
stock kernel.  But now it appears this might be the cause of our problem.  As
of right now, I see this in the pool properties for our metadata pool:

        [root@hpfs-fsl-mds0 ~]# zpool get all mds0-0-new  | grep proj
        mds0-0-new  feature@project_quota          active                         local
        [root@hpfs-fsl-mds0 ~]#


Several questions come to mind.  

* Is this the likely cause of our crash?
* Why would it take 1 week to show up?
* What is the best way to reverse any ill effects the "lfs setquota -p 1"
  command may have caused?



The second possible contributor is related to some maintenance we just finished
on the metadata server yesterday morning.  After the update to 2.14 (and the ZFS
update from 0.7 to 2.0), we got this message from "zpool status" on our MDT
pool:


  pool: mds0-0
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0B in 1 days 17:49:23 with 0 errors on Fri Jul  9 21:03:24 2021
config:

        NAME        STATE     READ WRITE CKSUM
        mds0-0      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            mpathm  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathn  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-1  ONLINE       0     0     0
            mpatho  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathp  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-2  ONLINE       0     0     0
            mpathq  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathr  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-3  ONLINE       0     0     0
            mpaths  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpatht  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-4  ONLINE       0     0     0
            mpathu  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathv  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-5  ONLINE       0     0     0
            mpathw  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathx  ONLINE       0     0     0  block size: 512B configured, 4096B native



This is related to the SSDs we are using for the MDT.  The physical block size
is 4k (ashift=12) but the logical block size is 0.5k (ashift=9).  Apparently,
the old version of ZFS (under which the original pool was built) picked
ashift=9, but after the update ZFS 2.0 was telling us we should be using the
larger block size to match the physical block size of these drives.  Despite
this mismatch, our mdtest results (via io500) were greatly improved with the
Lustre 2.14 update.  But it's still something we wanted to fix, which was the
purpose of our maintenance outage yesterday.  So we backed up the
mds0-0/meta-fsl file system to a separate pool, destroyed the old pool, rebuilt
it (now with ZFS choosing ashift=12 for the block size) and copied the data
back to the newly created pool.  However, this process failed.  Our old
metadata file system (512 B block size) was using about 490 GB of our 2.2 TB
pool.  Because ZFS allocates space in whole sectors, the same data takes up
more space in the new file system: potentially up to 8x more, since anything
smaller than 512 B that used to occupy one 512 B sector now occupies a full 4k
sector.  We filled up the new ashift=12 pool, so we had to revert to an
ashift=9 pool.  We are going to have to buy more or bigger SSDs (or use raidz
instead of raid10) if we want to go to a bigger ashift.
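A rough sketch of the space arithmetic above, under a simplified model (an
assumption, not a full account of ZFS space accounting): each allocation is
rounded up to a whole number of 2^ashift-byte sectors, so anything under 512 B
costs 512 B at ashift=9 but 4096 B at ashift=12.  The alloc_size helper is
hypothetical, and the model ignores compression, packing of dnodes into larger
blocks, and mirror copies, so it only gives the worst-case bound.  The 490 GB
and ~2200 GB figures come from the paragraph above (2.2 TB, rounded).

```shell
#!/bin/sh
# Simplified model (assumption): every allocation is rounded up to a whole
# number of 2^ashift-byte sectors.  Ignores compression, dnode packing into
# larger blocks, and the second copy in each mirror.
alloc_size() {  # hypothetical helper: alloc_size <bytes> <ashift>
    sector=$((1 << $2))
    echo $(( ($1 + sector - 1) / sector * sector ))
}

# Per-object cost of a few example sizes at each ashift.
for size in 200 512 1500 4096; do
    echo "$size B -> ashift=9: $(alloc_size "$size" 9) B, ashift=12: $(alloc_size "$size" 12) B"
done

# Worst case for this pool: ~490 GB used at ashift=9, ~2200 GB usable.
used_gb=490
pool_gb=2200
worst_gb=$((used_gb * 8))
echo "8x worst case: ${worst_gb} GB needed vs ${pool_gb} GB available"
```

Running it prints the per-size allocations (for example, 1500 B costs 1536 B
at ashift=9 but 4096 B at ashift=12) and shows the 8x worst case of 3920 GB
exceeding the pool.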

So this could be related too.  Theoretically, nothing should have changed as
far as Lustre was concerned.  But it's hard to ignore that we put the file
system back in service yesterday morning and about 10 hours later we ran into
this problem.


If anyone has ideas, please let us know.  We're happy to post details here or 
to an LU.  

Thanks,
Darby Vicker




[  138.597710] LustreError: 2476:0:(tgt_grant.c:803:tgt_grant_check()) hpfs-fsl-OST0005: cli cd0fda1d-691d-bb4f-1548-c45f8c2e578d is replaying OST_WRITE while one rnb hasn't OBD_BRW_FROM_GRANT set (0x8)
[  138.699120] LustreError: 2476:0:(osd_object.c:1353:osd_attr_set()) ASSERTION( obj->oo_with_projid ) failed:
[  138.699155] LustreError: 2476:0:(osd_object.c:1353:osd_attr_set()) LBUG
[  138.699176] Pid: 2476, comm: tgt_recover_5 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021
[  138.699177] Call Trace:
[  138.699184]  [<ffffffffc104167c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[  138.699194]  [<ffffffffc104199c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[  138.699199]  [<ffffffffc17a62db>] osd_attr_set+0xdeb/0xe60 [osd_zfs]
[  138.699207]  [<ffffffffc18cf50e>] ofd_write_attr_set+0x87e/0xd20 [ofd]
[  138.699213]  [<ffffffffc18cfc03>] ofd_commitrw_write+0x253/0x1510 [ofd]
[  138.699218]  [<ffffffffc18d484d>] ofd_commitrw+0x2ad/0x9a0 [ofd]
[  138.699223]  [<ffffffffc15b85d1>] tgt_brw_write+0xe51/0x1a10 [ptlrpc]
[  138.699273]  [<ffffffffc15bca5a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
[  138.699299]  [<ffffffffc150a096>] handle_recovery_req+0x96/0x290 [ptlrpc]
[  138.699317]  [<ffffffffc151406b>] replay_request_or_update.isra.25+0x2fb/0x930 [ptlrpc]
[  138.699336]  [<ffffffffc1514dbd>] target_recovery_thread+0x71d/0x11d0 [ptlrpc]
[  138.699354]  [<ffffffffba6c5e31>] kthread+0xd1/0xe0
[  138.699357]  [<ffffffffbad95df7>] ret_from_fork_nospec_end+0x0/0x39
[  138.699360]  [<ffffffffffffffff>] 0xffffffffffffffff
[  138.699380] Kernel panic - not syncing: LBUG
[  138.699395] CPU: 1 PID: 2476 Comm: tgt_recover_5 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.31.1.el7.x86_64 #1
[  138.699429] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
[  138.699449] Call Trace:
[  138.699460]  [<ffffffffbad835a9>] dump_stack+0x19/0x1b
[  138.699477]  [<ffffffffbad7d2b1>] panic+0xe8/0x21f
[  138.699496]  [<ffffffffc10419eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[  138.699519]  [<ffffffffc17a62db>] osd_attr_set+0xdeb/0xe60 [osd_zfs]
[  138.699543]  [<ffffffffc18ca5cd>] ? ofd_attr_handle_id+0x12d/0x410 [ofd]
[  138.699566]  [<ffffffffc18cf50e>] ofd_write_attr_set+0x87e/0xd20 [ofd]
[  138.699588]  [<ffffffffba7de42d>] ? kzfree+0x2d/0x70
[  138.699607]  [<ffffffffc18cfc03>] ofd_commitrw_write+0x253/0x1510 [ofd]
[  138.699628]  [<ffffffffba7c7675>] ? __free_pages+0x25/0x30
[  138.699649]  [<ffffffffc18d484d>] ofd_commitrw+0x2ad/0x9a0 [ofd]
[  138.699693]  [<ffffffffc15b85d1>] tgt_brw_write+0xe51/0x1a10 [ptlrpc]
[  138.699738]  [<ffffffffc15bca5a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
[  138.699761]  [<ffffffffba6aee98>] ? add_timer+0x18/0x20
[  138.699779]  [<ffffffffba6bc13b>] ? __queue_delayed_work+0x8b/0x1a0
[  138.699822]  [<ffffffffc15bc270>] ? tgt_hpreq_handler+0x2c0/0x2c0 [ptlrpc]
[  138.699861]  [<ffffffffc150a096>] handle_recovery_req+0x96/0x290 [ptlrpc]
[  138.699899]  [<ffffffffc151406b>] replay_request_or_update.isra.25+0x2fb/0x930 [ptlrpc]
[  138.699940]  [<ffffffffc1514dbd>] target_recovery_thread+0x71d/0x11d0 [ptlrpc]
[  138.699963]  [<ffffffffbad88e60>] ? __schedule+0x320/0x680
[  138.699998]  [<ffffffffc15146a0>] ? replay_request_or_update.isra.25+0x930/0x930 [ptlrpc]
[  138.700023]  [<ffffffffba6c5e31>] kthread+0xd1/0xe0
[  138.700039]  [<ffffffffba6c5d60>] ? insert_kthread_work+0x40/0x40
[  138.700059]  [<ffffffffbad95df7>] ret_from_fork_nospec_begin+0x21/0x21
[  138.700079]  [<ffffffffba6c5d60>] ? insert_kthread_work+0x40/0x40

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
