Hi all,
I'd like to report a kernel panic we encountered at a customer site running Lustre 2.15.6 (also reproduced on 2.17.0). The issue has been filed on Whamcloud Jira: LU-20227

---
Summary

When a Lustre snapshot MDT is mounted in a DNE configuration, lod_statfs() iterates over the sub-MDT OSPs via lod_foreach_mdt. If a sub-MDT OSP returns rc=0 with an uninitialized opd_statfs cache (os_bsize=0), lod_statfs_sum() silently shifts sfs->os_bsize down to zero. This corrupted value is sent to the client with OS_STATFS_SUM set.

When project quota is active and a block hard limit is set, ll_statfs_project() divides by sfs->f_bsize=0, triggering a CPU #DE (Divide Error) exception and a kernel panic.

The panic was triggered simply by running the Linux du command on a directory under the snapshot mount.

---
Environment

- Lustre: 2.15.6 / 2.17.0 (we currently run 2.15.6 and also tested 2.17.0)
- Kernel: 4.18.0-553.5.1
- Backend FS: ZFS
- Topology: MGS, MDT x3, OST x5, project quota enabled

---
Reproduction Conditions (all five must hold)

1. Snapshot client mounted
2. Snapshot MDT has sub-MDTs (DNE configuration)
3. Sub-MDT OSP opd_statfs cache uninitialized (os_bsize=0)
4. Project quota enabled
5. Project block hard limit > 0 on the target directory

---
Root Cause

BUG 1 (server): lod_foreach_mdt lacks the os_bsize==0 guard that lod_foreach_ost already has. The uninitialized os_bsize silently corrupts sfs->os_bsize to zero.

BUG 2 (client): ll_statfs_project() unconditionally divides by sfs->f_bsize without validating that it is non-zero.

---
Proposed Fix

Server side (main fix):

    - if (rc)
    + if (rc || ost_sfs.os_bsize == 0)

Client side (defensive): Add an f_bsize == 0 guard in ll_statfs_project() before the division, falling back to 1.
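
To make the two guards concrete, here is a minimal userspace sketch of the idea. It is not the actual Lustre code and not the patch attached to LU-20227: the struct and the function names (model_statfs, model_sum_bytes, model_limit_to_blocks) are hypothetical, the sum is kept in bytes rather than using the block-size shifting described above, and the quota limit is assumed to be given in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the statfs fields discussed above. */
struct model_statfs {
	uint64_t os_blocks;	/* block count, in units of os_bsize */
	uint32_t os_bsize;	/* block size in bytes; 0 = cache not filled */
};

/*
 * Server-side idea: sum the capacity of all sub-targets, skipping any
 * entry that returned an error OR reports os_bsize == 0, so that an
 * uninitialized cache can never influence the aggregate.
 * The total is returned in bytes to keep the model simple.
 */
uint64_t model_sum_bytes(const struct model_statfs *subs,
			 const int *rcs, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(subs != NULL && rcs != NULL);
	for (i = 0; i < n; i++) {
		if (rcs[i] != 0 || subs[i].os_bsize == 0)
			continue;	/* same shape as the proposed check */
		total += subs[i].os_blocks * subs[i].os_bsize;
	}
	return total;
}

/*
 * Client-side idea: convert a byte limit into blocks without ever
 * dividing by zero. A zero block size falls back to 1, as proposed.
 */
uint64_t model_limit_to_blocks(uint64_t limit_bytes, uint32_t f_bsize)
{
	uint32_t bsize = f_bsize ? f_bsize : 1;

	return limit_bytes / bsize;
}
```

With these helpers, a sub-target reporting os_bsize=0 is simply left out of the sum, and a zero f_bsize on the client yields the limit unchanged instead of a divide error. The fallback value of 1 mirrors the proposal above; whether returning an error would be preferable in the real client is a separate design question.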
