Hello Daniel,

You should also check whether some user workload is triggering that load,
for example a constant stream of syncs to files on those OSTs.
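
A quick way to check is to sample the sync_changes counters on the MDS over
time; if they keep climbing while the MDT is draining them, something is
still queuing new syncs. A minimal sketch, using the parameter names from
your mail (the interval is arbitrary):

    while true; do
        # both counters should shrink steadily if nothing new is queued
        lctl get_param osp.lfsc-OST0004-osc-MDT0000.sync_changes \
                       osp.lfsc-OST0005-osc-MDT0000.sync_changes
        sleep 10
    done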

Aurélien

On 11/01/2023 22:37, "lustre-discuss on behalf of Daniel Szkola via
lustre-discuss" <[email protected] on behalf of
[email protected]> wrote:

    We recently had to take an OSS node that hosts two OSTs out of service to 
test the hardware as it was randomly power cycling.

    I migrated all files off the two OSTs, and after some testing we brought
    the node back into service after recreating the ZFS pools and the two
    OSTs. Since then it has been mostly working fine; however, we have
    noticed a few group quotas reporting usage that doesn't seem to match
    what is actually on the filesystem. The inode counts seem to be correct,
    but the space used is way too high.
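
    For example, a comparison like the following (group name and mount point
    are placeholders) is one way to see the mismatch between quota-reported
    usage and what is actually on disk:

        lfs quota -g somegroup /lfsc              # quota-reported blocks and inodes
        lfs find /lfsc --group somegroup | wc -l  # actual file count for the group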

    After lots of poking around, I am seeing this on the two OSTs:

    osp.lfsc-OST0004-osc-MDT0000.sync_changes=13802381
    osp.lfsc-OST0005-osc-MDT0000.sync_changes=13060667

    I upped max_rpcs_in_progress and max_rpcs_in_flight for the two OSTs,
    but that only caused the numbers to dip slightly. All other OSTs show 0
    for that value, and destroys_in_flight shows similar numbers for the two
    OSTs.
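
    For reference, these OSP tunables live on the MDS and can be raised with
    lctl set_param; the values below are illustrative, not the ones actually
    used:

        lctl set_param osp.lfsc-OST0004-osc-MDT0000.max_rpcs_in_progress=4096
        lctl set_param osp.lfsc-OST0005-osc-MDT0000.max_rpcs_in_flight=64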

    Any ideas how I can remedy this?

    Lustre 2.12.8
    ZFS 0.7.13

    —
    Dan Szkola
    FNAL






_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
