On Mar 26, 2025, at 18:28, 권세훈 via lustre-discuss <[email protected]> wrote:
>
> Hello.
>
> My name is Sehoon Kwon, and I'm a developer at Gluesys, a storage solution
> provider based in South Korea.
>
> We are currently working with Lustre version 2.15.5, and during testing in
> a RoCE environment we encountered an LBUG. Checking the community issue
> tracker (LU-16637), we confirmed that a similar issue had been reported
> and resolved in a later release (Lustre 2.16).
>
> We also noted that there had been an effort to backport the fix to the
> b2_15 branch; however, based on our investigation, it appears that the
> patch has not yet been merged. As the stability of the fix remains
> unverified in this branch, we are preparing to evaluate the patch
> internally, referring to the Maloo-based testing you conducted as a
> reference.
>
> We have backported the commit addressing LU-16637 to our ZFS-based Lustre
> 2.15.5 environment and successfully completed the build, along with
> several other fixes.
>
> Following the Testing HOWTO on the Lustre wiki, we executed sanity.sh and
> observed that the script includes nearly 1000 test cases. However, in some
> shared test logs from Whamcloud, we noticed that only around 300 tests
> were actually run.
>
> We would appreciate your clarification on the following points:
>
> • Are there any default test sets or predefined exclusions when running
> sanity.sh? Alternatively, does Whamcloud maintain an internal list of
> commonly executed tests?

The number of subtests that are run depends on the configuration. The script
prints a message for each subtest that is skipped, for example because it
depends on a newer server version, on two or more MDTs or OSTs, on missing
tools, etc.

> • For the 2.15 branch, is there any recommended test suite or guideline
> for verifying backported patches?

The tests that should be run depend on what the patch is changing. We run
nearly all of the tests for every patch (about 150h of testing with different
configurations, kernels, features, etc.), unless the patch does not change
any functional code and is marked "trivial", in which case it only runs about
6-8h of testing (sanity, sanity-lnet).
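As a rough sketch of how subtest selection works when running sanity.sh by
hand: the Testing HOWTO describes ONLY and EXCEPT environment variables that
the test framework honors, so you can run just the subtests touched by your
backport or skip known-problematic ones. The paths and subtest numbers below
are placeholders, not recommendations; check lustre/tests/test-framework.sh on
your branch for the exact variables it supports.

```shell
# Assumed layout: tests installed under /usr/lib64/lustre/tests, or run
# from lustre/tests/ in a source tree.
cd /usr/lib64/lustre/tests

# Run only selected subtests (subtest numbers here are placeholders):
ONLY="27a 27b 51" ./sanity.sh

# Run the full suite but skip specific subtests:
EXCEPT="42a 77g" ./sanity.sh

# Alternatively, drive whole suites through the auster wrapper so that
# results are recorded for later review:
# ./auster -v -r sanity
```

Note that these commands require a configured Lustre test cluster (or the
local llmount.sh setup from the same directory), so they are not runnable in
isolation.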
> • In addition to the sanity suite, we are aware of several other test
> categories. If there is a commonly used baseline set for general
> validation, your guidance would be greatly appreciated.
>
> We aim to align our testing with community standards and ensure
> compatibility and stability, so any information or reference materials you
> could provide would be of great help.

Nearly all of the tests run in review testing will pass. However, given the
distributed nature of the filesystem and the fact that the tests run in VMs,
some subtests fail intermittently. It should be possible to re-run the failed
tests and have them pass.

You are welcome to push the backported patch to the b2_15 branch of the
fs/lustre repository in Gerrit. Please follow the submission guidelines:

https://wiki.lustre.org/Submitting_Changes
https://wiki.lustre.org/Using_Gerrit
https://wiki.lustre.org/Commit_Comments

Since this is a backported patch, please add the following labels to indicate
that it is backported from the master branch (any backported patch on the
b2_15 branch will have these labels):

Lustre-change: https://jira.whamcloud.com/browse/LU-nnnnn
Lustre-commit: {git commit hash of master patch}

and remove the existing "Reviewed-on:", "Reviewed-by: Oleg Drokin", and
"Tested-by:" labels from the patch.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud/DDN

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
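Putting the labels above together, a backported commit message for this
particular patch would look roughly like the sketch below. Only the
Lustre-change/Lustre-commit labels come from this thread; the subject line,
description, and hash are placeholders you would take from the original
master commit:

```
LU-16637 <subsystem>: <subject copied from the master patch>

<commit description copied from the master patch>

Lustre-change: https://jira.whamcloud.com/browse/LU-16637
Lustre-commit: {git commit hash of master patch}

Signed-off-by: <your name and email>
Change-Id: <added by the Gerrit commit-msg hook>
```

The Reviewed-on:, Reviewed-by:, and Tested-by: lines from the original master
commit are dropped, as noted above, so that reviewers of the b2_15 backport
add their own.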
