Hi Angelos,

Bug reports can be filed at https://jira.whamcloud.com/


Aurélien

On 04/09/2020 06:11, "lustre-discuss on behalf of Angelos Ching" 
<lustre-discuss-boun...@lists.lustre.org on behalf of 
angelosch...@clustertech.com> wrote:

    Dear all,

    I think I've encountered a bug in lnetctl, but I'm not sure where to
    submit a bug report:

    Summary:
    It's expected that the Lnet config on a node can be recreated on
    lnet.service startup by saving the config using: lnetctl export
    --backup > /etc/lnet.conf
    But the ordering of sections within the YAML file causes extraneous
    NIDs to be created when used in combination with routing, thus breaking
    Lnet routing / node communication, with server side dmesg showing
    "Bad dest nid n.n.n.n@o2ib (it's my nid but on a different network)"

    Environment:
    Client: CentOS 7.8, Lustre 2.12.5-ib, MLNX OFED 4.9-0.1.7.1
    Lnet router + server: CentOS 7.7, Lustre 2.12.4-ib, MLNX OFED 4.7-3.2.9.0

    Steps to reproduce:
    (Listing 1) Server side Lnet config (peer list omitted for conciseness):
    https://pastebin.com/DH6HAt5a
    (Listing 2) Full command listing and output on client side is reproduced
    here: https://pastebin.com/h3wHyCM7

    All steps below carried out on Lustre client:

    1. Restart lnet service with empty /etc/lnet.conf
    2. lnetctl net add: TCP network using Ethernet
    3. lnetctl peer add: 2 peers with "Lnet router + server"@o2ib,tcp NIDs
    4. lnetctl route add: 2 gateways to o2ib network using "Lnet router +
    server"@TCP NID
    5. lnetctl export: with --backup to /etc/lnet.conf; check the saved file
    and confirm Lnet is configured with 2 peers and 2 gateways (Listing 2:
    37-47)
    6. Mount o2ib exported Lustre volume and confirm volume functioning
    correctly; unmount volume
    7. Restart lnet.service and check the lnet configuration; it now shows
    2 extra peer entries that reference only the TCP NID of the "Lnet
    router + server", alongside the 2 manually configured peers that
    reference both o2ib and tcp NIDs (Listing 2: 75-93)
    8. Client fails to mount o2ib exported volume; server side kernel
    message shows "Bad dest nid n.n.n.n@o2ib (it's my nid but on a different
    network)"

    9. If we reorder the peer list to go before the route list in
    /etc/lnet.conf (Listing 2: 16), then lnet is properly configured
    with 2 peers on service restart and everything works as expected.
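
The reordering workaround in step 9 could be scripted roughly as below. This is only a sketch, not part of lnetctl: the function names are made up for illustration, the sample config is a hypothetical, heavily trimmed stand-in for a real export (addresses are placeholders), and it assumes that "peer" and "route" appear as top-level keys at column 0 of the exported file. It works on plain text rather than parsing YAML, so the contents of each section are left untouched.

```python
import re

# Hypothetical, trimmed stand-in for an exported /etc/lnet.conf;
# the section contents and addresses are placeholders.
SAMPLE = """net:
    - net type: tcp
route:
    - net: o2ib
      gateway: 192.0.2.1@tcp
peer:
    - primary nid: 192.0.2.1@o2ib
"""

def split_top_level_sections(text):
    """Split text into [key, block] pairs, one per column-0 key."""
    sections = []
    for line in text.splitlines(keepends=True):
        m = re.match(r"^([A-Za-z_][\w-]*):", line)
        if m or not sections:
            # Start a new section (key is None for leading non-key lines).
            sections.append([m.group(1) if m else None, line])
        else:
            # Indented or blank line: belongs to the current section.
            sections[-1][1] += line
    return sections

def move_peer_before_route(text):
    """Return text with the 'peer' section placed just before 'route'."""
    if text and not text.endswith("\n"):
        text += "\n"  # keep blocks from running together when moved
    sections = split_top_level_sections(text)
    keys = [key for key, _ in sections]
    if "peer" in keys and "route" in keys:
        p, r = keys.index("peer"), keys.index("route")
        if p > r:
            sections.insert(r, sections.pop(p))
    return "".join(block for _, block in sections)

if __name__ == "__main__":
    print(move_peer_before_route(SAMPLE), end="")
```

To try it on a real node, one would read /etc/lnet.conf, pass its contents through move_peer_before_route, and review the result before writing it back; keeping a copy of the original file first seems prudent.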

    Best regards,

    --
    Angelos Ching
    ClusterTech Limited

    Tel     : +852-2655-6138
    Fax     : +852-2994-2101
    Address : Unit 211-213, Lakeside 1, 8 Science Park West Ave., Shatin, Hong Kong

    Got praises or room for improvements? http://bit.ly/TellAngelos

    
    _______________________________________________
    lustre-discuss mailing list
    lustre-discuss@lists.lustre.org
    http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
