Hi Chris,

Thanks for your help. I have collected the relevant logs as you suggested, but I need an account to open a ticket on Jira. I have sent an email to the administrator at [email protected]. Is this the correct way to apply for an account? It is the only contact address I found on the site.

Regards,
Chuanjun

Horn, Chris <[email protected]> wrote on Sat, Feb 18, 2023 at 00:52:

> If deleting and re-adding it restores the status to up then this sounds
> like a bug to me.
>
> Can you enable debug tracing, reproduce the issue, and add this
> information to a ticket?
>
> To enable/gather debug:
>
> # lctl set_param debug=+net
> <reproduce issue>
> # lctl dk > /tmp/dk.log
>
> You can create a ticket at https://jira.whamcloud.com/
>
> Please provide the dk.log with the ticket.
>
> Thanks,
>
> Chris Horn
>
> *From: *lustre-discuss <[email protected]> on
> behalf of 腐朽银 via lustre-discuss <[email protected]>
> *Date: *Friday, February 17, 2023 at 2:53 AM
> *To: *[email protected] <[email protected]>
> *Subject: *[lustre-discuss] LNet nid down after some thing changed the
> NICs
>
> Hi,
>
> I encountered a problem when using Lustre Client on k8s with kubenet. Very
> happy if you could help me.
>
> My LNet configuration is:
>
> net:
>     - net type: lo
>       local NI(s):
>         - nid: 0@lo
>           status: up
>     - net type: tcp
>       local NI(s):
>         - nid: 10.224.0.5@tcp
>           status: up
>           interfaces:
>               0: eth0
>
> It works. But after I deploy or delete a pod on the node. The nid goes
> down like:
>
>         - nid: 10.224.0.5@tcp
>           status: down
>           interfaces:
>               0: eth0
>
> k8s uses veth pairs, so it will add or delete network interfaces when
> deploying or deleting pods. But it doesn't touch the eth0 NIC. I can fix it
> by deleting the tcp net by `lnetctl net del` and re-add it by `lnetctl net
> add`. But I need to do this every time after a pod is scheduled to this
> node.
>
> My node OS is Ubuntu 18.04 5.4.0-1101-azure. The Lustre Client is built by
> myself from 2.15.1. Is this an expected LNet behavior or I got something
> wrong? I re-build and tested it several times and got the same problem.
>
> Regards,
>
> Chuanjun
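
For anyone hitting the same symptom, the manual workaround described in the quoted message could be wrapped in a small shell helper along these lines. This is only a sketch under assumptions: the function names (`nid_status`, `reset_tcp_net`, `fix_if_down`) are hypothetical, the tcp net is assumed to be bound to eth0 as in the configuration quoted above, and `lnetctl net show` is assumed to print YAML in the layout quoted above.

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical; the net type and interface
# are taken from the configuration quoted in this thread.

# Print the status ("up" or "down") of NID $1, reading
# `lnetctl net show` style YAML on stdin.
nid_status() {
    awk -v nid="$1" '
        $1 == "-" && $2 == "nid:" { cur = $3 }   # remember the NID being described
        $1 == "status:" && cur == nid { print $2; exit }
    '
}

# Workaround from the thread: delete the tcp net and re-add it on eth0.
reset_tcp_net() {
    lnetctl net del --net tcp &&
    lnetctl net add --net tcp --if eth0
}

# Re-add the tcp net only when NID $1 is currently reported as down.
fix_if_down() {
    if [ "$(lnetctl net show --net tcp | nid_status "$1")" = "down" ]; then
        reset_tcp_net
    fi
}
```

Running `fix_if_down 10.224.0.5@tcp` as root after a pod is scheduled would apply the workaround only when the NID is down. It does not address the underlying problem, which still needs the debug log and a ticket.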
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
