the workstations are indeed running nfs, and i'm well aware of the nfs conflict. we don't run without privileged ports, and on occasion i hit 988 being taken by NFS. but that's readily identifiable in the netstat tables.
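
for the case where netstat and lsof show nothing on 988, one more place to look is the kernel's own socket tables under /proc/net. a rough sketch, not a tested diagnostic: `sockets_on_port` is a made-up helper name, and it assumes the usual linux /proc/net/tcp and /proc/net/udp column layout (local address as hex ip:port in column 2, inode in column 10). port 988 shows up there as hex 03DC.

```python
from pathlib import Path

LNET_PORT = 988  # the port lnet complains about in this thread


def sockets_on_port(table_text, port=LNET_PORT):
    """Return (local_address_hex, state_hex, inode) for rows bound to `port`.

    `table_text` is the raw content of a /proc/net/{tcp,tcp6,udp,udp6} file.
    """
    hits = []
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue  # skip short or malformed rows
        local = fields[1]  # e.g. "00000000:03DC" (hex address : hex port)
        if int(local.rsplit(":", 1)[1], 16) == port:
            hits.append((local, fields[3], fields[9]))
    return hits


if __name__ == "__main__":
    # scan whichever socket tables exist on this machine
    for name in ("tcp", "tcp6", "udp", "udp6"):
        path = Path("/proc/net") / name
        if path.exists():
            for local, state, inode in sockets_on_port(path.read_text()):
                print(f"{name}: local={local} state={state} inode={inode}")
```

if a row turns up here but no process owns the socket, that would at least narrow it down to something kernel-side holding the port, though that's a guess until a machine actually reproduces the error.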
i'll look into the LAST_ID article mentioned above. looks painful, but at least it's a lead. the trigger seems to be our rollout from rhel 7.4 to 7.5. the machine i was testing last week got rebooted over the weekend, so i'll have to wait until another one gets upgraded and the error triggers before i can test out the theories.

On Fri, Jun 8, 2018 at 1:02 PM, Colin Faber <[email protected]> wrote:
> If this system is also running NFS services (rpcbind for instance) you'll
> want to start it without privileged ports. (-s if I recall) to avoid it
> randomly selecting 988.
>
> On Fri, Jun 8, 2018 at 9:37 AM, Chad DeWitt <[email protected]> wrote:
>>
>> Maybe this applies in your situation?
>>
>> https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#idm140687082747200
>>
>> ------------------------------------------------------------
>>
>> Chad DeWitt, CISSP
>>
>> UNC Charlotte | ITS – University Research Computing
>>
>> [email protected] | www.uncc.edu
>>
>> ------------------------------------------------------------
>>
>>
>> On Fri, Jun 8, 2018 at 11:33 AM, Ben Evans <[email protected]> wrote:
>>>
>>> I've found that doing "modprobe lustre" until it succeeds works, but
>>> that's just on my own dev VMs
>>>
>>> -Ben Evans
>>>
>>> On 6/8/18, 11:17 AM, "lustre-discuss on behalf of Michael Di Domenico"
>>> <[email protected] on behalf of
>>> [email protected]> wrote:
>>>
>>> >i'm having trouble with 2.10.4 clients running on rhel 7.5 kernel 862.3.2
>>> >
>>> >at times when the box boots lustre wont mount, lnet bops out and
>>> >complains about port 988 being in use
>>> >
>>> >however, when i run netstat or lsof commands, i cannot find port 988
>>> >listed against anything
>>> >
>>> >is there some way to trace deeper to see what lnet is really complaining
>>> >about
>>> >
>>> >usually rebooting the box fixes the issue, but this seems a little
>>> >mysterious
>>> >_______________________________________________
>>> >lustre-discuss mailing list
>>> >[email protected]
>>> >http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
