Hi,

Thanks for your suggestion. However, rebooting the OSSs in production under heavy I/O load would be another long story.
Regards.

On Thu, Jul 30, 2020 at 11:31 PM, Weiss, Karsten <karsten.we...@atos.net> wrote:

> Hi!
>
> (Caveat: I ran into this issue not on Lustre but with HPC MPI jobs on
> CentOS 7.7. They only run stably with the workaround.)
>
> I’ve opened a bug with Red Hat at
> https://bugzilla.redhat.com/show_bug.cgi?id=1796825 but unfortunately,
> it is no longer public (or fixed/closed), i.e. you probably won’t be
> able to read it.
>
> To make a long story short: You may try to boot with the kernel
> parameter “iommu=pt” as a workaround(!).
>
> Please let me know if this “fixes” the problem for you. YMMV.
>
> Best regards,
> Karsten
>
> --
> *Dipl.-Inf. Karsten Weiss*
> s+c / Atos
> T +49 7071 9457 452
> karsten.we...@atos.net
> https://atos.net/de/deutschland/sc-en
>
> *From:* lustre-discuss <lustre-discuss-boun...@lists.lustre.org> *On Behalf Of* ???
> *Sent:* Thursday, July 30, 2020 16:05
> *To:* lustre-discuss <lustre-discuss@lists.lustre.org>
> *Subject:* [lustre-discuss] infiniband mlx5_0: dump_cqe:286:(pid 25761): dump error cqe
>
> Hi, all
>
> We installed lustre-2.12.2 on both servers and clients. Recently, our
> OSSs' syslog and dmesg have been flooded with messages like the ones below:
>
> “
> infiniband mlx5_0: dump_cqe:286:(pid 25761): dump error cqe
> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000030: 00 00 00 00 00 00 88 13 08 00 84 79 01 04 4c d0
> LustreError: 25762:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9ffdf58c0a00
> LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff9ffdf58c0a00
> LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff9ffdf58c0a00
> LustreError: 25755:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff9ffdf58c0a00
> ”
>
> Has anyone hit this before, or does anyone have any suggestions?
>
> Thanks!
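
For anyone following the thread: since the suggested workaround only takes effect at boot, it can help to confirm which nodes are already running with “iommu=pt” before planning any reboots. The following is only a minimal sketch, assuming a Linux host that exposes the running kernel's command line at /proc/cmdline; the function names are made up for illustration and are not part of Lustre or any Mellanox tooling.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token in a kernel command line."""
    # Kernel parameters are whitespace-separated, so compare whole tokens
    # rather than substrings (e.g. "intel_iommu=on" must not match "iommu=pt").
    return param in cmdline.split()


def running_with_iommu_pt(path: str = "/proc/cmdline") -> bool:
    """Check the command line of the currently running kernel (Linux only)."""
    # Assumption: `path` is readable and holds the boot command line.
    return has_kernel_param(Path(path).read_text(), "iommu=pt")
```

Calling running_with_iommu_pt() on an OSS or client reports whether that node was booted with the parameter. It says nothing about whether the parameter actually resolves the dump_cqe errors.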
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org