It seems the inline flag was set in the WR. That's all I know.

Quoting Tang, Changqing <[EMAIL PROTECTED]>:
Subject: RE: local QP operation error after long run
From our code, num_sge=1 all the time. But from error message, can you figure out
num_sge is actually 0 ? and with inline flag ?

--CQ

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 30, 2007 8:59 AM
> To: Tang, Changqing
> Cc: Roland Dreier; Michael S. Tsirkin; [email protected]
> Subject: Re: local QP operation error after long run
>
>
> Apparently, an inline work request is malformed.
> Yes, this could indicate memory corruption.
> OTOH, I see this in commit history:
> commit c2623102f3e38e7684e435b77403d16dc6ddb585
> Author: Roland Dreier <[EMAIL PROTECTED]>
> Date: Mon Nov 28 21:21:08 2005 +0000
>
>     Fix inline sends with no gather entries
>
>     Fix bug in handling send requests that have the inline flag set
>     but do not include any gather entries.
>
>     Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>
>
> is there a chance you are posting some 0-size WRs?
> If yes, just clearing the inline flag will fix it.
>
>
> Quoting Tang, Changqing <[EMAIL PROTECTED]>:
> Subject: local QP operation error after long run
>
> HI,
> I have an ISV application running for nearly three
> hours, and then it has following error from libibverbs.so:
>
> local QP operation err (QPN 440446, WQE @ 00000103, CQN 10008c, index 236192)
> [ 0] 00440446
> [ 4] 00000000
> [ 8] 00000000
> [ c] 00000000
> [10] 026f0000
> [14] 00000000
> [18] 00000103
> [1c] ff100000
>
> local QP operation err (QPN 440442, WQE @ 00000103, CQN 10008c, index 236193)
> [ 0] 00440442
> [ 4] 00000000
> [ 8] 00000000
> [ c] 00000000
> [10] 026f0000
> [14] 00000000
> [18] 00000103
> [1c] ff100000
>
> Can you guys indicate what the possible reason is ? this is
> an OFED 1.1 system. Could it be a memory corruption ?
>
> Thanks
> --CQ, HP-MPI
>
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Roland Dreier
> > Sent: Wednesday, August 29, 2007 9:50 PM
> > To: Sasha Khapyorsky
> > Cc: [email protected]
> > Subject: Re: [ofa-general] ib_umad method mask problems on big-endian 64-bit archs
> >
> > > It looks that using uint32_t for addr in set_bit() function is sufficient
> > > fix. But for ppc64 this means that new OpenSM will break with old
> > > kernels, probably we will need to put some ugly #ifdef in
> > > osm_vendor_ibumad.c...
> >
> > Yes, that's a pain. Another possibility is to declare that the
> > declaration of the registration request should have been
> >
> >     long method_mask[16 / sizeof (long)];
> >
> > and just add a compat_ioctl method to the ib_umad module to handle the
> > broken case of 32-bit big endian userspace on a 64-bit kernel.
> > However that breaks 64-bit big endian userspace that followed the old
> > ib_user_mad.h file correctly so overall I'm leaning towards the patch
> > I already posted.
> >
> > What do you think?
> >
> > - R.
> > _______________________________________________
> > general mailing list
> > [email protected]
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
>
> --
> MST

--
MST
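
For anyone reading this thread later, the workaround suggested above (clear the inline flag on a send that carries no gather entries) could look roughly like the C sketch below. This is only an illustration and assumes nothing about the HP-MPI code. The names `sketch_send_wr`, `SKETCH_SEND_INLINE` and `clear_inline_if_empty` are hypothetical stand-ins for libibverbs' `struct ibv_send_wr` and `IBV_SEND_INLINE`, used so the sketch compiles without `<infiniband/verbs.h>`. Real code would run the same check on the actual work request before calling `ibv_post_send()`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the IBV_SEND_INLINE send flag. */
#define SKETCH_SEND_INLINE (1u << 3)

/* Hypothetical stand-in for the two struct ibv_send_wr fields used here. */
struct sketch_send_wr {
    int          num_sge;     /* number of gather entries */
    unsigned int send_flags;  /* bitmask of send flags */
};

/*
 * Clear the inline flag on a work request that has no gather entries.
 * Returns 1 if the flag was cleared, 0 if the request was left unchanged.
 */
static int clear_inline_if_empty(struct sketch_send_wr *wr)
{
    if (wr == NULL)
        return 0;
    if (wr->num_sge == 0 && (wr->send_flags & SKETCH_SEND_INLINE)) {
        wr->send_flags &= ~SKETCH_SEND_INLINE;
        return 1;
    }
    return 0;
}
```

With a check like this in the send path, a zero-size send would be posted as a regular (non-inline) request, which avoids the inline-with-no-gather-entries case that the commit quoted above was written to fix.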
