Hi Roland,

Thanks for your reply!

Actually, I'm working on porting the IB driver to the QNX platform. I have resumed the work started by a former colleague, and I just found that the synchronization code (dev->cmd.poll_sem and dev->cmd.hcr_mutex) had been deleted for an unknown reason. After adding it back, the driver runs much more smoothly. However, I still get a command execution error that I believe is related to command synchronization. The problem is that when the code that logs "Created UDAV" runs while a SW2HW_MPT command is being executed, the SW2HW_MPT command returns with a bad parameter error.

Here is my debug trace output (the two code paths are interleaved):

139903841835 HCR CMD: op_code: LE: d
139903861104 TRACE: mad.c:639/ib_mad_recv_done_handler
139903890876 HCR CMD: in_param_h: LE: 0
139903942869 TRACE: mad.c:644/ib_mad_recv_done_handler
139903993296 HCR CMD: in_param_l: LE: cf616000
139904038413 TRACE: verbs.c:182/ib_create_ah_from_wc
139904094753 HCR CMD: input_modifier: LE: 1e
139904139150 TRACE: mthca_provider.c:447/mthca_ah_create
MTHCA DBG: <mthca_av.c:229> Created UDAV at 8075220/00000000:
139904197065 HCR CMD: out_pram_h: LE: 0
139904333343 [ 0] 01000005
139904384499 HCR CMD: out_pram_l: LE: 0
139904428086 [ 4] 0000ffff
139904478675 HCR CMD: token: LE: ffff0000
139904520156 [ 8] 00003000
139904572059 HCR CMD: op_code_modifier: LE: 0
139904612802 [ c] 00000000
139904667693 HCR CMD: event: LE: 0
139904708526 [10] 00000000
139904758422 HCR CMD 0x18h: LE=80000d, BE=d008000
139904799210 [14] 00000000
139904904204 [18] 00000000
139904946792MTHCA DBG: <mthca_cmd.c:195> HCR_STATUS 40100698= d008000 ? 8000 [1c] 00000002
139905076860 TRACE: mthca_av.c:235/mthca_create_ah
139905112329 TRACE: mthca_av.c:243/mthca_create_ah
139905147672 TRACE: mthca_provider.c:460/mthca_ah_create
....
139906793007 HCR CMD: Status Return: : 3

Do you have any idea?

Thanks, and have a good new year!
Yicheng

Roland Dreier <[EMAIL PROTECTED]>
12/28/2007 11:39 PM

To Yicheng Jia <[EMAIL PROTECTED]>
cc [email protected]
Subject Re: [ofa-general] synchronize commands issued to MTHCA

> I'm using OFED-1.0 and the problem I believe is related to command
> synchronization of HCA. The host issues a MAD_INF command at first and
> then a SW2HW_MTP command without waiting for the completion of the first
> command. Both of commands return with bad parameters error.

I guess you mean the MAD_IFC and SW2HW_MPT commands? I've never heard of a problem like that -- more details about your hardware/software config and the exact symptoms you see would be helpful in debugging.

Anyway OFED 1.0 is ancient by now -- you are much better off just using drivers from the standard kernel. If you must use OFED, then OFED 1.2 or even a 1.3 prerelease would be better.

> My question is why there's no synchronization mechanism for the command
> execution on HCA, can I use "spin_lock" or "sem_wait" to synchronize
> between every command?

The HCA firmware allows multiple commands to be queued. The dev->cmd.event_sem semaphore is used to limit the number of outstanding commands to the HCA's capabilities, and the dev->cmd.hcr_mutex mutex is used to serialize the actual writing of commands to the HCA.

There was a mmiowb() added to mthca_cmd_post() fairly recently that might fix your problems if you are running on a large SGI Altix system.

- R.
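
For illustration, the scheme described in the reply (a counting semaphore that bounds the number of outstanding commands, plus a mutex around the register write itself) could be sketched with POSIX primitives such as sem_wait. This is not the mthca code. The names cmd_ctx, cmd_post, cmd_complete, run_demo, MAX_OUTSTANDING and MAX_THREADS are hypothetical, the hcr array is only a stand-in for the real command registers and does not follow their layout, and completion is simulated immediately instead of arriving as an event. The sample values 0x1e and 0x0d are simply copied from the trace above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <string.h>

#define MAX_OUTSTANDING 4   /* hypothetical; the real limit comes from the HCA */
#define MAX_THREADS     16  /* hypothetical cap for this demo only */

struct cmd_ctx {
    sem_t event_sem;            /* counts free command slots */
    pthread_mutex_t hcr_mutex;  /* serializes writes to the simulated HCR */
    uint32_t hcr[7];            /* stand-in for the command registers */
    int writers;                /* threads inside the HCR write; never above 1 */
    int posted;                 /* total commands posted so far */
};

int cmd_ctx_init(struct cmd_ctx *c)
{
    memset(c->hcr, 0, sizeof c->hcr);
    c->writers = 0;
    c->posted = 0;
    if (sem_init(&c->event_sem, 0, MAX_OUTSTANDING) != 0)
        return -1;
    if (pthread_mutex_init(&c->hcr_mutex, NULL) != 0) {
        sem_destroy(&c->event_sem);
        return -1;
    }
    return 0;
}

/* Post one command: reserve a slot, then write the registers under the mutex. */
int cmd_post(struct cmd_ctx *c, uint32_t in_param_l, uint32_t in_mod, uint32_t op)
{
    while (sem_wait(&c->event_sem) != 0)   /* retry if interrupted by a signal */
        if (errno != EINTR)
            return -1;

    pthread_mutex_lock(&c->hcr_mutex);
    assert(c->writers == 0);               /* nobody else is mid-write */
    c->writers++;
    c->hcr[1] = in_param_l;
    c->hcr[2] = in_mod;
    c->hcr[6] = op;                        /* written last, like a "go" word */
    c->writers--;
    c->posted++;
    pthread_mutex_unlock(&c->hcr_mutex);
    return 0;
}

/* Release the slot once the command has completed. */
void cmd_complete(struct cmd_ctx *c)
{
    sem_post(&c->event_sem);
}

struct worker_arg { struct cmd_ctx *ctx; int count; };

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->count; i++) {
        if (cmd_post(a->ctx, (uint32_t)i, 0x1e, 0x0d) != 0)
            return NULL;
        cmd_complete(a->ctx);              /* simulated immediate completion */
    }
    return NULL;
}

/* Run nthreads workers, each posting count commands; return the total posted. */
int run_demo(int nthreads, int count)
{
    struct cmd_ctx ctx;
    pthread_t tid[MAX_THREADS];
    struct worker_arg arg = { &ctx, count };
    int started = 0;

    if (nthreads > MAX_THREADS || cmd_ctx_init(&ctx) != 0)
        return -1;
    for (; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, worker, &arg) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    int posted = ctx.posted;
    pthread_mutex_destroy(&ctx.hcr_mutex);
    sem_destroy(&ctx.event_sem);
    return posted;
}
```

Calling run_demo(8, 100) starts eight threads that each post 100 commands. The semaphore keeps at most MAX_OUTSTANDING commands in flight, and the mutex guarantees that only one thread at a time touches the simulated registers, which is the property the deleted locking was there to provide.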
