I'm not sure if this is the correct forum, but we are experiencing problems with
InfiniBand when running a commercial CFD code (STAR-CCM+), which cause jobs to
crash with the errors below. Could someone explain the likely cause of these
errors and how we can minimise their occurrence?

Thanks Wayne

starccm+: Rank 0:172: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:172: MPI_Test: self cfd-cnsl-0230 peer cfd-cnsl-0144 (rank: 219)
starccm+: Rank 0:172: MPI_Test: error message: transport retry exceeded error

Error: {'In': ['Machine::main', 'SimulationIterator::startIterating', 'SteadySolver::step', 'SegregatedFlowSolver::iterationUpdate'], 'Neo.Error': 'Error', 'Processor': 172, 'Severity': 'EXCEPTION', 'message': 'MPI Error : MPI_Test: Internal MPI error'}
Synchronizing parallel nodes (attempt 0)


starccm+: Rank 0:71: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:68: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:71: MPI_Test: self cfd-cnsl-0196 peer cfd-cnsl-0214 (rank: 92)
starccm+: Rank 0:71: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:68: MPI_Test: self cfd-cnsl-0196 peer cfd-cnsl-0214 (rank: 93)
starccm+: Rank 0:68: MPI_Test: error message: transport retry exceeded error

Error: {'In': ['Machine::main', 'SimulationIterator::startIterating', 'SteadySolver::step', 'SegregatedFlowSolver::iterationUpdate', 'AMGLinearSolver::solve'], 'Neo.Error': 'Error', 'Processor': 71, 'Severity': 'EXCEPTION', 'message': 'MPI Error : MPI_Test: Internal MPI error'}
Synchronizing parallel nodes (attempt 0)
starccm+: Rank 0:68: MPI_Gather: ibv_poll_cq(): bad status 5
starccm+: Rank 0:68: MPI_Gather: self cfd-cnsl-0196 peer cfd-cnsl-0214 (rank: 93)
starccm+: Rank 0:68: MPI_Gather: error message: work request flushed error
starccm+: Rank 0:71: MPI_Gather: ibv_poll_cq(): bad status 12
starccm+: Rank 0:71: MPI_Gather: self cfd-cnsl-0196 peer cfd-cnsl-0214 (rank: 91)
starccm+: Rank 0:71: MPI_Gather: error message: transport retry exceeded error
/apps/CFD/CD-ADAPCO/Linux/starccm+3.04.008/star/bin/starenv: line 961:  5745 Segmentation fault      "$@"

starccm+: Rank 0:118: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:46: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:42: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:118: MPI_Test: self cfd-cnsl-0408 peer cfd-cnsl-0452 (rank: 229)
starccm+: Rank 0:118: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:42: MPI_Test: self cfd-cnsl-0271 peer cfd-cnsl-0452 (rank: 229)
starccm+: Rank 0:42: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:46: MPI_Test: self cfd-cnsl-0271 peer cfd-cnsl-0452 (rank: 228)
starccm+: Rank 0:46: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:86: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:87: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:93: MPI_Test: ibv_poll_cq(): bad status 12

starccm+: Rank 0:244: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:244: MPI_Test: self cfd-cnsl-0342 peer cfd-cnsl-0257 (rank: 26)
starccm+: Rank 0:244: MPI_Test: error message: transport retry exceeded error

Error: {'In': ['Machine::main', 'SimulationIterator::startIterating', 'SteadySolver::step', 'RsTurbSolver::iterationUpdate'], 'Neo.Error': 'Error', 'Processor': 244, 'Severity': 'EXCEPTION', 'message': 'MPI Error : MPI_Test: Internal MPI error'}
Synchronizing parallel nodes (attempt 0)
starccm+: Rank 0:26: MPI_Cancel: ibv_poll_cq(): bad status 12
starccm+: Rank 0:26: MPI_Cancel: self cfd-cnsl-0257 peer cfd-cnsl-0342 (rank: 244)
starccm+: Rank 0:26: MPI_Cancel: error message: transport retry exceeded error
starccm+: Rank 0:244: MPI_Cancel: ibv_poll_cq(): bad status 5
starccm+: Rank 0:244: MPI_Cancel: self cfd-cnsl-0342 peer cfd-cnsl-0257 (rank: 26)
starccm+: Rank 0:244: MPI_Cancel: error message: work request flushed error
starccm+: Rank 0:244: MPI_Cancel: MPI BUG: no requests done
/apps/CFD/CD-ADAPCO/Linux/starccm+3.04.008/star/bin/starenv: line 961:  5729 Segmentation fault      "$@"
MPI Application rank 244 exited before MPI_Finalize() with status 139

A subsequent job hung with the following errors:

starccm+: Rank 0:58: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:57: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:57: MPI_Test: self cfd-cnsl-0401 peer cfd-cnsl-0448 (rank: 40)
starccm+: Rank 0:57: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:58: MPI_Test: self cfd-cnsl-0401 peer cfd-cnsl-0448 (rank: 42)
starccm+: Rank 0:58: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:72: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:72: MPI_Test: self cfd-cnsl-0371 peer cfd-cnsl-0277 (rank: 1)
starccm+: Rank 0:72: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:74: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:74: MPI_Test: self cfd-cnsl-0371 peer cfd-cnsl-0277 (rank: 1)
starccm+: Rank 0:74: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:75: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:75: MPI_Test: self cfd-cnsl-0371 peer cfd-cnsl-0448 (rank: 40)
starccm+: Rank 0:75: MPI_Test: error message: transport retry exceeded error

starccm+: Rank 0:26: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:29: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:29: MPI_Test: self cfd-cnsl-0349 peer cfd-cnsl-0418 (rank: 252)
starccm+: Rank 0:29: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:26: MPI_Test: self cfd-cnsl-0349 peer cfd-cnsl-0418 (rank: 254)
starccm+: Rank 0:26: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:134: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:129: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:135: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:131: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:130: MPI_Test: ibv_poll_cq(): bad status 12
starccm+: Rank 0:134: MPI_Test: self cfd-cnsl-0386 peer cfd-cnsl-0418 (rank: 250)
starccm+: Rank 0:134: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:131: MPI_Test: self cfd-cnsl-0386 peer cfd-cnsl-0418 (rank: 255)
starccm+: Rank 0:131: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:130: MPI_Test: self cfd-cnsl-0386 peer cfd-cnsl-0418 (rank: 254)
starccm+: Rank 0:130: MPI_Test: error message: transport retry exceeded error
starccm+: Rank 0:129: MPI_Test: self cfd-cnsl-0386 peer cfd-cnsl-0418 (rank: 254)
starccm+: Rank 0:129: MPI_Test: error message: transport retry exceeded error



---------------------------------------------------------------------

For further information on the Renault F1 Team visit our web site at 
www.renaultf1.com.
Renault F1 Team Limited
Registered in England no. 1806337
Registered Office: 16 Old Bailey London EC4M 7EG


---------------------------------------------------------------------

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
