I found the error in our machine. We had an intermittent connection in one node's HCA card. I happened to look at that node at a moment when the HCA did not show up in `lspci` or `/proc`. I reset the card on its bus and kaboom... success. Thanks, everyone, for all your help.
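For anyone chasing a similar intermittent fault, here is a rough sketch of the check I ended up doing by hand. It is not an official tool: the function names are made up for illustration, and it assumes that a working HCA shows up both as an "InfiniBand" line in `lspci` output and as an entry under `/sys/class/infiniband`, which may differ on other driver stacks.

```python
import os


def find_ib_in_lspci(lspci_output):
    """Return the lspci lines that mention an InfiniBand controller."""
    return [line.strip() for line in lspci_output.splitlines()
            if "infiniband" in line.lower()]


def list_ib_class_devices(sysfs_dir="/sys/class/infiniband"):
    """Return device names found in sysfs_dir, or [] if it cannot be read.

    The default path is an assumption about where the kernel registers
    InfiniBand devices; pass another directory if your system differs.
    """
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        return []


def hca_visible(lspci_output, sysfs_dir="/sys/class/infiniband"):
    """True only if the HCA is seen on the PCI bus and registered in sysfs."""
    on_bus = bool(find_ib_in_lspci(lspci_output))
    registered = bool(list_ib_class_devices(sysfs_dir))
    return on_bus and registered
```

Since the fault here came and went, a single check proves little. Feeding `hca_visible` the captured output of `lspci` on each node, repeatedly over some time, is more likely to catch the node where the card drops off the bus.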
On 8/23/07, John Leidel <[EMAIL PROTECTED]> wrote:
>
> Whats especially odd is that I can get a full bandwidth ping pong test
> running fine [970MB/s++], then rerun the test and have it fail saying it
> can't find the IB HCA.
>
>
> On 8/23/07, Tziporet Koren <[EMAIL PROTECTED]> wrote:
> >
> > John Leidel wrote:
> > > Unfortunately, the RDMA module load didn't help... a simple
> > > "hello_world" application still returns ::
> > >
> > > libibverbs: Fatal: no infiniband class devices found.
> > > No IB device found
> > >
> > > I went and verified that all the nodes see the HCAs... an lspci on
> > > all nodes reports ::
> > >
> > > 07:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
> > > (Tavor compatibility mode) (rev a0)
> > > Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex
> > > (Tavor compatibility mode)
> >
> > Can you run:
> > /etc/init.d/openibd restart
> > and send the /var/log/messages
> >
> > Thanks
> > Tziporet
> >
