Also some newer results. I had a long run going on the 128-node machine to see how many groups I really could join, and it errored out after joining 892 groups successfully. Specifically, I got an RDMA_CM_EVENT_MULTICAST_ERROR event containing status -22 ('Unknown error' according to strerror). errno is still cleared to 'Success'. I don't have time to go look at the code to see where this came from right now, but does anyone know what it means?
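This is -EINVAL, and it is coming from librdmacm. (strerror reports 'Unknown error' only because it was handed the negative value; 22 itself is EINVAL, 'Invalid argument'.) Unfortunately, that doesn't really help narrow down the actual cause, and I don't understand the behavior that you're seeing at all.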
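For decoding a status like this, here is a minimal sketch. It assumes the status follows the negative-errno convention, as the -22 above suggests. The helper name `cm_status_str` is made up for illustration and is not part of librdmacm.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not a librdmacm function.
 * Assumption: a negative status is a negated errno value (e.g. -22 for
 * EINVAL), so flip the sign before handing it to strerror().
 */
static const char *cm_status_str(int status)
{
    return strerror(status < 0 ? -status : status);
}

/* Format a short diagnostic line into a caller-supplied buffer. */
static int cm_status_describe(int status, char *buf, size_t len)
{
    return snprintf(buf, len, "status %d (%s)", status, cm_status_str(status));
}
```

In an event handler this could be called as `cm_status_describe(event->status, buf, sizeof buf)` when the event type is RDMA_CM_EVENT_MULTICAST_ERROR. It only makes the number readable; it does not say which argument was rejected.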
- Sean
