Hi Meiko,

I don't see anything immediately wrong with your setup. Have you tried running rping (to test whether the RDMA connection manager, CM, works in general) or ibv_devices (to check whether the verbs device works) inside the container?
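As a rough sketch of the device check, something like the script below could be run inside the container. It assumes the two device nodes from your docker run command (uverbs0 and rdma_cm) are the ones that matter. DEV_DIR and check_rdma_nodes are names made up for this example, and I have not verified that ibv_devices is present in the image.

```shell
#!/bin/sh
# Sketch only: confirm the RDMA device nodes passed with --device are visible,
# then list verbs devices if the ibv_devices tool happens to be installed.
# DEV_DIR is a hypothetical override; it defaults to /dev/infiniband.
DEV_DIR="${DEV_DIR:-/dev/infiniband}"

check_rdma_nodes() {
    # Print one status line per expected node.
    # Returns 0 if both nodes exist, 1 if at least one is missing.
    dir="$1"
    missing=0
    for node in uverbs0 rdma_cm; do
        if [ -e "$dir/$node" ]; then
            echo "found: $dir/$node"
        else
            echo "missing: $dir/$node"
            missing=1
        fi
    done
    return "$missing"
}

check_rdma_nodes "$DEV_DIR"

if command -v ibv_devices >/dev/null 2>&1; then
    # An empty device list here would point at the verbs layer, not at Crail.
    ibv_devices
else
    echo "ibv_devices not found (on Ubuntu it ships in the ibverbs-utils package)"
fi
```

If the nodes are there and ibv_devices lists the siw devices, the next step would be rping (from rdmacm-utils on Ubuntu): `rping -s` in one container and `rping -c -a <server address>` in the other. How you get a shell in the container depends on the image's entrypoint, which I have not checked.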
Jonas

On Wednesday, January 29th, 2025 at 07:15, Meiko Prilop <meiko.prilop-...@ibm.com.INVALID> wrote:

> Dear Sir or Madam,
>
> My name is Meiko Prilop and I am currently writing my master's thesis in the
> field of caching using RDMA at the Vienna University of Technology. In the
> course of my research, I came across your Crail project, which is exactly the
> system I need for my master's thesis. I found your contact in the Github
> project incubator-crail and wanted to try my luck asking questions about the
> project. In the past, I was able to get in touch with Dr Metzler through my
> job at IBM, who was able to give me information regarding Soft-iWARP.
> However, I am now faced with the hurdle of setting up and testing Crail.
>
> For my setup I have Ubuntu instances of version 18.04 running, where I setup
> Soft-iWARP accordingly to the information in
>
> https://github.com/animeshtrivedi/blog/blob/master/post/2019-06-26-siw.md
>
> I am then prompted three devices using the command ibv_devices:
>
>     device          node GUID
>     ------          ----------------
>     siw_docker0     0242187be7d30000
>     siw_lo          7369775f6c6f0000
>     siw_enp1s0      525400afe5890000
>
> And more precisely, using rdma link show:
>
> 1/1: siw_lo/1: state ACTIVE physical_state LINK_UP
> 2/1: siw_enp1s0/1: state ACTIVE physical_state LINK_UP
> 3/1: siw_docker0/1: state ACTIVE physical_state LINK_UP
>
> I have been able to test rping functionality between two instances so far.
>
> Next, I followed the steps described in:
> https://crail.readthedocs.io/en/latest/docker.html
> Although it appears to be the old link to the read me files, the one stated
> in the incubator-crail github are not available anymore. It appears to me
> that these setup descriptions are still the same as in the doc folder tho.
>
> I cloned the repository found at:
> https://github.com/apache/incubator-crail/tree/master
> and created an image using the Dockerfile found at /docker/RDMA.
>
> I then try to run the command stated below:
>
> sudo docker run -it --network host -e NAMENODE_HOST=rdma0 -e INTERFACE=enp1s0
> --cap-add=IPC_LOCK --device=/dev/infiniband/uverbs0
> --device=/dev/infiniband/rdma_cm -v /dev/hugepages:/dev/hugepages crail-rdma
> namenode
>
> I further recognized that uverbs0 and rdma_cm are available in the path given.
>
> When running the command, I get this error:
>
> Exception in thread "main" java.io.IOException: j2c::createEventChannel: rdma_create_event_channel failed: No such device
>     at com.ibm.disni.verbs.impl.NativeDispatcher._createEventChannel(Native Method)
>     at com.ibm.disni.verbs.impl.RdmaCmNat.createEventChannel(RdmaCmNat.java:60)
>     at com.ibm.disni.verbs.RdmaEventChannel.createEventChannel(RdmaEventChannel.java:67)
>     at com.ibm.disni.RdmaCmProcessor.<init>(RdmaCmProcessor.java:48)
>     at com.ibm.disni.RdmaEndpointGroup.<init>(RdmaEndpointGroup.java:61)
>     at com.ibm.darpc.DaRPCEndpointGroup.<init>(DaRPCEndpointGroup.java:47)
>     at com.ibm.darpc.DaRPCServerGroup.<init>(DaRPCServerGroup.java:58)
>     at com.ibm.darpc.DaRPCServerGroup.createServerGroup(DaRPCServerGroup.java:52)
>     at org.apache.crail.namenode.rpc.darpc.DaRPCNameNodeServer.init(DaRPCNameNodeServer.java:56)
>     at org.apache.crail.namenode.NameNode.main(NameNode.java:92)
>
> Unfortunately, I can't seem to fix this issue at this point. I hope you can
> help me with this problem, Crail seems to me to be a fundamental building
> block for my Master's thesis!
>
> Thank you very much for your time and I look forward to your feedback.
>
> Yours sincerely
>
> Meiko Prilop