Hi, I'm trying to run the TeraSort example on Spark-Crail over RDMA, using the latest incubator-crail, disni, crail-spark-io, and crail-spark-terasort from https://github.com/zrlio.
I'm using two machines running Ubuntu 18.04, one for the CrailNameNode and the other for the StorageServer. When I run start-crail.sh, the following error appears in the StorageServer's Crail log:

19/06/20 14:58:42 INFO crail: connected to namenode(s) /172.30.100.4:9060
Exception in thread "main" java.io.IOException: j2c::regMr: ibv_reg_mr failed: Cannot allocate memory
    at com.ibm.disni.verbs.impl.NativeDispatcher._regMr(Native Method)
    at com.ibm.disni.verbs.impl.NatRegMrCall.execute(NatRegMrCall.java:91)
    at com.ibm.disni.verbs.impl.NatRegMrCall.execute(NatRegMrCall.java:36)
    at org.apache.crail.storage.rdma.RdmaStorageServer.allocateResource(RdmaStorageServer.java:112)
    at org.apache.crail.storage.StorageServer.main(StorageServer.java:152)

When I test RDMA with C code, ibv_reg_mr succeeds, so I think there might be a conflict between libdisni.so, which Crail uses (or other Crail components?), and the underlying RDMA libraries.

Has anyone experienced this kind of "Cannot allocate memory" error? If so, could you share your troubleshooting story? Any other help would be great!

Thank you in advance.

- Jeongyoon