Hi, I'm trying to run Spark with Apache Crail on two machines connected via RDMA-capable Mellanox hardware, and I keep getting this error:

19/09/24 13:18:41 INFO ibm.disni: createEventChannel, objId 139745971170192
19/09/24 13:18:41 INFO ibm.disni: passive endpoint group, maxWR 32, maxSge 4, cqSize 64
19/09/24 13:18:41 INFO ibm.disni: launching cm processor, cmChannel 0
19/09/24 13:18:41 INFO apache.crail: new local endpoint for address / 172.30.100.108:9061
19/09/24 13:18:41 INFO apache.crail: new local dataPath /dev/hugepages/data/172.30.100.108-9061
19/09/24 13:18:41 INFO apache.crail: ERROR: failed data operation
19/09/24 13:18:41 INFO apache.crail: new local endpoint for address / 172.30.100.108:9061
java.io.IOException: java.lang.Exception: Local RDMA data path missing
    at org.apache.crail.storage.rdma.client.RdmaStoragePassiveGroup.createEndpoint(RdmaStoragePassiveGroup.java:53)
    at org.apache.crail.storage.rdma.RdmaStorageClient.createEndpoint(RdmaStorageClient.java:84)
    at org.apache.crail.utils.EndpointCache$StorageEndpointCache.getDataEndpoint(EndpointCache.java:130)
    at org.apache.crail.utils.EndpointCache.getDataEndpoint(EndpointCache.java:69)
    at org.apache.crail.core.CoreStream.prepareAndTrigger(CoreStream.java:230)
    at org.apache.crail.core.CoreStream.dataOperation(CoreStream.java:100)

I've checked the read/write permissions on /dev/hugepages/data/ and they look fine. Has anyone else experienced a similar issue?

Best regards,
Jeongyoon

----------
Jeongyoon Eo
Software Platform Lab
Department of Computer Science and Engineering
Seoul National University
Email: jeongyoon0...@gmail.com <jeongyoon...@snu.ac.kr>