Hi,

I'm trying to run Spark with Apache Crail on two machines connected over
RDMA-capable Mellanox hardware, and I keep getting this error:

19/09/24 13:18:41 INFO ibm.disni: createEventChannel, objId 139745971170192
19/09/24 13:18:41 INFO ibm.disni: passive endpoint group, maxWR 32, maxSge 4, cqSize 64
19/09/24 13:18:41 INFO ibm.disni: launching cm processor, cmChannel 0
19/09/24 13:18:41 INFO apache.crail: new local endpoint for address /172.30.100.108:9061
19/09/24 13:18:41 INFO apache.crail: new local dataPath /dev/hugepages/data/172.30.100.108-9061
19/09/24 13:18:41 INFO apache.crail: ERROR: failed data operation
19/09/24 13:18:41 INFO apache.crail: new local endpoint for address /172.30.100.108:9061
java.io.IOException: java.lang.Exception: Local RDMA data path missing
        at org.apache.crail.storage.rdma.client.RdmaStoragePassiveGroup.createEndpoint(RdmaStoragePassiveGroup.java:53)
        at org.apache.crail.storage.rdma.RdmaStorageClient.createEndpoint(RdmaStorageClient.java:84)
        at org.apache.crail.utils.EndpointCache$StorageEndpointCache.getDataEndpoint(EndpointCache.java:130)
        at org.apache.crail.utils.EndpointCache.getDataEndpoint(EndpointCache.java:69)
        at org.apache.crail.core.CoreStream.prepareAndTrigger(CoreStream.java:230)
        at org.apache.crail.core.CoreStream.dataOperation(CoreStream.java:100)

I've checked the read/write permissions on /dev/hugepages/data/ and they
look fine.
Has anyone else run into a similar issue?
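
In case it helps anyone repeat the check, here is a minimal Java sketch of
it. The class name DataPathCheck and the describe helper are hypothetical
names of mine, not part of Crail. It only reports whether the directory and
the dataPath from the log exist and are readable/writable for the user
running it; it does not reproduce whatever check Crail performs internally.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DataPathCheck {
    // Summarizes what the current user can see at the given path.
    // Hypothetical helper for illustration only; not a Crail API.
    static String describe(Path p) {
        if (!Files.exists(p)) {
            return "missing";
        }
        String kind = Files.isDirectory(p) ? "directory" : "file";
        return kind
            + (Files.isReadable(p) ? " readable" : " not-readable")
            + (Files.isWritable(p) ? " writable" : " not-writable");
    }

    public static void main(String[] args) {
        // Defaults are the paths that appear in the log; pass others as arguments.
        String[] targets = args.length > 0 ? args : new String[] {
            "/dev/hugepages/data",
            "/dev/hugepages/data/172.30.100.108-9061"
        };
        for (String t : targets) {
            System.out.println(t + ": " + describe(Paths.get(t)));
        }
    }
}
```

It should be run as the same user that launches the Spark executors, since
the result depends on that user's permissions.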

Best regards,
Jeongyoon.

*----------*
*Jeongyoon Eo*
Software Platform Lab
Department of Computer Science and Engineering
Seoul National University
Email: jeongyoon0...@gmail.com <jeongyoon...@snu.ac.kr>
