It looks to me like you have localmap enabled (it is true by default).
This is an optimization where mmap is used to access local blocks, i.e.
blocks served by a datanode on the same machine. For some reason the local
endpoint can't find the directory where the data is, which shouldn't
happen. In any case, you can try turning off localmap by adding the
following line to your crail-site.conf:

crail.storage.rdma.localmap    false

If that does not fix it, please send the full printout of the config
parameters at the client.
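
Before collecting the config, it may help to confirm what the client machine
actually sees. The following is only a rough sketch in Python, not part of
Crail: the helper names (read_crail_conf, localmap_enabled, check_data_path)
are hypothetical, the file is assumed to consist of "key <whitespace> value"
lines as in the setting above, and skipping lines that start with "#" is an
assumption about the file format.

```python
import os


def read_crail_conf(path):
    """Parse 'key <whitespace> value' lines into a dict.

    Assumption: blank lines and lines starting with '#' can be ignored.
    """
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # split on the first run of whitespace
            if len(parts) == 2:
                settings[parts[0]] = parts[1].strip()
    return settings


def localmap_enabled(settings):
    """Return the effective localmap value; it is true when the key is unset."""
    value = settings.get("crail.storage.rdma.localmap", "true")
    return value.lower() == "true"


def check_data_path(path):
    """Report whether a directory exists and is readable/writable by this user."""
    return {
        "exists": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }
```

For example, calling check_data_path("/dev/hugepages/data/172.30.100.108-9061")
as the user that runs the Spark executor shows whether the data path from the
log is visible to that process, and localmap_enabled(read_crail_conf(...)) on
your crail-site.conf shows whether the setting above was picked up from the
file you think is in use. This only narrows things down; the exception may
still have a different cause.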

-Patrick

On Tue, Sep 24, 2019 at 3:40 PM Jeongyoon Eo <jeongyoon0...@gmail.com>
wrote:

> Hi,
>
> I'm trying Spark with Apache Crail on two Mellanox-connected,
> RDMA-capable machines, and I keep getting this error:
>
> 19/09/24 13:18:41 INFO ibm.disni: createEventChannel, objId 139745971170192
> 19/09/24 13:18:41 INFO ibm.disni: passive endpoint group, maxWR 32, maxSge 4, cqSize 64
> 19/09/24 13:18:41 INFO ibm.disni: launching cm processor, cmChannel 0
> 19/09/24 13:18:41 INFO apache.crail: new local endpoint for address /172.30.100.108:9061
> 19/09/24 13:18:41 INFO apache.crail: new local dataPath /dev/hugepages/data/172.30.100.108-9061
> 19/09/24 13:18:41 INFO apache.crail: ERROR: failed data operation
> 19/09/24 13:18:41 INFO apache.crail: new local endpoint for address /172.30.100.108:9061
> java.io.IOException: java.lang.Exception: Local RDMA data path missing
>         at org.apache.crail.storage.rdma.client.RdmaStoragePassiveGroup.createEndpoint(RdmaStoragePassiveGroup.java:53)
>         at org.apache.crail.storage.rdma.RdmaStorageClient.createEndpoint(RdmaStorageClient.java:84)
>         at org.apache.crail.utils.EndpointCache$StorageEndpointCache.getDataEndpoint(EndpointCache.java:130)
>         at org.apache.crail.utils.EndpointCache.getDataEndpoint(EndpointCache.java:69)
>         at org.apache.crail.core.CoreStream.prepareAndTrigger(CoreStream.java:230)
>         at org.apache.crail.core.CoreStream.dataOperation(CoreStream.java:100)
>
> I've checked the read/write permissions of /dev/hugepages/data/ and they
> were okay.
> Has anyone else experienced a similar issue?
>
> Best regards,
> Jeongyoon.
>
> *----------*
> *Jeongyoon Eo*
> Software Platform Lab
> Department of Computer Science and Engineering
> Seoul National University
> Email: jeongyoon0...@gmail.com <jeongyoon...@snu.ac.kr>
>
