Dear Sir or Madam,

since my first email, I was able to set up Crail natively on my Ubuntu 18.04 
machine instead of using Docker, with a namenode and a datanode that recognize 
each other. However, when I now try to test my setup from another machine 
using Crail's built-in tools, I run into some issues that I wasn't able to 
resolve myself.

TL;DR: With every approach I tried for setting up the namenode and datanode, 
the client machine runs into:
      java.lang.OutOfMemoryError: Map failed

On two machines, separately, I've set up Soft-iWARP following
https://github.com/animeshtrivedi/blog/blob/master/post/2019-06-26-siw.md
and I get the expected output from ibv_devices.

Furthermore, rping is able to establish a connection between both machines.

I then set up Crail following the descriptions at
https://crail.readthedocs.io/en/latest/source.html
https://crail.readthedocs.io/en/latest/config.html
with my crail-site.conf looking like this:
      
crail.namenode.address            crail://128.131.57.140:9060
crail.namenode.rpctype            org.apache.crail.namenode.rpc.darpc.DaRPCNameNode
crail.cachepath                   /dev/hugepages/cache
crail.regionsize                  268435456
crail.cachelimit                  268435456
crail.storage.types               org.apache.crail.storage.rdma.RdmaStorageTier
crail.storage.rdma.interface       enp1s0
crail.storage.rdma.datapath        /dev/hugepages/data
crail.storage.rdma.storagelimit    268435456


This is the configuration on both machines. I drastically reduced the default 
sizes here. I also changed core-site.xml to hold the same address in 
fs.defaultFS.
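
As a sanity check on these sizes, here is a minimal sketch of the arithmetic. 
It assumes that crail.buffersize and crail.blocksize keep the value 1048576 
that shows up in the log output further down, and that huge pages are 2 MiB as 
reported by /proc/meminfo. The helper name units is made up for this example.

```python
# Values copied from the crail-site.conf and the log output in this mail.
CACHE_LIMIT = 268435456        # crail.cachelimit
STORAGE_LIMIT = 268435456      # crail.storage.rdma.storagelimit
BUFFER_SIZE = 1048576          # crail.buffersize (assumed: value from the logs)
BLOCK_SIZE = 1048576           # crail.blocksize (assumed: value from the logs)
HUGE_PAGE_SIZE = 2048 * 1024   # Hugepagesize: 2048 kB


def units(total_bytes, unit_bytes):
    """Return how many whole units of unit_bytes fit into total_bytes."""
    return total_bytes // unit_bytes


print("client buffers:      ", units(CACHE_LIMIT, BUFFER_SIZE))       # 256
print("datanode blocks:     ", units(STORAGE_LIMIT, BLOCK_SIZE))      # 256
print("huge pages for cache:", units(CACHE_LIMIT, HUGE_PAGE_SIZE))    # 128
print("huge pages for data: ", units(STORAGE_LIMIT, HUGE_PAGE_SIZE))  # 128
```

The first two numbers agree with the "bufferCount 256" and "freeBlocks 256" 
lines in the logs below.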

Checking   cat /proc/meminfo   on the client gives:

MemTotal:       16424160 kB
MemFree:         4983380 kB
MemAvailable:    9580404 kB
Buffers:          975056 kB
Cached:          3347588 kB
SwapCached:         1920 kB
Active:          3856868 kB
Inactive:        2377380 kB
Active(anon):    1339104 kB
Inactive(anon):   552592 kB
Active(file):    2517764 kB
Inactive(file):  1824788 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4176624 kB
Dirty:               108 kB
Writeback:             0 kB
AnonPages:       1909928 kB
Mapped:           632464 kB
Shmem:            132512 kB
Slab:             866876 kB
SReclaimable:     594772 kB
SUnreclaim:       272104 kB
KernelStack:       16768 kB
PageTables:        20932 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10309228 kB
Committed_AS:    8433608 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:     34816 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:    2048
HugePages_Free:     2044
HugePages_Rsvd:     2044
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      728940 kB
DirectMap2M:    11853824 kB
DirectMap1G:     6291456 kB

And on my server:

MemTotal:       16424160 kB
MemFree:         3771716 kB
MemAvailable:    6517888 kB
Buffers:          309848 kB
Cached:          2576436 kB
SwapCached:            0 kB
Active:          2398708 kB
Inactive:        1465044 kB
Active(anon):     977800 kB
Inactive(anon):      356 kB
Active(file):    1420908 kB
Inactive(file):  1464688 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4194300 kB
Dirty:                76 kB
Writeback:             0 kB
AnonPages:        977544 kB
Mapped:           233700 kB
Shmem:               820 kB
Slab:             294456 kB
SReclaimable:     202184 kB
SUnreclaim:        92272 kB
KernelStack:        6096 kB
PageTables:        12184 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8212076 kB
Committed_AS:    2616856 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:    4096
HugePages_Free:     3968
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      210796 kB
DirectMap2M:     7129088 kB
DirectMap1G:    11534336 kB
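
To make the huge page counters in the two listings easier to compare, a small 
sketch like the following could be used. It reads HugePages_Free minus 
HugePages_Rsvd as the number of free huge pages that are not yet promised to an 
existing mapping, which is how the kernel's hugetlbpage documentation describes 
the reserved counter. The function names are made up for this example.

```python
def huge_page_counters(meminfo_text):
    """Collect the HugePages_* counters from /proc/meminfo-style text."""
    counters = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key.startswith("HugePages_") and value.strip():
            counters[key] = int(value.split()[0])
    return counters


def unreserved_free_pages(counters):
    """Free huge pages that are not already reserved by an existing mapping."""
    return counters["HugePages_Free"] - counters["HugePages_Rsvd"]


# Counters copied from the two /proc/meminfo listings in this mail.
CLIENT = """
HugePages_Total:    2048
HugePages_Free:     2044
HugePages_Rsvd:     2044
HugePages_Surp:        0
"""
SERVER = """
HugePages_Total:    4096
HugePages_Free:     3968
HugePages_Rsvd:        0
HugePages_Surp:        0
"""

for name, text in (("client", CLIENT), ("server", SERVER)):
    print(name, unreserved_free_pages(huge_page_counters(text)))
```

With the values quoted above this prints 0 for the client and 3968 for the 
server. Whether that is related to the error is not confirmed. On a live 
machine the same functions can be fed open("/proc/meminfo").read().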

Checking whether hugetlbfs is mounted using   mount | grep huge   on the server 
and the client, the output on both is:
      hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)

Starting the namenode on the server gives the following output:
      25/03/25 13:57:14 INFO crail: initalizing namenode
      25/03/25 13:57:14 INFO crail: crail.version 3101
      25/03/25 13:57:14 INFO crail: crail.directorydepth 16
      25/03/25 13:57:14 INFO crail: crail.tokenexpiration 10
      25/03/25 13:57:14 INFO crail: crail.blocksize 1048576
      25/03/25 13:57:14 INFO crail: crail.cachelimit 268435456
      25/03/25 13:57:14 INFO crail: crail.cachepath /dev/hugepages/cache
      25/03/25 13:57:14 INFO crail: crail.user crail
      25/03/25 13:57:14 INFO crail: crail.shadowreplication 1
      25/03/25 13:57:14 INFO crail: crail.debug false
      25/03/25 13:57:14 INFO crail: crail.statistics true
      25/03/25 13:57:14 INFO crail: crail.rpctimeout 1000
      25/03/25 13:57:14 INFO crail: crail.datatimeout 1000
      25/03/25 13:57:14 INFO crail: crail.buffersize 1048576
      25/03/25 13:57:14 INFO crail: crail.slicesize 524288
      25/03/25 13:57:14 INFO crail: crail.singleton true
      25/03/25 13:57:14 INFO crail: crail.regionsize 268435456
      25/03/25 13:57:14 INFO crail: crail.directoryrecord 512
      25/03/25 13:57:14 INFO crail: crail.directoryrandomize true
      25/03/25 13:57:14 INFO crail: crail.cacheimpl 
org.apache.crail.memory.MappedBufferCache
      25/03/25 13:57:14 INFO crail: crail.locationmap
      25/03/25 13:57:14 INFO crail: crail.namenode.address 
crail://128.131.57.140:9060?id=0&size=1
      25/03/25 13:57:14 INFO crail: crail.namenode.blockselection roundrobin
      25/03/25 13:57:14 INFO crail: crail.namenode.fileblocks 16
      25/03/25 13:57:14 INFO crail: crail.namenode.rpctype 
org.apache.crail.namenode.rpc.darpc.DaRPCNameNode
      25/03/25 13:57:14 INFO crail: crail.namenode.rpcservice 
org.apache.crail.namenode.NameNodeService
      25/03/25 13:57:14 INFO crail: crail.namenode.log
      25/03/25 13:57:14 INFO crail: crail.storage.types 
org.apache.crail.storage.rdma.RdmaStorageTier
      25/03/25 13:57:14 INFO crail: crail.storage.classes 1
      25/03/25 13:57:14 INFO crail: crail.storage.rootclass 0
      25/03/25 13:57:14 INFO crail: crail.storage.keepalive 2
      25/03/25 13:57:14 INFO crail: crail.elasticstore.scaleup 0.4
      25/03/25 13:57:14 INFO crail: crail.elasticstore.scaledown 0.1
      25/03/25 13:57:14 INFO crail: crail.elasticstore.maxnodes 10
      25/03/25 13:57:14 INFO crail: crail.elasticstore.minnodes 1
      25/03/25 13:57:14 INFO crail: crail.elasticstore.policyrunner.interval 
1000
      25/03/25 13:57:14 INFO crail: crail.elasticstore.logging false
      25/03/25 13:57:14 INFO crail: round robin block selection
      25/03/25 13:57:14 INFO crail: rpc group started, recvQueue 32
      25/03/25 13:57:14 INFO darpc: running resource management, index 0, 
affinity 2, timeout 2147483647
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.polling false
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.type passive
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.affinity 1
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.maxinline 0
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.recvQueue 32
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.sendQueue 32
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.pollsize 32
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.clustersize 128
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.backlog 100
      25/03/25 13:57:14 INFO crail: crail.namenode.darpc.connecttimeout 1000
      25/03/25 13:57:14 INFO crail: opened server at /128.131.57.140:9060


Starting the datanode with
      $CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.rdma.RdmaStorageTier
gives:

      25/03/25 13:59:05 INFO crail: crail.version 3101
      25/03/25 13:59:05 INFO crail: crail.directorydepth 16
      25/03/25 13:59:05 INFO crail: crail.tokenexpiration 10
      25/03/25 13:59:05 INFO crail: crail.blocksize 1048576
      25/03/25 13:59:05 INFO crail: crail.cachelimit 268435456
      25/03/25 13:59:05 INFO crail: crail.cachepath /dev/hugepages/cache
      25/03/25 13:59:05 INFO crail: crail.user crail
      25/03/25 13:59:05 INFO crail: crail.shadowreplication 1
      25/03/25 13:59:05 INFO crail: crail.debug false
      25/03/25 13:59:05 INFO crail: crail.statistics true
      25/03/25 13:59:05 INFO crail: crail.rpctimeout 1000
      25/03/25 13:59:05 INFO crail: crail.datatimeout 1000
      25/03/25 13:59:05 INFO crail: crail.buffersize 1048576
      25/03/25 13:59:05 INFO crail: crail.slicesize 524288
      25/03/25 13:59:05 INFO crail: crail.singleton true
      25/03/25 13:59:05 INFO crail: crail.regionsize 268435456
      25/03/25 13:59:05 INFO crail: crail.directoryrecord 512
      25/03/25 13:59:05 INFO crail: crail.directoryrandomize true
      25/03/25 13:59:05 INFO crail: crail.cacheimpl 
org.apache.crail.memory.MappedBufferCache
      25/03/25 13:59:05 INFO crail: crail.locationmap
      25/03/25 13:59:05 INFO crail: crail.namenode.address 
crail://128.131.57.140:9060
      25/03/25 13:59:05 INFO crail: crail.namenode.blockselection roundrobin
      25/03/25 13:59:05 INFO crail: crail.namenode.fileblocks 16
      25/03/25 13:59:05 INFO crail: crail.namenode.rpctype 
org.apache.crail.namenode.rpc.darpc.DaRPCNameNode
      25/03/25 13:59:05 INFO crail: crail.namenode.rpcservice 
org.apache.crail.namenode.NameNodeService
      25/03/25 13:59:05 INFO crail: crail.namenode.log
      25/03/25 13:59:05 INFO crail: crail.storage.types 
org.apache.crail.storage.rdma.RdmaStorageTier
      25/03/25 13:59:05 INFO crail: crail.storage.classes 1
      25/03/25 13:59:05 INFO crail: crail.storage.rootclass 0
      25/03/25 13:59:05 INFO crail: crail.storage.keepalive 2
      25/03/25 13:59:05 INFO crail: crail.elasticstore.scaleup 0.4
      25/03/25 13:59:05 INFO crail: crail.elasticstore.scaledown 0.1
      25/03/25 13:59:05 INFO crail: crail.elasticstore.maxnodes 10
      25/03/25 13:59:05 INFO crail: crail.elasticstore.minnodes 1
      25/03/25 13:59:05 INFO crail: crail.elasticstore.policyrunner.interval 
1000
      25/03/25 13:59:05 INFO crail: crail.elasticstore.logging false
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.interface enp1s0
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.port 50020
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.storagelimit 268435456
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.allocationsize 268435456
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.datapath 
/dev/hugepages/data
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.localmap true
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.queuesize 32
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.type passive
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.backlog 100
      25/03/25 13:59:05 INFO crail: crail.storage.rdma.connecttimeout 1000
      25/03/25 13:59:05 INFO crail: rdma storage server started, address 
/128.131.57.140:50020, persistent false, maxWR 1, maxSge 1, cqSize 1
      25/03/25 13:59:05 INFO crail: rpc group started, recvQueue 32
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.polling false
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.type passive
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.affinity 1
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.maxinline 0
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.recvQueue 32
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.sendQueue 32
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.pollsize 32
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.clustersize 128
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.backlog 100
      25/03/25 13:59:05 INFO crail: crail.namenode.darpc.connecttimeout 1000
      25/03/25 13:59:06 INFO crail: connected to namenode(s) 
/128.131.57.140:9060
      25/03/25 13:59:06 INFO crail: datanode statistics, freeBlocks 256
      25/03/25 13:59:06 INFO crail: datanode statistics, freeBlocks 256


Finally, on my client I try to test the setup using the built-in iobench and 
fsck tools. In addition, I have set up a DiSNI client following the examples 
and the configuration found in
https://github.com/brianfrankcooper/YCSB/tree/master/crail/src/main/java/site/ycsb/db/crail

The first attempt to connect to the namenode is the fsck ping test. 
With  $CRAIL_HOME/bin/crail fsck -t ping  I get:
      25/03/25 14:10:58 INFO crail: crail.version 3101
      25/03/25 14:10:58 INFO crail: crail.directorydepth 16
      25/03/25 14:10:58 INFO crail: crail.tokenexpiration 10
      25/03/25 14:10:58 INFO crail: crail.blocksize 1048576
      25/03/25 14:10:58 INFO crail: crail.cachelimit 268435456
      25/03/25 14:10:58 INFO crail: crail.cachepath /dev/hugepages/cache
      25/03/25 14:10:58 INFO crail: crail.user crail
      25/03/25 14:10:58 INFO crail: crail.shadowreplication 1
      25/03/25 14:10:58 INFO crail: crail.debug false
      25/03/25 14:10:58 INFO crail: crail.statistics true
      25/03/25 14:10:58 INFO crail: crail.rpctimeout 1000
      25/03/25 14:10:58 INFO crail: crail.datatimeout 1000
      25/03/25 14:10:58 INFO crail: crail.buffersize 1048576
      25/03/25 14:10:58 INFO crail: crail.slicesize 524288
      25/03/25 14:10:58 INFO crail: crail.singleton true
      25/03/25 14:10:58 INFO crail: crail.regionsize 268435456
      25/03/25 14:10:58 INFO crail: crail.directoryrecord 512
      25/03/25 14:10:58 INFO crail: crail.directoryrandomize true
      25/03/25 14:10:58 INFO crail: crail.cacheimpl 
org.apache.crail.memory.MappedBufferCache
      25/03/25 14:10:58 INFO crail: crail.locationmap
      25/03/25 14:10:58 INFO crail: crail.namenode.address 
crail://128.131.57.140:9060
      25/03/25 14:10:58 INFO crail: crail.namenode.blockselection roundrobin
      25/03/25 14:10:58 INFO crail: crail.namenode.fileblocks 16
      25/03/25 14:10:58 INFO crail: crail.namenode.rpctype 
org.apache.crail.namenode.rpc.tcp.TcpNameNode
      25/03/25 14:10:58 INFO crail: crail.namenode.rpcservice 
org.apache.crail.namenode.NameNodeService
      25/03/25 14:10:58 INFO crail: crail.namenode.log
      25/03/25 14:10:58 INFO crail: crail.storage.types 
org.apache.crail.storage.rdma.RdmaStorageTier
      25/03/25 14:10:58 INFO crail: crail.storage.classes 1
      25/03/25 14:10:58 INFO crail: crail.storage.rootclass 0
      25/03/25 14:10:58 INFO crail: crail.storage.keepalive 2
      25/03/25 14:10:58 INFO crail: crail.elasticstore.scaleup 0.4
      25/03/25 14:10:58 INFO crail: crail.elasticstore.scaledown 0.1
      25/03/25 14:10:58 INFO crail: crail.elasticstore.maxnodes 10
      25/03/25 14:10:58 INFO crail: crail.elasticstore.minnodes 1
      25/03/25 14:10:58 INFO crail: crail.elasticstore.policyrunner.interval 
1000
      25/03/25 14:10:58 INFO crail: crail.elasticstore.logging false
      25/03/25 14:10:58 INFO crail: buffer cache, allocationCount 1, 
bufferCount 256
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.interface enp1s0
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.port 50020
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.storagelimit 268435456
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.allocationsize 268435456
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.datapath 
/dev/hugepages/data
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.localmap true
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.queuesize 32
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.type passive
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.backlog 100
      25/03/25 14:10:58 INFO crail: crail.storage.rdma.connecttimeout 1000
      25/03/25 14:10:58 INFO narpc: new NaRPC client group v1.5.0, queueDepth 
32, messageSize 512, nodealy true
      25/03/25 14:10:58 INFO crail: crail.namenode.tcp.queueDepth 32
      25/03/25 14:10:58 INFO crail: crail.namenode.tcp.messageSize 512
      25/03/25 14:10:58 INFO crail: crail.namenode.tcp.cores 1
      25/03/25 14:10:58 INFO crail: connected to namenode(s) 
/128.131.57.140:9060

At this point it hangs. I noticed that the client falls back to NaRPC 
(TcpNameNode) instead of DaRPC, and this is the case for all experiments. 
Changing anything in crail-site.conf on the client had no effect; even setting 
an obviously wrong value was simply ignored. On the server side, this ping 
went unnoticed.
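
To double-check what a given crail-site.conf actually contains, a rough sketch 
like the one below can list the parsed keys and flag keys that ended up without 
a value, for example when a value slipped onto the next line. It assumes the 
file consists of plain whitespace-separated "key value" lines as quoted above. 
It is not Crail's own parser, and the function names are made up for this 
example.

```python
def parse_conf(text):
    """Parse 'key value' lines; blank lines and '#' comment lines are skipped."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        # A line with a key but no value is kept with an empty string.
        conf[parts[0]] = parts[1].strip() if len(parts) == 2 else ""
    return conf


def keys_without_value(conf):
    """Return the keys whose value is empty, in sorted order."""
    return sorted(key for key, value in conf.items() if value == "")


# Hypothetical sample: the second entry has its value on the following line.
SAMPLE = """
crail.namenode.address            crail://128.131.57.140:9060
crail.namenode.rpctype
org.apache.crail.namenode.rpc.darpc.DaRPCNameNode
crail.cachepath                   /dev/hugepages/cache
"""

conf = parse_conf(SAMPLE)
print("rpctype:", repr(conf.get("crail.namenode.rpctype")))
print("keys without value:", keys_without_value(conf))
```

Running it on the contents of the client's own crail-site.conf would show 
whether crail.namenode.rpctype is present there with the intended value.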

I then tried the I/O benchmark using the default command
      $CRAIL_HOME/bin/crail iobench -t write -f /filename -s $((1024*1024)) -k 1024
This resulted in

      25/03/25 14:13:25 INFO crail: creating singleton crail file system
      25/03/25 14:13:25 INFO crail: crail.version 3101
      25/03/25 14:13:25 INFO crail: crail.directorydepth 16
      25/03/25 14:13:25 INFO crail: crail.tokenexpiration 10
      25/03/25 14:13:25 INFO crail: crail.blocksize 1048576
      25/03/25 14:13:25 INFO crail: crail.cachelimit 268435456
      25/03/25 14:13:25 INFO crail: crail.cachepath /dev/hugepages/cache
      25/03/25 14:13:25 INFO crail: crail.user crail
      25/03/25 14:13:25 INFO crail: crail.shadowreplication 1
      25/03/25 14:13:25 INFO crail: crail.debug false
      25/03/25 14:13:25 INFO crail: crail.statistics true
      25/03/25 14:13:25 INFO crail: crail.rpctimeout 1000
      25/03/25 14:13:25 INFO crail: crail.datatimeout 1000
      25/03/25 14:13:25 INFO crail: crail.buffersize 1048576
      25/03/25 14:13:25 INFO crail: crail.slicesize 524288
      25/03/25 14:13:25 INFO crail: crail.singleton true
      25/03/25 14:13:25 INFO crail: crail.regionsize 268435456
      25/03/25 14:13:25 INFO crail: crail.directoryrecord 512
      25/03/25 14:13:25 INFO crail: crail.directoryrandomize true
      25/03/25 14:13:25 INFO crail: crail.cacheimpl 
org.apache.crail.memory.MappedBufferCache
      25/03/25 14:13:25 INFO crail: crail.locationmap
      25/03/25 14:13:25 INFO crail: crail.namenode.address 
crail://128.131.57.140:9060
      25/03/25 14:13:25 INFO crail: crail.namenode.blockselection roundrobin
      25/03/25 14:13:25 INFO crail: crail.namenode.fileblocks 16
      25/03/25 14:13:25 INFO crail: crail.namenode.rpctype 
org.apache.crail.namenode.rpc.tcp.TcpNameNode
      25/03/25 14:13:25 INFO crail: crail.namenode.rpcservice 
org.apache.crail.namenode.NameNodeService
      25/03/25 14:13:25 INFO crail: crail.namenode.log
      25/03/25 14:13:25 INFO crail: crail.storage.types 
org.apache.crail.storage.rdma.RdmaStorageTier
      25/03/25 14:13:25 INFO crail: crail.storage.classes 1
      25/03/25 14:13:25 INFO crail: crail.storage.rootclass 0
      25/03/25 14:13:25 INFO crail: crail.storage.keepalive 2
      25/03/25 14:13:25 INFO crail: crail.elasticstore.scaleup 0.4
      25/03/25 14:13:25 INFO crail: crail.elasticstore.scaledown 0.1
      25/03/25 14:13:25 INFO crail: crail.elasticstore.maxnodes 10
      25/03/25 14:13:25 INFO crail: crail.elasticstore.minnodes 1
      25/03/25 14:13:25 INFO crail: crail.elasticstore.policyrunner.interval 
1000
      25/03/25 14:13:25 INFO crail: crail.elasticstore.logging false
      25/03/25 14:13:25 INFO crail: buffer cache, allocationCount 1, 
bufferCount 256
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.interface enp1s0
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.port 50020
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.storagelimit 268435456
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.allocationsize 268435456
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.datapath 
/dev/hugepages/data
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.localmap true
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.queuesize 32
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.type passive
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.backlog 100
      25/03/25 14:13:25 INFO crail: crail.storage.rdma.connecttimeout 1000
      25/03/25 14:13:25 INFO narpc: new NaRPC client group v1.5.0, queueDepth 
32, messageSize 512, nodealy true
      25/03/25 14:13:25 INFO crail: crail.namenode.tcp.queueDepth 32
      25/03/25 14:13:25 INFO crail: crail.namenode.tcp.messageSize 512
      25/03/25 14:13:25 INFO crail: crail.namenode.tcp.cores 1
      25/03/25 14:13:25 INFO crail: connected to namenode(s) 
/128.131.57.140:9060
      write, filename /filename, size 1048576, loop 1024, storageClass 0, locationClass 0, buffered true
      Exception in thread "main" java.io.IOException: Map failed
              at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:938)
              at org.apache.crail.memory.MappedBufferCache.allocateRegion(MappedBufferCache.java:94)
              at org.apache.crail.memory.BufferCache.allocateBuffer(BufferCache.java:95)
              at org.apache.crail.core.CoreDataStore.allocateBuffer(CoreDataStore.java:482)
              at org.apache.crail.tools.CrailBenchmark.write(CrailBenchmark.java:85)
              at org.apache.crail.tools.CrailBenchmark.main(CrailBenchmark.java:1070)
      Caused by: java.lang.OutOfMemoryError: Map failed
              at sun.nio.ch.FileChannelImpl.map0(Native Method)
              at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:935)
              ... 5 more
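
The trace shows the failure inside a file mapping call. To see whether such a 
mapping can fail outside of Crail as well, a stand-alone probe along the 
following lines could be run on the client. This is only a sketch: the function 
name can_map is made up, and it is an assumption that the failing region is a 
file of crail.regionsize bytes under crail.cachepath.

```python
import mmap
import os
import tempfile


def can_map(directory, length):
    """Create a scratch file in `directory`, size it to `length` bytes, map it.

    Returns (True, "") on success or (False, reason) on failure. On hugetlbfs
    the length has to be a multiple of the huge page size.
    """
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError as exc:
        return False, f"create failed: {exc}"
    try:
        os.ftruncate(fd, length)
        with mmap.mmap(fd, length) as region:
            region[0] = 1  # touch the first byte so a page is really used
        return True, ""
    except (OSError, ValueError) as exc:
        return False, f"map failed: {exc}"
    finally:
        # Always release the descriptor and remove the scratch file.
        os.close(fd)
        os.unlink(path)
```

For the configuration in this mail the call would be 
can_map("/dev/hugepages/cache", 268435456). A "create failed" result points at 
the directory or its permissions, a "map failed" result at the mapping itself.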


Again, the client falls back to NaRPC instead of DaRPC. The call also went 
unnoticed on the server side, although the client states that it connected to 
the namenode.
I then changed crail-site.conf on my server back to use NaRPC. I only removed 
this part of the configuration, and the namenode was still able to start. I 
ran the fsck experiments again: namenodeDump, getLocations and ping worked, 
whereas directoryDump, blockStatistics and createDirectory ran into the same 
java.lang.OutOfMemoryError: Map failed. This time the connection attempts were 
recognized on the namenode side.

After that, I tried iobench again, which resulted in the same 
OutOfMemoryError. Even after reducing the size and loop count to a bare 
minimum of -s $((4*4)) -k 4, I still got the same error.


Next, I reverted all of my changes in crail-site.conf to use TCP instead of 
RDMA, to check whether this would give me some insight. With this setup, I got 
the same errors with fsck -t createDirectory and iobench.

At this point I am absolutely stuck. As Crail is a perfect fit for my master's 
thesis, I don't want to give up on this setup.

I hope I didn't miss any important information. I will try to find a way to 
debug this in the meantime. I am grateful for any advice on this.

I didn't receive any reply to my last email in my mailbox. Would it be 
possible to put my working email address "meiko.prilop-...@ibm.com" in CC, 
just in case?

Thanks in advance!

Best
Meiko Prilop
