Dear Sir or Madam,

Since my first email, I was able to set up Crail natively on my Ubuntu 18.04 machine, with a namenode and a datanode that recognize each other, instead of trying to use Docker. However, when I now try to test my setup from another machine using Crail's built-in tools, I run into issues that I was not able to resolve myself.

TL;DR: Every approach to setting up the namenode and datanode ended with the client machine running into: java.lang.OutOfMemoryError: Map failed

On two separate machines, I've set up Soft-iWARP following the guide at
https://github.com/animeshtrivedi/blog/blob/master/post/2019-06-26-siw.md
and ibv_devices gives the expected output. Further, rping is able to establish a connection between both machines.

I then set up Crail following the descriptions at
https://crail.readthedocs.io/en/latest/source.html
https://crail.readthedocs.io/en/latest/config.html
where I set up my crail-site.conf to look like this on both machines:

crail.namenode.address crail://128.131.57.140:9060
crail.namenode.rpctype org.apache.crail.namenode.rpc.darpc.DaRPCNameNode
crail.cachepath /dev/hugepages/cache
crail.regionsize 268435456
crail.cachelimit 268435456
crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier
crail.storage.rdma.interface enp1s0
crail.storage.rdma.datapath /dev/hugepages/data
crail.storage.rdma.storagelimit 268435456

Here I drastically reduced the default sizing values. I also changed core-site.xml to hold the same address in fs.defaultFS.
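
For reference, the reduced values can be put into numbers with a small back-of-the-envelope sketch. It assumes that the client cache is carved into cachelimit / regionsize regions of regionsize / buffersize buffers each, and that the datanode offers storagelimit / blocksize blocks; the 1048576-byte buffer and block sizes are the values Crail prints at startup, and the 2 MiB hugepage size is the one reported for the hugetlbfs mount. The function and key names are made up for illustration and are not part of Crail.

```python
# Back-of-the-envelope sizing for the reduced crail-site.conf values.
# All names below are illustrative; none of them are Crail APIs.

MIB = 1024 * 1024


def sizing(cache_limit, region_size, storage_limit,
           buffer_size, block_size, hugepage_size):
    """Derive region, buffer, block and hugepage counts from the byte sizes."""
    return {
        # Assumption: the client cache consists of cachelimit / regionsize regions.
        "regions": cache_limit // region_size,
        # Assumption: each region is split into buffers of crail.buffersize bytes.
        "buffers_per_region": region_size // buffer_size,
        # Assumption: the datanode exposes storagelimit / blocksize blocks.
        "storage_blocks": storage_limit // block_size,
        # Hugepages needed to back a single region on hugetlbfs.
        "hugepages_per_region": region_size // hugepage_size,
    }


result = sizing(
    cache_limit=268435456,    # crail.cachelimit
    region_size=268435456,    # crail.regionsize
    storage_limit=268435456,  # crail.storage.rdma.storagelimit
    buffer_size=1048576,      # crail.buffersize as printed at startup
    block_size=1048576,       # crail.blocksize as printed at startup
    hugepage_size=2 * MIB,    # assumption: 2 MiB hugepages
)

for name, value in result.items():
    print(f"{name}: {value}")
```

Under these assumptions, a single 256 MiB region needs 128 free (and not already reserved) 2 MiB hugepages in one piece, which is the number to compare against the HugePages_* counters in /proc/meminfo.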

Further, when checking cat /proc/meminfo, I get for the client:

MemTotal: 16424160 kB
MemFree: 4983380 kB
MemAvailable: 9580404 kB
Buffers: 975056 kB
Cached: 3347588 kB
SwapCached: 1920 kB
Active: 3856868 kB
Inactive: 2377380 kB
Active(anon): 1339104 kB
Inactive(anon): 552592 kB
Active(file): 2517764 kB
Inactive(file): 1824788 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4176624 kB
Dirty: 108 kB
Writeback: 0 kB
AnonPages: 1909928 kB
Mapped: 632464 kB
Shmem: 132512 kB
Slab: 866876 kB
SReclaimable: 594772 kB
SUnreclaim: 272104 kB
KernelStack: 16768 kB
PageTables: 20932 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 10309228 kB
Committed_AS: 8433608 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 34816 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 2048
HugePages_Free: 2044
HugePages_Rsvd: 2044
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 728940 kB
DirectMap2M: 11853824 kB
DirectMap1G: 6291456 kB

And on my server:

MemTotal: 16424160 kB
MemFree: 3771716 kB
MemAvailable: 6517888 kB
Buffers: 309848 kB
Cached: 2576436 kB
SwapCached: 0 kB
Active: 2398708 kB
Inactive: 1465044 kB
Active(anon): 977800 kB
Inactive(anon): 356 kB
Active(file): 1420908 kB
Inactive(file): 1464688 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4194300 kB
Dirty: 76 kB
Writeback: 0 kB
AnonPages: 977544 kB
Mapped: 233700 kB
Shmem: 820 kB
Slab: 294456 kB
SReclaimable: 202184 kB
SUnreclaim: 92272 kB
KernelStack: 6096 kB
PageTables: 12184 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8212076 kB
Committed_AS: 2616856 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 4096
HugePages_Free: 3968
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 210796 kB
DirectMap2M: 7129088 kB
DirectMap1G: 11534336 kB

Checking whether my hugepages are mounted using mount | grep huge on the server and the client, the output is:

hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)

Spinning up a namenode on the server, the output is as follows:

25/03/25 13:57:14 INFO crail: initalizing namenode
25/03/25 13:57:14 INFO crail: crail.version 3101
25/03/25 13:57:14 INFO crail: crail.directorydepth 16
25/03/25 13:57:14 INFO crail: crail.tokenexpiration 10
25/03/25 13:57:14 INFO crail: crail.blocksize 1048576
25/03/25 13:57:14 INFO crail: crail.cachelimit 268435456
25/03/25 13:57:14 INFO crail: crail.cachepath /dev/hugepages/cache
25/03/25 13:57:14 INFO crail: crail.user crail
25/03/25 13:57:14 INFO crail: crail.shadowreplication 1
25/03/25 13:57:14 INFO crail: crail.debug false
25/03/25 13:57:14 INFO crail: crail.statistics true
25/03/25 13:57:14 INFO crail: crail.rpctimeout 1000
25/03/25 13:57:14 INFO crail: crail.datatimeout 1000
25/03/25 13:57:14 INFO crail: crail.buffersize 1048576
25/03/25 13:57:14 INFO crail: crail.slicesize 524288
25/03/25 13:57:14 INFO crail: crail.singleton true
25/03/25 13:57:14 INFO crail: crail.regionsize 268435456
25/03/25 13:57:14 INFO crail: crail.directoryrecord 512
25/03/25 13:57:14 INFO crail: crail.directoryrandomize true
25/03/25 13:57:14 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
25/03/25 13:57:14 INFO crail: crail.locationmap
25/03/25 13:57:14 INFO crail: crail.namenode.address crail://128.131.57.140:9060?id=0&size=1
25/03/25 13:57:14 INFO crail: crail.namenode.blockselection roundrobin
25/03/25 13:57:14 INFO crail: crail.namenode.fileblocks 16
25/03/25 13:57:14 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.darpc.DaRPCNameNode
25/03/25 13:57:14 INFO crail: crail.namenode.rpcservice org.apache.crail.namenode.NameNodeService
25/03/25 13:57:14 INFO crail: crail.namenode.log
25/03/25 13:57:14 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier
25/03/25 13:57:14 INFO crail: crail.storage.classes 1
25/03/25 13:57:14 INFO crail: crail.storage.rootclass 0
25/03/25 13:57:14 INFO crail: crail.storage.keepalive 2
25/03/25 13:57:14 INFO crail: crail.elasticstore.scaleup 0.4
25/03/25 13:57:14 INFO crail: crail.elasticstore.scaledown 0.1
25/03/25 13:57:14 INFO crail: crail.elasticstore.maxnodes 10
25/03/25 13:57:14 INFO crail: crail.elasticstore.minnodes 1
25/03/25 13:57:14 INFO crail: crail.elasticstore.policyrunner.interval 1000
25/03/25 13:57:14 INFO crail: crail.elasticstore.logging false
25/03/25 13:57:14 INFO crail: round robin block selection
25/03/25 13:57:14 INFO crail: rpc group started, recvQueue 32
25/03/25 13:57:14 INFO darpc: running resource management, index 0, affinity 2, timeout 2147483647
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.polling false
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.type passive
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.affinity 1
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.maxinline 0
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.recvQueue 32
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.sendQueue 32
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.pollsize 32
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.clustersize 128
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.backlog 100
25/03/25 13:57:14 INFO crail: crail.namenode.darpc.connecttimeout 1000
25/03/25 13:57:14 INFO crail: opened server at /128.131.57.140:9060

Now spinning up the datanode using $CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.rdma.RdmaStorageTier:

25/03/25 13:59:05 INFO crail: crail.version 3101
25/03/25 13:59:05 INFO crail: crail.directorydepth 16
25/03/25 13:59:05 INFO crail: crail.tokenexpiration 10
25/03/25 13:59:05 INFO crail: crail.blocksize 1048576
25/03/25 13:59:05 INFO crail: crail.cachelimit 268435456
25/03/25 13:59:05 INFO crail: crail.cachepath /dev/hugepages/cache
25/03/25 13:59:05 INFO crail: crail.user crail
25/03/25 13:59:05 INFO crail: crail.shadowreplication 1
25/03/25 13:59:05 INFO crail: crail.debug false
25/03/25 13:59:05 INFO crail: crail.statistics true
25/03/25 13:59:05 INFO crail: crail.rpctimeout 1000
25/03/25 13:59:05 INFO crail: crail.datatimeout 1000
25/03/25 13:59:05 INFO crail: crail.buffersize 1048576
25/03/25 13:59:05 INFO crail: crail.slicesize 524288
25/03/25 13:59:05 INFO crail: crail.singleton true
25/03/25 13:59:05 INFO crail: crail.regionsize 268435456
25/03/25 13:59:05 INFO crail: crail.directoryrecord 512
25/03/25 13:59:05 INFO crail: crail.directoryrandomize true
25/03/25 13:59:05 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
25/03/25 13:59:05 INFO crail: crail.locationmap
25/03/25 13:59:05 INFO crail: crail.namenode.address crail://128.131.57.140:9060
25/03/25 13:59:05 INFO crail: crail.namenode.blockselection roundrobin
25/03/25 13:59:05 INFO crail: crail.namenode.fileblocks 16
25/03/25 13:59:05 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.darpc.DaRPCNameNode
25/03/25 13:59:05 INFO crail: crail.namenode.rpcservice org.apache.crail.namenode.NameNodeService
25/03/25 13:59:05 INFO crail: crail.namenode.log
25/03/25 13:59:05 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier
25/03/25 13:59:05 INFO crail: crail.storage.classes 1
25/03/25 13:59:05 INFO crail: crail.storage.rootclass 0
25/03/25 13:59:05 INFO crail: crail.storage.keepalive 2
25/03/25 13:59:05 INFO crail: crail.elasticstore.scaleup 0.4
25/03/25 13:59:05 INFO crail: crail.elasticstore.scaledown 0.1
25/03/25 13:59:05 INFO crail: crail.elasticstore.maxnodes 10
25/03/25 13:59:05 INFO crail: crail.elasticstore.minnodes 1
25/03/25 13:59:05 INFO crail: crail.elasticstore.policyrunner.interval 1000
25/03/25 13:59:05 INFO crail: crail.elasticstore.logging false
25/03/25 13:59:05 INFO crail: crail.storage.rdma.interface enp1s0
25/03/25 13:59:05 INFO crail: crail.storage.rdma.port 50020
25/03/25 13:59:05 INFO crail: crail.storage.rdma.storagelimit 268435456
25/03/25 13:59:05 INFO crail: crail.storage.rdma.allocationsize 268435456
25/03/25 13:59:05 INFO crail: crail.storage.rdma.datapath /dev/hugepages/data
25/03/25 13:59:05 INFO crail: crail.storage.rdma.localmap true
25/03/25 13:59:05 INFO crail: crail.storage.rdma.queuesize 32
25/03/25 13:59:05 INFO crail: crail.storage.rdma.type passive
25/03/25 13:59:05 INFO crail: crail.storage.rdma.backlog 100
25/03/25 13:59:05 INFO crail: crail.storage.rdma.connecttimeout 1000
25/03/25 13:59:05 INFO crail: rdma storage server started, address /128.131.57.140:50020, persistent false, maxWR 1, maxSge 1, cqSize 1
25/03/25 13:59:05 INFO crail: rpc group started, recvQueue 32
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.polling false
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.type passive
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.affinity 1
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.maxinline 0
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.recvQueue 32
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.sendQueue 32
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.pollsize 32
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.clustersize 128
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.backlog 100
25/03/25 13:59:05 INFO crail: crail.namenode.darpc.connecttimeout 1000
25/03/25 13:59:06 INFO crail: connected to namenode(s) /128.131.57.140:9060
25/03/25 13:59:06 INFO crail: datanode statistics, freeBlocks 256
25/03/25 13:59:06 INFO crail: datanode statistics, freeBlocks 256

Finally, on my client, I try to test using the built-in tools iobench and fsck. In addition, I have set up a DiSNI client following the examples and the configuration found in
https://github.com/brianfrankcooper/YCSB/tree/master/crail/src/main/java/site/ycsb/db/crail

The first attempt at connecting to the namenode is the fsck ping experiment.
With $CRAIL_HOME/bin/crail fsck -t ping I get:

25/03/25 14:10:58 INFO crail: crail.version 3101
25/03/25 14:10:58 INFO crail: crail.directorydepth 16
25/03/25 14:10:58 INFO crail: crail.tokenexpiration 10
25/03/25 14:10:58 INFO crail: crail.blocksize 1048576
25/03/25 14:10:58 INFO crail: crail.cachelimit 268435456
25/03/25 14:10:58 INFO crail: crail.cachepath /dev/hugepages/cache
25/03/25 14:10:58 INFO crail: crail.user crail
25/03/25 14:10:58 INFO crail: crail.shadowreplication 1
25/03/25 14:10:58 INFO crail: crail.debug false
25/03/25 14:10:58 INFO crail: crail.statistics true
25/03/25 14:10:58 INFO crail: crail.rpctimeout 1000
25/03/25 14:10:58 INFO crail: crail.datatimeout 1000
25/03/25 14:10:58 INFO crail: crail.buffersize 1048576
25/03/25 14:10:58 INFO crail: crail.slicesize 524288
25/03/25 14:10:58 INFO crail: crail.singleton true
25/03/25 14:10:58 INFO crail: crail.regionsize 268435456
25/03/25 14:10:58 INFO crail: crail.directoryrecord 512
25/03/25 14:10:58 INFO crail: crail.directoryrandomize true
25/03/25 14:10:58 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
25/03/25 14:10:58 INFO crail: crail.locationmap
25/03/25 14:10:58 INFO crail: crail.namenode.address crail://128.131.57.140:9060
25/03/25 14:10:58 INFO crail: crail.namenode.blockselection roundrobin
25/03/25 14:10:58 INFO crail: crail.namenode.fileblocks 16
25/03/25 14:10:58 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode
25/03/25 14:10:58 INFO crail: crail.namenode.rpcservice org.apache.crail.namenode.NameNodeService
25/03/25 14:10:58 INFO crail: crail.namenode.log
25/03/25 14:10:58 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier
25/03/25 14:10:58 INFO crail: crail.storage.classes 1
25/03/25 14:10:58 INFO crail: crail.storage.rootclass 0
25/03/25 14:10:58 INFO crail: crail.storage.keepalive 2
25/03/25 14:10:58 INFO crail: crail.elasticstore.scaleup 0.4
25/03/25 14:10:58 INFO crail: crail.elasticstore.scaledown 0.1
25/03/25 14:10:58 INFO crail: crail.elasticstore.maxnodes 10
25/03/25 14:10:58 INFO crail: crail.elasticstore.minnodes 1
25/03/25 14:10:58 INFO crail: crail.elasticstore.policyrunner.interval 1000
25/03/25 14:10:58 INFO crail: crail.elasticstore.logging false
25/03/25 14:10:58 INFO crail: buffer cache, allocationCount 1, bufferCount 256
25/03/25 14:10:58 INFO crail: crail.storage.rdma.interface enp1s0
25/03/25 14:10:58 INFO crail: crail.storage.rdma.port 50020
25/03/25 14:10:58 INFO crail: crail.storage.rdma.storagelimit 268435456
25/03/25 14:10:58 INFO crail: crail.storage.rdma.allocationsize 268435456
25/03/25 14:10:58 INFO crail: crail.storage.rdma.datapath /dev/hugepages/data
25/03/25 14:10:58 INFO crail: crail.storage.rdma.localmap true
25/03/25 14:10:58 INFO crail: crail.storage.rdma.queuesize 32
25/03/25 14:10:58 INFO crail: crail.storage.rdma.type passive
25/03/25 14:10:58 INFO crail: crail.storage.rdma.backlog 100
25/03/25 14:10:58 INFO crail: crail.storage.rdma.connecttimeout 1000
25/03/25 14:10:58 INFO narpc: new NaRPC client group v1.5.0, queueDepth 32, messageSize 512, nodealy true
25/03/25 14:10:58 INFO crail: crail.namenode.tcp.queueDepth 32
25/03/25 14:10:58 INFO crail: crail.namenode.tcp.messageSize 512
25/03/25 14:10:58 INFO crail: crail.namenode.tcp.cores 1
25/03/25 14:10:58 INFO crail: connected to namenode(s) /128.131.57.140:9060

At this point it gets stuck. Here I noticed that the client falls back to NaRPC instead of DaRPC. This is the case for all experiments. Changing anything in crail-site.conf on the client had no effect; even setting it to something obviously wrong was simply ignored. On the server side, this ping went unnoticed.

I then tried iobench using the default $CRAIL_HOME/bin/crail iobench -t write -f /filename -s $((1024*1024)) -k 1024.
This resulted in:

25/03/25 14:13:25 INFO crail: creating singleton crail file system
25/03/25 14:13:25 INFO crail: crail.version 3101
25/03/25 14:13:25 INFO crail: crail.directorydepth 16
25/03/25 14:13:25 INFO crail: crail.tokenexpiration 10
25/03/25 14:13:25 INFO crail: crail.blocksize 1048576
25/03/25 14:13:25 INFO crail: crail.cachelimit 268435456
25/03/25 14:13:25 INFO crail: crail.cachepath /dev/hugepages/cache
25/03/25 14:13:25 INFO crail: crail.user crail
25/03/25 14:13:25 INFO crail: crail.shadowreplication 1
25/03/25 14:13:25 INFO crail: crail.debug false
25/03/25 14:13:25 INFO crail: crail.statistics true
25/03/25 14:13:25 INFO crail: crail.rpctimeout 1000
25/03/25 14:13:25 INFO crail: crail.datatimeout 1000
25/03/25 14:13:25 INFO crail: crail.buffersize 1048576
25/03/25 14:13:25 INFO crail: crail.slicesize 524288
25/03/25 14:13:25 INFO crail: crail.singleton true
25/03/25 14:13:25 INFO crail: crail.regionsize 268435456
25/03/25 14:13:25 INFO crail: crail.directoryrecord 512
25/03/25 14:13:25 INFO crail: crail.directoryrandomize true
25/03/25 14:13:25 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
25/03/25 14:13:25 INFO crail: crail.locationmap
25/03/25 14:13:25 INFO crail: crail.namenode.address crail://128.131.57.140:9060
25/03/25 14:13:25 INFO crail: crail.namenode.blockselection roundrobin
25/03/25 14:13:25 INFO crail: crail.namenode.fileblocks 16
25/03/25 14:13:25 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode
25/03/25 14:13:25 INFO crail: crail.namenode.rpcservice org.apache.crail.namenode.NameNodeService
25/03/25 14:13:25 INFO crail: crail.namenode.log
25/03/25 14:13:25 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier
25/03/25 14:13:25 INFO crail: crail.storage.classes 1
25/03/25 14:13:25 INFO crail: crail.storage.rootclass 0
25/03/25 14:13:25 INFO crail: crail.storage.keepalive 2
25/03/25 14:13:25 INFO crail: crail.elasticstore.scaleup 0.4
25/03/25 14:13:25 INFO crail: crail.elasticstore.scaledown 0.1
25/03/25 14:13:25 INFO crail: crail.elasticstore.maxnodes 10
25/03/25 14:13:25 INFO crail: crail.elasticstore.minnodes 1
25/03/25 14:13:25 INFO crail: crail.elasticstore.policyrunner.interval 1000
25/03/25 14:13:25 INFO crail: crail.elasticstore.logging false
25/03/25 14:13:25 INFO crail: buffer cache, allocationCount 1, bufferCount 256
25/03/25 14:13:25 INFO crail: crail.storage.rdma.interface enp1s0
25/03/25 14:13:25 INFO crail: crail.storage.rdma.port 50020
25/03/25 14:13:25 INFO crail: crail.storage.rdma.storagelimit 268435456
25/03/25 14:13:25 INFO crail: crail.storage.rdma.allocationsize 268435456
25/03/25 14:13:25 INFO crail: crail.storage.rdma.datapath /dev/hugepages/data
25/03/25 14:13:25 INFO crail: crail.storage.rdma.localmap true
25/03/25 14:13:25 INFO crail: crail.storage.rdma.queuesize 32
25/03/25 14:13:25 INFO crail: crail.storage.rdma.type passive
25/03/25 14:13:25 INFO crail: crail.storage.rdma.backlog 100
25/03/25 14:13:25 INFO crail: crail.storage.rdma.connecttimeout 1000
25/03/25 14:13:25 INFO narpc: new NaRPC client group v1.5.0, queueDepth 32, messageSize 512, nodealy true
25/03/25 14:13:25 INFO crail: crail.namenode.tcp.queueDepth 32
25/03/25 14:13:25 INFO crail: crail.namenode.tcp.messageSize 512
25/03/25 14:13:25 INFO crail: crail.namenode.tcp.cores 1
25/03/25 14:13:25 INFO crail: connected to namenode(s) /128.131.57.140:9060
write, filename /filename, size 1048576, loop 1024, storageClass 0, locationClass 0, buffered true
Exception in thread "main" java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:938)
    at org.apache.crail.memory.MappedBufferCache.allocateRegion(MappedBufferCache.java:94)
    at org.apache.crail.memory.BufferCache.allocateBuffer(BufferCache.java:95)
    at org.apache.crail.core.CoreDataStore.allocateBuffer(CoreDataStore.java:482)
    at org.apache.crail.tools.CrailBenchmark.write(CrailBenchmark.java:85)
    at org.apache.crail.tools.CrailBenchmark.main(CrailBenchmark.java:1070)
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:935)
    ... 5 more

Again, the client falls back to NaRPC instead of DaRPC. Further, the call went unnoticed on the server side, although the client states that it connected to the namenode.

I then changed the setting in crail-site.conf on my server back to NaRPC. With only this part of the configuration removed, the namenode was still able to start. I ran the fsck experiments again: namenodeDump, getLocations and ping worked, but directoryDump, blockStatistics and createDirectory ran into the same java.lang.OutOfMemoryError: Map failed. The connection attempts were recognized on the namenode side. After that, I tried iobench again, which resulted in the same OutOfMemoryError. Reducing the size and loop count to a bare minimum of -s $((4*4)) -k 4, I still got the same issue.

Next, I reverted all of my changes in crail-site.conf to use TCP instead of RDMA, to check whether this would give me some insight. With this setup, I got the same issues with fsck -t createDirectory and iobench.

At this point I am absolutely stuck. As Crail is a perfect fit for my master's thesis, I really don't want to give up on finishing this setup. I hope I didn't miss any important information. I will try to find a way to debug this in the meantime. I am grateful for any advice on this.

I did not receive any reply to my last email at my email provider. Would it be possible to put me and my working email "meiko.prilop-...@ibm.com" in CC, just in case?

Thanks in advance!

Best
Meiko Prilop