Oh, and while I’m thinking about it Jonas: when I added the patches you provided the other day, I only
added them to the spark containers (clients), not to my crail containers running on my storage server.
Should the patches have been added to all of the containers?

Regards,
David

________________________________
From: Jonas Pfefferle <peppe...@japf.ch>
Sent: Friday, June 28, 2019 12:54:27 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,

At the moment, it is possible to add an NVMf datanode even if only the RDMA
storage type is specified in the config. As you have seen, this will go wrong
as soon as a client tries to connect to the datanode.
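To illustrate, an RDMA-only client configuration lists just the RDMA tier in
crail-site.conf, whereas a two-class RDMA+NVMf setup lists both tiers. Roughly
(a sketch only; the namenode address is simply the one from your log, adjust
everything to your setup):

    # RDMA only (single storage class)
    crail.namenode.address   crail://192.168.1.164:9060
    crail.storage.types      org.apache.crail.storage.rdma.RdmaStorageTier
    crail.storage.classes    1

    # RDMA + NVMf (two storage classes) would instead use
    # crail.storage.types    org.apache.crail.storage.rdma.RdmaStorageTier,org.apache.crail.storage.nvmf.NvmfStorageTier
    # crail.storage.classes  2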
Make sure to start the RDMA datanode with the appropriate classname, see:
https://incubator-crail.readthedocs.io/en/latest/run.html
The correct classname is org.apache.crail.storage.rdma.RdmaStorageTier.
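On the storage server that means launching the datanode along these lines
(sketch, assuming the standard launch script described on the run.html page;
$CRAIL_HOME is wherever Crail lives inside your container):

    $CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.rdma.RdmaStorageTier

If it is started with the NVMf tier class instead, the namenode registers it as
an NVMf datanode, which is presumably why your client ends up trying to connect
to 192.168.3.100:4420 in the log below.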
Regards,
Jonas


On Thu, 27 Jun 2019 23:09:26 +0000
David Crespi <david.cre...@storedgesystems.com> wrote:
> Hi,
> I’m trying to integrate the storage classes and I’m hitting another issue
> when running terasort and just using the crail-shuffle with HDFS as the
> tmp storage. The program just sits after the following message:
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining connections 0
>
> During this run, I’ve removed the two crail nvmf (class 1 and 2) containers
> from the server, and I’m only running the namenode and an rdma storage
> class 1 datanode. My spark configuration is also now only looking at the
> rdma class. It looks as though it’s picking up the NVMf IP and port in the
> INFO messages seen below. I must be configuring something wrong, but I’ve
> not been able to track it down. Any thoughts?
>
>
> ************************************
> TeraSort
> ************************************
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to: hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to: hduser
> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups to:
> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups to:
> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser); groups with view permissions: Set(); users with modify permissions: Set(hduser); groups with modify permissions: Set()
> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the default logging framework
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap: -Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap: -Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup: -Dio.netty.eventLoopThreads: 112
> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe: false
> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
> 19/06/27 15:59:08 DEBUG PlatformDependent0: sun.misc.Unsafe.theUnsafe: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: sun.misc.Unsafe.copyMemory: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer constructor: available
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned: available, true
> 19/06/27 15:59:08 DEBUG PlatformDependent0: jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable prior to Java9
> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.DirectByteBuffer.<init>(long, int): available
> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe: available
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp (java.io.tmpdir)
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64 (sun.arch.data.model)
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.noPreferDirect: false
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.maxDirectMemory: 1029177344 bytes
> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.uninitializedArrayAllocationThreshold: -1
> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner(): available
> 19/06/27 15:59:08 DEBUG NioEventLoop: -Dio.netty.noKeySetOptimization: false
> 19/06/27 15:59:08 DEBUG NioEventLoop: -Dio.netty.selectorAutoRebuildThreshold: 512
> 19/06/27 15:59:08 DEBUG PlatformDependent: org.jctools-core.MpscChunkedArrayQueue: available
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector: -Dio.netty.leakDetection.level: simple
> 19/06/27 15:59:08 DEBUG ResourceLeakDetector: -Dio.netty.leakDetection.targetRecords: 4
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.numHeapArenas: 9
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.numDirectArenas: 10
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.pageSize: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.maxOrder: 11
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.chunkSize: 16777216
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.tinyCacheSize: 512
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.smallCacheSize: 256
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.normalCacheSize: 64
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.maxCachedBufferCapacity: 32768
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.cacheTrimInterval: 8192
> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator: -Dio.netty.allocator.useCacheForAllThreads: true
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236 (auto-detected)
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses: false
> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo, 127.0.0.1)
> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId: 02:42:ac:ff:fe:1b:00:02 (auto-detected)
> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type: pooled
> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.threadLocalDirectBufferSize: 65536
> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.maxThreadLocalCharBufferSize: 16384
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on port: 36915
> 19/06/27 15:59:08 INFO Utils: Successfully started service 'sparkDriver' on port 36915.
> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class org.apache.spark.serializer.KryoSerializer
> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
> 19/06/27 15:59:08 DEBUG OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui: SSLOptions{enabled=false, port=None, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}
> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.161:4040
> 19/06/27 15:59:08 INFO SparkContext: Added JAR file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar at spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar with timestamp 1561676348562
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://master:7077...
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new connection to master/192.168.3.13:7077
> 19/06/27 15:59:08 DEBUG AbstractByteBuf: -Dio.netty.buffer.bytebuf.checkAccessible: true
> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to master/192.168.3.13:7077 successful, running bootstraps...
> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in bootstraps)
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.maxCapacityPerThread: 32768
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.maxSharedCapacityFactor: 2
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity: 16
> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20190627155908-0005
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/0 on worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2 core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2 core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/1 on worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2 core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2 core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on port: 39189
> 19/06/27 15:59:08 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39189.
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/2 on worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2 core(s)
> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on master:39189
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2 core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/3 on worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2 core(s)
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2 core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20190627155908-0005/4 on worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2 core(s)
> 19/06/27 15:59:08 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2 core(s), 1024.0 MB RAM
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/0 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/3 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/4 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/1 is now RUNNING
> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20190627155908-0005/2 is now RUNNING
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for master
> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block manager master:39189 with 366.3 MB RAM, BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 39189, None)
> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
> 19/06/27 15:59:09 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@3ed03652
> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 288.9 KB, free 366.0 MB)
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally took 115 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0 without replication took 117 ms
> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block broadcast_0_piece0
> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0 locally took 6 ms
> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0_piece0 without replication took 6 ms
> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at TeraSort.scala:60
> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
> 19/06/27 15:59:10 DEBUG Client: Connecting to NameNode-1/192.168.3.7:54310
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: starting, having connections 1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser sending #0
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser got value #0
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took 31ms
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser sending #1
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser got value #1
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get FileStatuses: 134
> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process : 2
> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 139
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1; output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String, Boolean) constructor
> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0: masked=rwxr-xr-x
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser sending #2
> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser got value #2
> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda: $anonfun$write$1
> 19/06/27 15:59:10 DEBUG ClosureCleaner: +++ Lambda closure ($anonfun$write$1) is now cleaned +++
> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at SparkHadoopWriter.scala:78
> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version 400
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart true
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer org.apache.spark.serializer.CrailSparkSerializer
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity true
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.outstanding 1
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.storageclass 0
> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.broadcast.storageclass 0
> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
> 19/06/27 15:59:10 INFO crail: crail.version 3101
> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
> 19/06/27 15:59:10 INFO crail: crail.user crail
> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
> 19/06/27 15:59:10 INFO crail: crail.debug true
> 19/06/27 15:59:10 INFO crail: crail.statistics true
> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
> 19/06/27 15:59:10 INFO crail: crail.singleton true
> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
> 19/06/27 15:59:10 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
> 19/06/27 15:59:10 INFO crail: crail.locationmap
> 19/06/27 15:59:10 INFO crail: crail.namenode.address crail://192.168.1.164:9060
> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection roundrobin
> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode
> 19/06/27 15:59:10 INFO crail: crail.namenode.log
> 19/06/27 15:59:10 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier
> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0, bufferCount 1024
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit 4294967296
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize 1073741824
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath /dev/hugepages/rdma
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
> 19/06/27 15:59:10 INFO crail: connected to namenode(s) /192.168.1.164:9060
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type DIRECTORY, storageAffinity 0, locationAffinity 0
> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0, streamId 1, isDir true, writeHint 0
> 19/06/27 15:59:10 INFO crail: passive data client
> 19/06/27 15:59:10 INFO disni: creating RdmaProvider of type 'nat'
> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs size 28, native size 16
> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32, native size 32
> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size 72, native size 128
> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48, native size 48
> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16, native size 16
> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size 40, native size 40
> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48, native size 48
> 19/06/27 15:59:10 INFO disni: createEventChannel, objId 139811924587312
> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32, maxSge 4, cqSize 64
> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
> 19/06/27 15:59:10 INFO disni: resolveAddr, addres /192.168.3.100:4420
> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
> 19/06/27 15:59:10 INFO disni: setting up protection domain, context 467, pd 1
> 19/06/27 15:59:10 INFO disni: setting up cq processor
> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
> 19/06/27 15:59:10 INFO disni: createCompChannel, context 139810647883744
> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe 64
> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192, send_wr size 32, recv_wr_size 32
> 19/06/27 15:59:10 INFO disni: connect, id 0
> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress /192.168.3.13:43273, dstAddress /192.168.3.100:4420
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.11:35854) with ID 0
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.12:44312) with ID 1
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.8:34774) with ID 4
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.9:58808) with ID 2
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.11
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0, 192.168.3.11, 41919, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.12
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1, 192.168.3.12, 46697, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.8
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4, 192.168.3.8, 37281, None)
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.9
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2, 192.168.3.9, 43857, None)
> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.3.10:40100) with ID 3
> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for 192.168.3.10
> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3, 192.168.3.10, 38527, None)
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: closed
> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining connections 0
>
>
> Regards,
>
> David