Hi,

I’m getting what looks to be a configuration error when trying to use the CrailShuffleManager:

spark.shuffle.manager org.apache.spark.shuffle.crail.CrailShuffleManager
It seems like a basic error, but everything else runs fine until I add the line above to my spark-defaults.conf file. I have the environment variable for the Crail home set, as well as the one for the DiSNI libs:

LD_LIBRARY_PATH=/usr/local/lib

$ ls -l /usr/local/lib/
total 156
-rwxr-xr-x 1 root root    947 Jun 18 08:11 libdisni.la
lrwxrwxrwx 1 root root     17 Jun 18 08:11 libdisni.so -> libdisni.so.0.0.0
lrwxrwxrwx 1 root root     17 Jun 18 08:11 libdisni.so.0 -> libdisni.so.0.0.0
-rwxr-xr-x 1 root root 149784 Jun 18 08:11 libdisni.so.0.0.0

I also have an environment variable for the classpath set:

CLASSPATH=/disni/target/*:/jNVMf/target/*:/crail/jars/*

Could the CLASSPATH variable be the issue?
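In case it helps narrow that down, I figure I could check which jar the Crail classes are actually resolved from with something like this in spark-shell (just a sketch, using the class named in the error below; it is plain JDK reflection, nothing Crail-specific):

// prints the jar that org.apache.crail.conf.CrailConfiguration was loaded from
// (getCodeSource can be null for bootstrap classes, but not for a classpath jar)
val c = Class.forName("org.apache.crail.conf.CrailConfiguration")
println(c.getProtectionDomain.getCodeSource.getLocation)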
Here is the tail of the driver log, from just before the failure:

19/06/18 15:59:47 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@7bebcd65
19/06/18 15:59:47 DEBUG PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
19/06/18 15:59:47 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
19/06/18 15:59:48 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 288.9 KB, free 366.0 MB)
19/06/18 15:59:48 DEBUG BlockManager: Put block broadcast_0 locally took 123 ms
19/06/18 15:59:48 DEBUG BlockManager: Putting block broadcast_0 without replication took 125 ms
19/06/18 15:59:48 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
19/06/18 15:59:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on master:34103 (size: 23.8 KB, free: 366.3 MB)
19/06/18 15:59:48 DEBUG BlockManagerMaster: Updated info of block broadcast_0_piece0
19/06/18 15:59:48 DEBUG BlockManager: Told master about block broadcast_0_piece0
19/06/18 15:59:48 DEBUG BlockManager: Put block broadcast_0_piece0 locally took 7 ms
19/06/18 15:59:48 DEBUG BlockManager: Putting block broadcast_0_piece0 without replication took 8 ms
19/06/18 15:59:48 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at TeraSort.scala:60
19/06/18 15:59:48 DEBUG Client: The ping interval is 60000 ms.
19/06/18 15:59:48 DEBUG Client: Connecting to NameNode-1/192.168.3.7:54310
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser: starting, having connections 1
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser sending #0
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser got value #0
19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: getFileInfo took 56ms
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser sending #1
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser got value #1
19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: getListing took 3ms
19/06/18 15:59:48 DEBUG FileInputFormat: Time taken to get FileStatuses: 142
19/06/18 15:59:48 INFO FileInputFormat: Total input paths to process : 2
19/06/18 15:59:48 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 145
19/06/18 15:59:48 DEBUG FileCommitProtocol: Creating committer org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1; output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
19/06/18 15:59:48 DEBUG FileCommitProtocol: Using (String, String, Boolean) constructor
19/06/18 15:59:48 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
19/06/18 15:59:48 DEBUG DFSClient: /tmp/data_sort/_temporary/0: masked=rwxr-xr-x
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser sending #2
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser got value #2
19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
19/06/18 15:59:48 DEBUG ClosureCleaner: Cleaning lambda: $anonfun$write$1
19/06/18 15:59:48 DEBUG ClosureCleaner: +++ Lambda closure ($anonfun$write$1) is now cleaned +++
19/06/18 15:59:48 INFO SparkContext: Starting job: runJob at SparkHadoopWriter.scala:78
19/06/18 15:59:48 INFO CrailDispatcher: CrailStore starting version 400
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.deleteonclose false
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.deleteOnStart true
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.preallocate 0
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.writeAhead 0
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.debug false
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.serializer org.apache.spark.serializer.CrailSparkSerializer
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.shuffle.affinity true
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.shuffle.outstanding 1
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.shuffle.storageclass 0
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.broadcast.storageclass 0
Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: tried to access method org.apache.crail.conf.CrailConfiguration.<init>()V from class org.apache.spark.storage.CrailDispatcher
    at org.apache.spark.storage.CrailDispatcher.org$apache$spark$storage$CrailDispatcher$$init(CrailDispatcher.scala:119)
    at org.apache.spark.storage.CrailDispatcher$.get(CrailDispatcher.scala:662)
    at org.apache.spark.shuffle.crail.CrailShuffleManager.registerShuffle(CrailShuffleManager.scala:52)
    at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:94)
    at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
    at org.apache.spark.rdd.RDD.$anonfun$dependencies$2(RDD.scala:240)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.rdd.RDD.dependencies(RDD.scala:238)
    at org.apache.spark.scheduler.DAGScheduler.getShuffleDependencies(DAGScheduler.scala:512)
    at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:461)
    at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:448)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:962)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2067)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
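If I’m reading the IllegalAccessError right, the no-arg constructor of org.apache.crail.conf.CrailConfiguration exists but is not accessible from CrailDispatcher ("<init>()V" in the message is the JVM descriptor for that constructor), which makes me wonder whether my spark-io jar was built against a different Crail version than the jars in /crail/jars. As a sanity check (again just a sketch I can run in spark-shell, not anything authoritative), I can list the constructors the loaded class actually declares:

// Constructor.toString includes the access modifier, so a private
// no-arg constructor would print as
// "private org.apache.crail.conf.CrailConfiguration()"
Class.forName("org.apache.crail.conf.CrailConfiguration")
  .getDeclaredConstructors
  .foreach(println)

If the no-arg constructor shows up as private there, that would explain this exact error.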
Regards,
David