Hi,

We have integrated Spark with a YARN cluster. To test a long-running cluster, we ran the script below, which launches the application in a loop; the example program is given at the bottom.
#!/bin/bash
# Script which runs the example infinitely
times=0
while true
do
  echo "start $times"
  /opt/ficlient/Spark/spark/bin/spark-submit \
    --class com.example.cfgtest.SparkPi \
    --master yarn-client \
    --driver-java-options '-Dlog4j.configuration=file:"./log4j.properties" -Dzookeeper.server.principal=zookeeper/hadoop.hadoop.com' \
    --executor-memory 1G \
    --num-executors 3 \
    --driver-memory 1G \
    --executor-cores 5 \
    --queue QueueD \
    /opt/example/example.jar 100
  echo "finish $times"
  let ++times
done

After running for one or two days the JVM crashes and we get an error like:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (sharedRuntime.cpp:834), pid=1325, tid=0x00007f599f312700
#  fatal error: exception happened outside interpreter, nmethods and vtable stubs at pc 0x00007f59c36b16b1
#
# JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /opt/ashok/crash_test/core or core.1325 (max size 1 kB). To ensure a full core dump, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007f59ac0f4000):  JavaThread "SparkListenerBus" daemon [_thread_in_Java, id=7566, stack(0x00007f599f212000,0x00007f599f313000)]

Stack: [0x00007f599f212000,0x00007f599f313000],  sp=0x00007f599f30fb10,  free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xac826a]  VMError::report_and_die()+0x2ba
V  [libjvm.so+0x4fd089]  report_fatal(char const*, int, char const*)+0x59
V  [libjvm.so+0x9c391a]  SharedRuntime::continuation_for_implicit_exception(JavaThread*, unsigned char*, SharedRuntime::ImplicitExceptionKind)+0x33a
V  [libjvm.so+0x92bbfa]  JVM_handle_linux_signal+0x48a
V  [libjvm.so+0x921e13]  signalHandler(int, siginfo*, void*)+0x43
C  [libpthread.so.0+0xf850]
j  org.apache.spark.serializer.KryoSerializerInstance.borrowKryo()Lcom/esotericsoftware/kryo/Kryo;+11

We have hit this issue many times, and the current thread in which the JVM crashed was not always org.apache.spark.serializer.KryoSerializerInstance.

Linux environment: SUSE 11.4
Java version: jdk1.8.0_131

Below is the example we have tried:

import org.apache.commons.logging.LogFactory

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi. */
object SparkPi {
  val LOG = LogFactory.getLog("SparkPi")

  def main(args: Array[String]) {
    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val rdd = spark.parallelize(1 until n, slices)
    val shuffleRdd = rdd.repartition(200) // force a shuffle across 200 partitions
    val count = shuffleRdd.map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}

Has anybody faced this issue? Any suggestion to resolve it?
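For completeness, the Kryo serializer can also be configured directly on the SparkConf instead of through System.setProperty. A minimal sketch is below (the object name and the registered class are only illustrative, and we have not confirmed that configuring it this way changes the crash behaviour):

import org.apache.spark.{SparkConf, SparkContext}

object SparkPiKryoConf {
  def main(args: Array[String]): Unit = {
    // Enable Kryo on the SparkConf itself rather than via a JVM system property.
    val conf = new SparkConf()
      .setAppName("Spark Pi (Kryo via SparkConf)")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optionally pre-register classes that will be serialized (illustrative only).
      .registerKryoClasses(Array(classOf[scala.collection.immutable.Range]))

    val sc = new SparkContext(conf)
    // ... same Pi computation as in the example above ...
    sc.stop()
  }
}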