Hi Eugene,

Nice catch! This looks like a side effect of ZEPPELIN-28. spark.cores.max doesn't have a default value of its own; it falls back to spark.deploy.defaultCores, whose initial value is Int.MaxValue. I think it should be fixed. Could you file a JIRA ticket for this issue and open a pull request on GitHub?
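For reference, here is a minimal Java sketch (not Zeppelin's or Spark's actual code — the helper names are hypothetical) showing both the failure Eugene reproduced and one possible guard: treat a blank spark.cores.max as "not set" so the scheduler can fall back to spark.deploy.defaultCores.

```java
import java.util.Optional;

public class CoresMaxDemo {
    // Mirrors Spark's conf.getOption("spark.cores.max").map(_.toInt):
    // an empty string reaches Integer.parseInt and throws.
    static Optional<Integer> naiveParse(String value) {
        return Optional.ofNullable(value).map(Integer::parseInt);
    }

    // Hypothetical guard: treat null/blank as "not set" so callers
    // fall back to their default instead of crashing.
    static Optional<Integer> guardedParse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Integer.parseInt(value.trim()));
    }

    public static void main(String[] args) {
        try {
            naiveParse("");  // same failure as SparkDeploySchedulerBackend
        } catch (NumberFormatException e) {
            System.out.println("naive: " + e.getMessage()); // For input string: ""
        }
        System.out.println("guarded blank: " + guardedParse(""));  // Optional.empty
        System.out.println("guarded '4': " + guardedParse("4"));   // Optional[4]
    }
}
```

Whether the right fix is guarding the parse in Spark or not emitting the empty string from Zeppelin's SparkInterpreter is exactly the question for the JIRA ticket.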
Regards,
Jongyoul Lee

On Wed, Apr 15, 2015 at 8:39 AM, Eugene Morozov <[email protected]> wrote:

> Hi!
>
> Yesterday, I had an issue with running Zeppelin on Spark 1.2.1, but I
> wrote to the user list of this project. The issue is described in detail
> in my previous email. Today, I debugged it a little and I seem to have
> found the weird place where I believe the issue might be.
>
> That is line 38 of SparkDeploySchedulerBackend:
> val maxCores = conf.getOption("spark.cores.max").map(_.toInt)
>
> I wrote a simple program to prove that the NumberFormatException comes
> from exactly this line:
> val conf: SparkConf = new SparkConf()
> conf.set("spark.cores.max", "")
> conf.getOption("spark.cores.max").map(_.toInt)
>
> These three lines give a NumberFormatException.
> So, this spark.cores.max comes from two places:
> 1. The static initialiser of SparkInterpreter adds spark.cores.max with
> an empty string value. It used to be null, but has been changed.
> 2. The file conf/interpreter.json, where spark.cores.max is also an
> empty string value.
>
> The description of this field states that if it's blank, it should
> become the number of cores in the system, but that doesn't happen for me
> for some reason. When I changed it in conf/interpreter.json, Zeppelin
> started working as it should.
>
> I believe it should be the number of cores in the static initialiser of
> SparkInterpreter instead of an empty string value, but I'm not sure.
> - Is there a better fix?
> - What's the best approach to run it under debug? Right now I'm using
> remote debug, but the interpreter starts at a particular time and it's
> hard to start debugging at the right moment; there should be a simpler
> way than that. Maybe there are some resources for developers on how to
> configure a local environment?
>
> On 14 Apr 2015, at 02:14, Eugene Morozov <[email protected]> wrote:
>
> > Hi!
> >
> > I'm trying to run at least something using my Spark / Cassandra setup:
> > prebuilt Spark 1.2.1-hadoop1, Cassandra 2.0.14.
> > Spark by itself is working fine: I have several tests in my project and
> > they're working, and I'm able to use the Spark shell to run my project;
> > everything is fine.
> >
> > So, the simplest thing I'm trying now is to run Zeppelin and create a
> > paragraph:
> > val rdd = sc.textFile("/Users/emorozov/tools/apache-cassandra-2.0.14/conf/cassandra.yaml")
> > rdd.count
> >
> > Here is what I see in the log files.
> >
> > Zeppelin interpreter log file:
> > INFO [2015-04-14 01:29:21,097] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Successfully started service 'SparkUI' on port 4045.
> > INFO [2015-04-14 01:29:21,097] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Started SparkUI at http://10.59.26.123:4045
> > INFO [2015-04-14 01:29:21,255] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Added JAR file:/Users/emorozov/dev/analytics/analytics-jobs/target/analytics-jobs-5.2.0-SNAPSHOT-all.jar at http://10.59.26.123:53658/jars/analytics-jobs-5.2.0-SNAPSHOT-all.jar with timestamp 1429000161255
> > ERROR [2015-04-14 01:29:21,256] ({pool-1-thread-5} ProcessFunction.java[process]:41) - Internal error processing getProgress
> > org.apache.zeppelin.interpreter.InterpreterException: java.lang.NumberFormatException: For input string: ""
> >         at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:75)
> >         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
> >         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
> >         at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:299)
> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:938)
> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:923)
> >         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> >         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> >         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.NumberFormatException: For input string: ""
> >         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> >         at java.lang.Integer.parseInt(Integer.java:504)
> >         at java.lang.Integer.parseInt(Integer.java:527)
> >         at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
> >         at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
> >         at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend$$anonfun$2.apply(SparkDeploySchedulerBackend.scala:42)
> >         at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend$$anonfun$2.apply(SparkDeploySchedulerBackend.scala:42)
> >         at scala.Option.map(Option.scala:145)
> >         at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.<init>(SparkDeploySchedulerBackend.scala:42)
> >         at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1883)
> >         at org.apache.spark.SparkContext.<init>(SparkContext.scala:330)
> >         at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:267)
> >         at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
> >         at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:389)
> >         at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:73)
> >         ... 11 more
> >
> > Zeppelin log file:
> > INFO [2015-04-14 01:29:17,551] ({pool-1-thread-5} Paragraph.java[jobRun]:194) - run paragraph 20150414-012109_673822021 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter@7cfec020
> > INFO [2015-04-14 01:29:17,551] ({pool-1-thread-5} Paragraph.java[jobRun]:211) - RUN : val rdd = sc.textFile("/Users/emorozov/tools/apache-cassandra-2.0.14/conf/cassandra.yaml")
> > rdd.count
> > INFO [2015-04-14 01:29:18,262] ({Thread-32} NotebookServer.java[broadcast]:251) - SEND >> NOTE
> > ERROR [2015-04-14 01:29:19,273] ({pool-1-thread-5} Job.java[run]:183) - Job failed
> > org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.TApplicationException: Internal error processing interpret
> >         at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:222)
> >         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
> >         at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:212)
> >         at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
> >         at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:293)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: org.apache.thrift.TApplicationException: Internal error processing interpret
> >         at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
> >         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:190)
> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:175)
> >         at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:205)
> >         ... 11 more
> >
> > zeppelin-env.sh is the following:
> > export MASTER="spark://emorozov.local:7077"
> > export ZEPPELIN_PORT=8089
> > export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/Users/emorozov/dev/analytics/analytics-jobs/target/analytics-jobs-5.2.0-SNAPSHOT-all.jar"
> > export SPARK_HOME="/Users/emorozov/tools/spark-1.2.1-bin-hadoop1/"
> > export ZEPPELIN_HOME="/Users/emorozov/dev/zeppelin"
> > export ZEPPELIN_MEM="-Xmx4g"
> >
> > I turned on debug in log4j.properties, hoping to see the properties
> > that are passed into SparkConf (SparkInterpreter.java, line 263:
> > logger.debug), but there are no properties in the log file.
> >
> > Although in the same notebook, I'm able to run something like %sh echo
> > blah. It gives blah as a result.
> >
> > --
> > Eugene Morozov
> > [email protected]
>
> Eugene Morozov
> [email protected]

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
