Hi!
Yesterday I ran into an issue running Zeppelin on Spark 1.2.1 and wrote to
this project's user list; the issue is described in detail in my previous
email. Today I debugged it a little, and I seem to have found the place
where I believe the issue lies.
That is line 38 of SparkDeploySchedulerBackend:
val maxCores = conf.getOption("spark.cores.max").map(_.toInt)
I wrote a simple program to prove that the NumberFormatException comes from
exactly this line:

import org.apache.spark.SparkConf

val conf: SparkConf = new SparkConf()
conf.set("spark.cores.max", "")
conf.getOption("spark.cores.max").map(_.toInt)

These lines throw a NumberFormatException.
So, this spark.cores.max value comes from two places:
1. the static initialiser of SparkInterpreter, which adds spark.cores.max
with an empty string value (it used to be null, but that has been changed);
2. the file conf/interpreter.json, where spark.cores.max is also an empty
string value.
The description of this field states that if it's blank, it should default to
the number of cores in the system, but for some reason that doesn't happen
for me. When I changed the value in conf/interpreter.json, Zeppelin started
working as it should.
I believe the static initialiser of SparkInterpreter should set the number of
cores instead of an empty string value, but I'm not sure.
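To illustrate what I mean, here is a minimal sketch (hypothetical, not the
actual Spark or Zeppelin code; maxCores here is just an illustrative helper)
of a defensive alternative on the parsing side: treat a blank value as unset
instead of feeding "" to toInt.

```scala
// Hypothetical sketch: ignore blank spark.cores.max values before parsing,
// so a "" coming from interpreter.json no longer throws.
def maxCores(raw: Option[String]): Option[Int] =
  raw.map(_.trim).filter(_.nonEmpty).map(_.toInt)

println(maxCores(Some("")))   // None -- a blank value no longer throws
println(maxCores(Some("4")))  // Some(4)
println(maxCores(None))       // None
```

With this, the choice between "blank means all cores" and "blank means unset"
is made explicitly instead of crashing in Integer.parseInt.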
- Is there a better fix?
- What's the best approach to run it under a debugger? Right now I'm using
remote debug, but the interpreter starts at a particular point in time and
it's hard to attach the debugger at the right moment; there should be a way
simpler than that. Maybe there are some resources for developers on how to
configure a local environment?
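For context, my current remote-debug setup is roughly the following (standard
JDWP agent flags; port 5005 is an arbitrary choice, and I'm assuming
ZEPPELIN_JAVA_OPTS reaches the interpreter JVM, as the spark.jars setting in
my zeppelin-env.sh below suggests):

```shell
# Hypothetical addition to zeppelin-env.sh: make the JVM suspend on start
# and wait for a debugger to attach on port 5005 (suspend=y), so the
# interpreter cannot race past the breakpoint before attaching.
export ZEPPELIN_JAVA_OPTS="$ZEPPELIN_JAVA_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
```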
On 14 Apr 2015, at 02:14, Eugene Morozov <[email protected]> wrote:
> Hi!
>
> I’m trying to run at least something using my Spark / Cassandra setup:
> prebuilt Spark 1.2.1-hadoop1, Cassandra 2.0.14.
> Spark by itself is working fine: I have several tests in my project and
> they’re working, and I’m able to use spark-shell to run my project - everything is fine.
>
> So, the simplest thing I’m trying now is to run Zeppelin and then create a
> paragraph:
> val rdd =
> sc.textFile("/Users/emorozov/tools/apache-cassandra-2.0.14/conf/cassandra.yaml")
> rdd.count
>
> Here is what I see in the log files:
>
> Zeppelin Interpreter log file:
> INFO [2015-04-14 01:29:21,097] ({pool-1-thread-5} Logging.scala[logInfo]:59)
> - Successfully started service 'SparkUI' on port 4045.
> INFO [2015-04-14 01:29:21,097] ({pool-1-thread-5} Logging.scala[logInfo]:59)
> - Started SparkUI at http://10.59.26.123:4045
> INFO [2015-04-14 01:29:21,255] ({pool-1-thread-5} Logging.scala[logInfo]:59)
> - Added JAR
> file:/Users/emorozov/dev/analytics/analytics-jobs/target/analytics-jobs-5.2.0-SNAPSHOT-all.jar
> at http://10.59.26.123:53658/jars/analytics-jobs-5.2.0-SNAPSHOT-all.jar with
> timestamp 1429000161255
> ERROR [2015-04-14 01:29:21,256] ({pool-1-thread-5}
> ProcessFunction.java[process]:41) - Internal error processing getProgress
> org.apache.zeppelin.interpreter.InterpreterException:
> java.lang.NumberFormatException: For input string: ""
> at
> org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:75)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:299)
> at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:938)
> at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:923)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NumberFormatException: For input string: ""
> at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:504)
> at java.lang.Integer.parseInt(Integer.java:527)
> at
> scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
> at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
> at
> org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend$$anonfun$2.apply(SparkDeploySchedulerBackend.scala:42)
> at
> org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend$$anonfun$2.apply(SparkDeploySchedulerBackend.scala:42)
> at scala.Option.map(Option.scala:145)
> at
> org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.<init>(SparkDeploySchedulerBackend.scala:42)
> at
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1883)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:330)
> at
> org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:267)
> at
> org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
> at
> org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:389)
> at
> org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:73)
> ... 11 more
>
> Zeppelin log file
> INFO [2015-04-14 01:29:17,551] ({pool-1-thread-5} Paragraph.java[jobRun]:194)
> - run paragraph 20150414-012109_673822021 using null
> org.apache.zeppelin.interpreter.LazyOpenInterpreter@7cfec020
> INFO [2015-04-14 01:29:17,551] ({pool-1-thread-5}
> Paragraph.java[jobRun]:211) - RUN : val rdd =
> sc.textFile("/Users/emorozov/tools/apache-cassandra-2.0.14/conf/cassandra.yaml")
> rdd.count
> INFO [2015-04-14 01:29:18,262] ({Thread-32}
> NotebookServer.java[broadcast]:251) - SEND >> NOTE
> ERROR [2015-04-14 01:29:19,273] ({pool-1-thread-5} Job.java[run]:183) - Job
> failed
> org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.thrift.TApplicationException: Internal error processing interpret
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:222)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:212)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
> at
> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:293)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: Internal error processing
> interpret
> at
> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
> at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:190)
> at
> org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:175)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:205)
> ... 11 more
>
> zeppelin-env.sh is the following:
> export MASTER="spark://emorozov.local:7077"
> export ZEPPELIN_PORT=8089
> export
> ZEPPELIN_JAVA_OPTS="-Dspark.jars=/Users/emorozov/dev/analytics/analytics-jobs/target/analytics-jobs-5.2.0-SNAPSHOT-all.jar"
> export SPARK_HOME="/Users/emorozov/tools/spark-1.2.1-bin-hadoop1/"
> export ZEPPELIN_HOME="/Users/emorozov/dev/zeppelin"
> export ZEPPELIN_MEM="-Xmx4g"
>
> I turned on debug in log4j.properties, hoping to see the properties that are
> provided to SparkConf (SparkInterpreter.java, line 263: logger.debug), but
> there are no properties in the log file.
>
>
> Although in the same notebook I’m able to run something like %sh echo blah,
> and it gives blah as a result.
>
> --
> Eugene Morozov
> [email protected]
Eugene Morozov
[email protected]