Hi Eugene,

Nice catch! This looks like a side effect of ZEPPELIN-28. spark.cores.max doesn't have a default value of its own; it falls back to spark.deploy.defaultCores, whose initial value is Int.MaxValue. I think it should be fixed. Could you file a JIRA ticket for this issue and open a pull request on GitHub?
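For reference, here is a minimal Java sketch (not Zeppelin's or Spark's actual code — the helper names are hypothetical) showing both the failure Eugene reproduced and one possible guard: treat a blank spark.cores.max as "not set" so the scheduler can fall back to spark.deploy.defaultCores.

```java
import java.util.Optional;

public class CoresMaxDemo {
    // Mirrors Spark's conf.getOption("spark.cores.max").map(_.toInt):
    // an empty string reaches Integer.parseInt and throws.
    static Optional<Integer> naiveParse(String value) {
        return Optional.ofNullable(value).map(Integer::parseInt);
    }

    // Hypothetical guard: treat null/blank as "not set" so callers
    // fall back to their default instead of crashing.
    static Optional<Integer> guardedParse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Integer.parseInt(value.trim()));
    }

    public static void main(String[] args) {
        try {
            naiveParse("");  // same failure as SparkDeploySchedulerBackend
        } catch (NumberFormatException e) {
            System.out.println("naive: " + e.getMessage()); // For input string: ""
        }
        System.out.println("guarded blank: " + guardedParse(""));  // Optional.empty
        System.out.println("guarded '4': " + guardedParse("4"));   // Optional[4]
    }
}
```

Whether the right fix is guarding the parse in Spark or not emitting the empty string from Zeppelin's SparkInterpreter is exactly the question for the JIRA ticket.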
Regards,
Jongyoul Lee

On Wed, Apr 15, 2015 at 8:39 AM, Eugene Morozov <[email protected]> wrote:

> Hi!
>
> Yesterday, I had an issue with running Zeppelin on Spark 1.2.1, but I
> wrote to the user list of this project. The issue is described in detail
> in my previous email. Today, I debugged it a little and I seem to have
> found the weird place where I believe the issue might be.
>
> That is line 38 of SparkDeploySchedulerBackend:
> val maxCores = conf.getOption("spark.cores.max").map(_.toInt)
>
> I wrote a simple program to prove that the NumberFormatException comes
> from exactly this line:
> val conf: SparkConf = new SparkConf()
> conf.set("spark.cores.max", "")
> conf.getOption("spark.cores.max").map(_.toInt)
>
> These three lines give a NumberFormatException.
> So, this spark.cores.max comes from two places:
> 1. The static initialiser of SparkInterpreter adds spark.cores.max with
> an empty string value. It used to be null, but has been changed.
> 2. The file conf/interpreter.json, where spark.cores.max is also an
> empty string value.
>
> The description of this field states that if it's blank, it should
> become the number of cores in the system, but that doesn't happen for me
> for some reason. When I changed it in conf/interpreter.json, Zeppelin
> started working as it should.
>
> I believe it should be the number of cores in the static initialiser of
> SparkInterpreter instead of an empty string value, but I'm not sure.
> - Is there a better fix?
> - What's the best approach to run it under debug? Right now I'm using
> remote debug, but the interpreter starts at a particular time and it's
> hard to start debugging at the right moment; there should be a simpler
> way than that. Maybe there are some resources for developers on how to
> configure a local environment?
>
> On 14 Apr 2015, at 02:14, Eugene Morozov <[email protected]> wrote:
>
> > Hi!
> >
> > I'm trying to run at least something using my Spark / Cassandra setup:
> > prebuilt Spark 1.2.1-hadoop1, Cassandra 2.0.14.
> > Spark by itself is working fine: I have several tests in my project and
> > they're working, and I'm able to use the Spark shell to run my project;
> > everything is fine.
> >
> > So, the simplest thing I'm trying now is to run Zeppelin and create a
> > paragraph:
> > val rdd = sc.textFile("/Users/emorozov/tools/apache-cassandra-2.0.14/conf/cassandra.yaml")
> > rdd.count
> >
> > Here is what I see in the log files.
> >
> > Zeppelin interpreter log file:
> > INFO [2015-04-14 01:29:21,097] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Successfully started service 'SparkUI' on port 4045.
> > INFO [2015-04-14 01:29:21,097] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Started SparkUI at http://10.59.26.123:4045
> > INFO [2015-04-14 01:29:21,255] ({pool-1-thread-5} Logging.scala[logInfo]:59) - Added JAR file:/Users/emorozov/dev/analytics/analytics-jobs/target/analytics-jobs-5.2.0-SNAPSHOT-all.jar at http://10.59.26.123:53658/jars/analytics-jobs-5.2.0-SNAPSHOT-all.jar with timestamp 1429000161255
> > ERROR [2015-04-14 01:29:21,256] ({pool-1-thread-5} ProcessFunction.java[process]:41) - Internal error processing getProgress
> > org.apache.zeppelin.interpreter.InterpreterException: java.lang.NumberFormatException: For input string: ""
> >         at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:75)
> >         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
> >         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
> >         at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:299)
> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:938)
> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:923)
> >         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> >         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> >         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.NumberFormatException: For input string: ""
> >         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> >         at java.lang.Integer.parseInt(Integer.java:504)
> >         at java.lang.Integer.parseInt(Integer.java:527)
> >         at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
> >         at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
> >         at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend$$anonfun$2.apply(SparkDeploySchedulerBackend.scala:42)
> >         at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend$$anonfun$2.apply(SparkDeploySchedulerBackend.scala:42)
> >         at scala.Option.map(Option.scala:145)
> >         at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.<init>(SparkDeploySchedulerBackend.scala:42)
> >         at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1883)
> >         at org.apache.spark.SparkContext.<init>(SparkContext.scala:330)
> >         at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:267)
> >         at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
> >         at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:389)
> >         at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:73)
> >         ... 11 more
> >
> > Zeppelin log file:
> > INFO [2015-04-14 01:29:17,551] ({pool-1-thread-5} Paragraph.java[jobRun]:194) - run paragraph 20150414-012109_673822021 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter@7cfec020
> > INFO [2015-04-14 01:29:17,551] ({pool-1-thread-5} Paragraph.java[jobRun]:211) - RUN : val rdd = sc.textFile("/Users/emorozov/tools/apache-cassandra-2.0.14/conf/cassandra.yaml")
> > rdd.count
> > INFO [2015-04-14 01:29:18,262] ({Thread-32} NotebookServer.java[broadcast]:251) - SEND >> NOTE
> > ERROR [2015-04-14 01:29:19,273] ({pool-1-thread-5} Job.java[run]:183) - Job failed
> > org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.TApplicationException: Internal error processing interpret
> >         at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:222)
> >         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
> >         at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:212)
> >         at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
> >         at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:293)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: org.apache.thrift.TApplicationException: Internal error processing interpret
> >         at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
> >         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:190)
> >         at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:175)
> >         at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:205)
> >         ... 11 more
> >
> > zeppelin-env.sh is the following:
> > export MASTER="spark://emorozov.local:7077"
> > export ZEPPELIN_PORT=8089
> > export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/Users/emorozov/dev/analytics/analytics-jobs/target/analytics-jobs-5.2.0-SNAPSHOT-all.jar"
> > export SPARK_HOME="/Users/emorozov/tools/spark-1.2.1-bin-hadoop1/"
> > export ZEPPELIN_HOME="/Users/emorozov/dev/zeppelin"
> > export ZEPPELIN_MEM="-Xmx4g"
> >
> > I turned on debug in log4j.properties, hoping to see the properties
> > that are passed into SparkConf (SparkInterpreter.java, line 263:
> > logger.debug), but there are no properties in the log file.
> >
> > Although in the same notebook, I'm able to run something like %sh echo
> > blah. It gives blah as a result.
> >
> > --
> > Eugene Morozov
> > [email protected]
>
> Eugene Morozov
> [email protected]

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
