Ruslan Dautkhanov created ZEPPELIN-3327:
-------------------------------------------
             Summary: NPE when Spark interpreter couldn't start
                 Key: ZEPPELIN-3327
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3327
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.8.0, 0.9.0
            Reporter: Ruslan Dautkhanov
         Attachments: image-2018-03-13-19-16-46-353.png, image-2018-03-13-19-19-59-364.png


When Spark fails to start on the backend, Zeppelin shows only an NPE:

!image-2018-03-13-19-16-46-353.png!

What it should print instead is the true root cause, i.e. the exception as reported by spark-submit.

To reproduce, add an invalid Spark interpreter setting, for example:

!image-2018-03-13-19-19-59-364.png!

and start the Spark interpreter; the NPE appears. This is confusing for users: the real error is obscured by the NPE. Zeppelin should pass the exception through transparently, exactly as produced by Spark, as in this example:

{noformat}
Caused by: java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m. Invalid suffix: "petabytes"
	at org.apache.spark.network.util.JavaUtils.byteStringAs(JavaUtils.java:291)
	at org.apache.spark.network.util.JavaUtils.byteStringAsBytes(JavaUtils.java:302)
	at org.apache.spark.util.Utils$.byteStringAsBytes(Utils.scala:1087)
	at org.apache.spark.SparkConf.getSizeAsBytes(SparkConf.scala:302)
	at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:223)
	at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:199)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:332)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:432)
{noformat}

Note that I had to dig deep into the logs to find the root cause, and not every user can do that.
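To illustrate the kind of invalid setting involved (the property value here is hypothetical; any size setting with an unparsable suffix triggers the same NumberFormatException shown above), one could set in the interpreter settings:

{noformat}
spark.driver.memory = 5 petabytes
{noformat}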
Full exception from the interpreter log:

{noformat}
ERROR [2018-03-13 19:15:26,476] ({pool-2-thread-2} PySparkInterpreter.java[open]:203) - Error
java.lang.NullPointerException
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
	at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
	at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
	at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
	at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
	at org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:665)
	at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:273)
	at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:201)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:618)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:186)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
ERROR [2018-03-13 19:15:26,476] ({pool-2-thread-2} Job.java[run]:188) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: java.lang.NullPointerException
	at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:204)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:618)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:186)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
	at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
	at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
	at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
	at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
	at org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:665)
	at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:273)
	at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:201)
	... 11 more
INFO [2018-03-13 19:15:26,487] ({pool-2-thread-2} SchedulerFactory.java[jobFinished]:115) - Job 20180313-115214_1579158632 finished by scheduler interpreter_860134591
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
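The trace points at a reflective invoke helper (Utils.invokeMethod), which suggests the reflection failure is swallowed and null is returned, surfacing later as an unrelated NPE. A minimal sketch of the direction a fix could take (class and method names here, InvokeDemo and FakeConf, are hypothetical illustrations, not Zeppelin's actual code): unwrap InvocationTargetException and rethrow the underlying cause so the caller sees the real Spark error instead of a null result.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class InvokeDemo {

    /**
     * Hypothetical variant of a reflective invoke helper. Instead of
     * catching the reflection exception and returning null (which later
     * shows up as an unrelated NPE), it unwraps InvocationTargetException
     * and rethrows the target method's real exception.
     */
    public static Object invokeMethod(Object target, String name) throws Exception {
        try {
            Method m = target.getClass().getMethod(name);
            return m.invoke(target);
        } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;  // surface the root cause to the caller
            }
            throw e;
        }
    }

    /** Stand-in for a Spark component that rejects a bad size setting. */
    public static class FakeConf {
        public Object parseSize() {
            throw new NumberFormatException("Invalid suffix: \"petabytes\"");
        }
    }

    public static void main(String[] args) {
        try {
            invokeMethod(new FakeConf(), "parseSize");
        } catch (Exception e) {
            // The caller now sees the real error, not a null return.
            System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
        }
    }
}
```

With this pattern, the NumberFormatException from the Spark side would propagate up to the interpreter's error reporting instead of being replaced by an NPE.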