Behar Veliqi created ZEPPELIN-3003:
--------------------------------------

             Summary: NullPointerException on spark.read.json("hdfs://....") in Spark Standalone Cluster Mode
                 Key: ZEPPELIN-3003
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3003
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.7.3, 0.7.1
         Environment: - Spark 2.1.1 with Standalone Cluster Manager
- Zeppelin: tested with 0.7.1 as well as 0.7.2
            Reporter: Behar Veliqi


When running Zeppelin against a Spark cluster with the Standalone Cluster Manager and executing:


{code:java}
val df = spark.read.option("inferSchema", "false").json("hdfs://ip:port/path/file.txt")
{code}

I get the following exception:


{code:java}
 WARN [2017-10-19 07:51:26,959] ({pool-2-thread-8} NotebookServer.java[afterStatusChange]:2064) - Job 20171016-144104_559309535 is finished, status: ERROR, exception: null, result: %text java.lang.NullPointerException
        at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
        at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
        at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:398)
        at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:387)
        at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
        at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:843)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
        at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}



However, the following (i.e. .text() instead of .json()) works perfectly fine:

{code:java}
val df = spark.read.option("inferSchema", "false").text("hdfs://ip:port/path/file.txt")
{code}


When I change the Spark master setting from

{code:java}
spark://host1:7077,host2:7077,host3:7077
{code}

to 

{code:java}
local[*]
{code}

then both .json() and .text() work fine.
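
(As a quick sanity check that the interpreter really is bound to the expected master, the active master URL can be printed from a notebook paragraph; a minimal sketch, assuming Zeppelin's predefined sc, with illustrative output values in the comment:)

{code:java}
// sanity check: print which master the interpreter's SparkContext is bound to
println(sc.master)  // e.g. "spark://host1:7077" in cluster mode, "local[*]" locally
{code}
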
So the JSON files themselves are valid, since they are parsed properly with a local Spark instance; but as soon as I switch to the standalone cluster, only .text() keeps working while .json() throws the NullPointerException.
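
For reference, the same call can also be tried outside Zeppelin via a plain spark-shell against the same standalone master, to narrow down whether the NPE is Zeppelin-specific (the hosts and HDFS path below are the placeholders used above):

{code:java}
// launched beforehand with:
//   spark-shell --master spark://host1:7077,host2:7077,host3:7077
val df = spark.read.option("inferSchema", "false").json("hdfs://ip:port/path/file.txt")
df.printSchema()
{code}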



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
