Thiago Siqueira created ZEPPELIN-1573:
-----------------------------------------

             Summary: ClassNotFoundException: 
org.apache.zeppelin.spark.ZeppelinContext when using Zeppelin's input value 
inside a Spark DataFrame filter method running on a Spark standalone cluster
                 Key: ZEPPELIN-1573
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1573
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.6.2
         Environment: Red Hat Enterprise Linux Server release 7.2 (Maipo), 
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
            Reporter: Thiago Siqueira
            Priority: Minor


A ClassNotFoundException for org.apache.zeppelin.spark.ZeppelinContext is 
thrown when Zeppelin's input value is used inside a Spark DataFrame filter 
method running on a Spark standalone cluster:

val city = z.select("City", cities).toString
oDF.select("city").filter(r => city.equals(r.getAs[String]("city"))).count()

I even tried copying the input value into another val with

new String(city.getBytes)

but I still get the same error.
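
If the root cause is that the filter lambda captures the enclosing interpreter 
paragraph object (which holds the ZeppelinContext reference z), copying the 
String would not help: the copy still lives in that same wrapper. One 
closure-free alternative, sketched below under that assumption, is to express 
the predicate with Spark's Column API, so no Scala lambda is serialized to the 
executors at all:

import org.apache.spark.sql.functions.col

// `city` is the value obtained from z.select above. The comparison is built
// as a Catalyst expression, so Spark ships an expression tree to the
// executors instead of a JVM closure, and ZeppelinContext never leaves the
// driver.
oDF.select("city").filter(col("city") === city).count()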

The same code works seamlessly if, instead of getting the value from z.select, 
I declare the city as a String literal:

val city: String = "NY"
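
That difference suggests the problem is what the closure captures, not the 
value itself. Another closure-free route, sketched here on the assumption that 
a Spark 2.x SparkSession is available as spark (the temp-view name is 
illustrative, not from the original report), is to run the count as a SQL 
statement built from the selected value:

// "cities_view" is a hypothetical name chosen for this sketch.
oDF.createOrReplaceTempView("cities_view")
spark.sql(s"SELECT count(*) FROM cities_view WHERE city = '$city'").show()

With the closure-based version, the job fails as follows: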
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 49.0 failed 4 times, most recent failure: Lost task 0.3 in stage 49.0 (TID 277, 10.6.60.217): java.lang.NoClassDefFoundError: Lorg/apache/zeppelin/spark/ZeppelinContext;

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 88.0 failed 4 times, most recent failure: Lost task 0.3 in stage 88.0 (TID 5675, 10.6.60.219): ExecutorLostFailure (executor 27 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1884)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1897)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1911)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:893)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:892)
  at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:290)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2183)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2182)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2189)
  at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2217)
  at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2216)
  at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2545)
  at org.apache.spark.sql.Dataset.count(Dataset.scala:2216)
  ... 47 elided



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
