Jhon Cardenas created ZEPPELIN-3749: ---------------------------------------
             Summary: New Spark interpreter has to be restarted two times in order to work fine for different users
                 Key: ZEPPELIN-3749
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3749
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.8.0, 0.8.1
         Environment: *Spark interpreter property to reproduce:* zeppelin.spark.useNew -> true
*Spark interpreter instantiation mode:* per user - scoped
*Zeppelin version:* branch-0.8 (until July 23)
            Reporter: Jhon Cardenas
         Attachments: first_error.txt, second_error.txt


The new Spark interpreter has to be restarted two times before it works for different users.

To reproduce this, configure Zeppelin to use the new interpreter:

zeppelin.spark.useNew -> true

and set the interpreter instantiation mode to: per user - scoped

*Steps to reproduce:*

1. User A logs in to Zeppelin and runs a Spark paragraph. It works fine.

2. User B logs in to Zeppelin and runs a Spark paragraph, for example:
{code:java}
%spark
println(sc.version)
println(scala.util.Properties.versionString)
{code}

3. This error appears (see the full log trace in [^first_error.txt]):
{{java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext. This stopped SparkContext was created at: .....}}
4. User B restarts the Spark interpreter from the notebook page and then runs a paragraph that submits a job, for example:
{code:java}
import sqlContext.implicits._
import org.apache.commons.io.IOUtils
import java.net.URL
import java.nio.charset.Charset

// Zeppelin creates and injects sc (SparkContext) and sqlContext (HiveContext or SqlContext),
// so you don't need to create them manually.

// load bank data
val bankText = sc.parallelize(
  IOUtils.toString(
    new URL("https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv"),
    Charset.forName("utf8")).split("\n"))

sc.parallelize(1 to 1000000).foreach(n => print((java.lang.Math.random() * 1000000) + n))

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
            s(1).replaceAll("\"", ""),
            s(2).replaceAll("\"", ""),
            s(3).replaceAll("\"", ""),
            s(5).replaceAll("\"", "").toInt)
).toDF()
bank.registerTempTable("bank")
{code}

5. This error appears (see the full log trace in [^second_error.txt]):
{{org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 0.0 failed 4 times, most recent failure: Lost task 6.3 in stage 0.0 (TID 36, 100.96.85.172, executor 2): java.lang.ClassNotFoundException: $anonfun$1 at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67) .....}}

6. User B restarts the Spark interpreter from the notebook page once more, and now it works.

*Actual Behavior:* User B has to restart the Spark interpreter two times before it works.

*Expected Behavior:* Spark should work for other users without any restart.
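For anyone triaging this: the first failure mode can be confirmed from a paragraph before submitting any job, since SparkContext exposes an isStopped flag. A minimal diagnostic sketch (this is not part of the reproduction above; it assumes only the sc that Zeppelin injects):
{code:java}
%spark
// sc is the SparkContext Zeppelin injects into the interpreter session.
// If isStopped returns true, any subsequent RDD action will raise the
// "Cannot call methods on a stopped SparkContext" IllegalStateException
// attached in first_error.txt.
if (sc.isStopped) {
  println("SparkContext is stopped - the interpreter needs a restart")
} else {
  println(s"SparkContext OK, Spark ${sc.version}")
}
{code}
Running this as user B between steps 2 and 3 shows whether user B's scoped session received an already-stopped context.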
-- This message was sent by Atlassian JIRA (v7.6.3#76005)