fxoSa created TOREE-457:
---------------------------
Summary: Spark context seems corrupted after loading Kafka libraries
Key: TOREE-457
URL: https://issues.apache.org/jira/browse/TOREE-457
Project: TOREE
Issue Type: Bug
Components: Kernel
Reporter: fxoSa
Priority: Minor
I am trying to set up a Jupyter notebook (Apache Toree, Scala) to access Kafka
logs from Spark Streaming.
First, I add the dependencies using %AddDeps:
{code:java}
%AddDeps org.apache.spark spark-streaming-kafka-0-10_2.11 2.2.0
Marking org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 for download
Preparing to fetch from:
-> file:/tmp/toree_add_deps8235567186565695423/
-> https://repo1.maven.org/maven2
-> New file at
/tmp/toree_add_deps8235567186565695423/https/repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10_2.11/2.2.0/spark-streaming-kafka-0-10_2.11-2.2.0.jar
{code}
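As an aside, by default %AddDeps fetches only the named artifact, and the fetch log above shows a single jar being downloaded; the kafka-0-10 connector also has transitive dependencies (such as the Kafka client libraries). Toree's %AddDeps magic accepts a --transitive flag to pull those in as well, although that does not by itself explain the error reported further down:

```
%AddDeps org.apache.spark spark-streaming-kafka-0-10_2.11 2.2.0 --transitive
```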
After that I am able to successfully import some of the necessary libraries:
{code:java}
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka010._
{code}
However, the code fails when I try to create a streaming context:
{code:java}
val ssc = new StreamingContext(sc, Seconds(2))
Name: Compile Error
Message: <console>:38: error: overloaded method constructor StreamingContext
with alternatives:
(path: String,sparkContext:
org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext)org.apache.spark.streaming.StreamingContext
<and>
(path: String,hadoopConf:
org.apache.hadoop.conf.Configuration)org.apache.spark.streaming.StreamingContext
<and>
(conf: org.apache.spark.SparkConf,batchDuration:
org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext
<and>
(sparkContext:
org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext,batchDuration:
org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext
cannot be applied to
(org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext,
org.apache.spark.streaming.Duration)
val ssc = new StreamingContext(sc, Seconds(2))
^
StackTrace:
{code}
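The repeated `org.apache.spark.org.apache.spark...SparkContext` prefix in the error suggests the kernel's pre-created `sc` and the freshly downloaded streaming classes are being resolved through different classloaders, so the compiler treats them as distinct `SparkContext` types. A workaround that is sometimes suggested (an assumption on my part, not verified here) is to put the Kafka connector on the kernel's classpath at launch time instead of adding it at runtime with %AddDeps, for example via Toree's --spark_opts at install time:

```
# Untested suggestion: supply the Kafka connector when installing the
# Toree kernel, so all Spark classes come from a single classloader.
jupyter toree install \
  --spark_opts='--packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0'
```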
I have tried this in the Jupyter Docker image
https://github.com/jupyter/docker-stacks/tree/master/all-spark-notebook
and in a Spark cluster set up on Google Cloud Platform, with the same results.
Thanks
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)