Piotr Nestorow created ZEPPELIN-2156:
----------------------------------------
Summary: Paragraph with PySpark streaming - running job cannot be
canceled
Key: ZEPPELIN-2156
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2156
Project: Zeppelin
Issue Type: Bug
Components: pySpark
Affects Versions: 0.7.0
Environment: Linux Ubuntu
Reporter: Piotr Nestorow
In a 'spark.pyspark' paragraph a StreamingContext consuming a Kafka stream is
created.
The paragraph is started, and while the job is running the streaming job
produces correct output from the code.
The problem is the job cannot be stopped in the Zeppelin web interface.
Installed Kafka version: kafka_2.11-0.8.2.2
Spark Kafka jar: spark-streaming-kafka-0-8_2.11-2.1.0.jar
Zeppelin: zeppelin-0.7.0-bin-all
Tried:
1. Paragraph Cancel ( || button ) has no effect.
2. Zeppelin Job view Stop All has no effect.
3. Another paragraph with
%spark.pyspark
ssc.stop(stopSparkContext=False, stopGracefully=True)
is started but stays in 'Pending'.
4. Restarting the 'spark' interpreter stops the job.
The example logic:
%spark.pyspark
import sys
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

zkQuorum, topic, interval = ('localhost:2181', 'airport', 60)

ssc = StreamingContext(sc, interval)
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer",
                              {topic: 1})
parsed = kvs.map(lambda (k, v): json.loads(v))
summed = parsed.\
    filter(lambda event: 'kind' in event and event['kind'] == 'gate').\
    map(lambda event: ('count_all', int(event['value']['passengers']))).\
    reduceByKey(lambda x, y: x + y).\
    map(lambda x: {'sum': x[0], "passengers": x[1]})
summed.pprint()
ssc.start()
ssc.awaitTermination()
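A possible workaround (an assumption, not verified against this Zeppelin
version): the hang may come from ssc.awaitTermination() blocking the
interpreter thread indefinitely, so Cancel and other paragraphs never get a
chance to run. PySpark's StreamingContext also offers
awaitTerminationOrTimeout(timeout), which returns after at most `timeout`
seconds, allowing the paragraph to poll a shared stop flag and shut down
cooperatively. The sketch below shows only that polling pattern in plain
Python; FakeStreamingContext is a hypothetical stand-in for the real
StreamingContext so the pattern can be demonstrated without a Spark cluster.

```python
import threading
import time

# Shared flag that another "paragraph" (here: the main thread) can set
# to request shutdown of the streaming job.
stop_requested = threading.Event()


class FakeStreamingContext:
    """Illustrative stand-in for pyspark.streaming.StreamingContext."""

    def __init__(self):
        self.stopped = False

    def awaitTerminationOrTimeout(self, timeout):
        # The real API blocks for up to `timeout` seconds and returns
        # True if the context has terminated, False on timeout.
        time.sleep(timeout)
        return self.stopped

    def stop(self, stopSparkContext=False, stopGracefully=True):
        self.stopped = True


ssc = FakeStreamingContext()


def run_streaming():
    # Instead of ssc.awaitTermination(), poll in short intervals so the
    # loop can observe the stop flag and stop the context itself.
    while not ssc.awaitTerminationOrTimeout(0.1):
        if stop_requested.is_set():
            ssc.stop(stopSparkContext=False, stopGracefully=True)


worker = threading.Thread(target=run_streaming)
worker.start()
stop_requested.set()   # what a second paragraph would do to cancel the job
worker.join(timeout=5)
```

With the real StreamingContext the same loop would replace the final
ssc.awaitTermination() line in the paragraph above; whether Zeppelin's
Cancel button can then interrupt the polling loop is untested here.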
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)