Vincent Debergue created SPARK-8843:
---------------------------------------
Summary: DStream transform function receives null instead of RDD
Key: SPARK-8843
URL: https://issues.apache.org/jira/browse/SPARK-8843
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.4.0
Reporter: Vincent Debergue
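When {{transform}} is applied to a {{DStream}} whose parent produces no RDD for
a batch (its {{compute}} returns {{None}}), the transform function is called
with {{null}} instead of an RDD, which leads to a {{NullPointerException}}.
The issue can be reproduced with the following code in the spark-shell: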
{code}
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.InputDStream

// Input stream that never produces an RDD: compute always returns None.
class EmptyDStream[T: ClassTag](ssc: StreamingContext) extends InputDStream[T](ssc) {
  override def compute(t: Time): Option[RDD[T]] = None
  override def start(): Unit = {}
  override def stop(): Unit = {}
}

val ssc = new StreamingContext(sc, Seconds(2))
val in = new EmptyDStream[Int](ssc)
val out = in.transform { rdd =>
  rdd.map(_ + 1) // rdd is in fact null here
}
out.print()
ssc.start() // NullPointerException
{code}
The bug most likely comes from the use of {{orNull}} on the Scala {{Option}} in
{{TransformedDStream}}:
https://github.com/apache/spark/blob/branch-1.4/streaming/src/main/scala/org/apache/spark/streaming/dstream/TransformedDStream.scala#L40
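To illustrate the suspected mechanism without Spark, here is a minimal sketch in plain Scala. It is not the Spark code and not a proposed patch: {{OrNullSketch}}, {{compute}}, {{transformUnsafe}} and {{transformSafe}} are hypothetical names, and {{Seq[Int]}} stands in for {{RDD[Int]}} so that the snippet runs on its own. It only shows that {{orNull}} turns {{None}} into {{null}} before the user function is called, whereas mapping over the {{Option}} skips the call.

```scala
object OrNullSketch {
  // Stand-in for a parent stream's per-batch result (assumption: None means
  // "no RDD for this batch"). Seq[Int] replaces RDD[Int] to avoid needing Spark.
  def compute(hasData: Boolean): Option[Seq[Int]] =
    if (hasData) Some(Seq(1, 2, 3)) else None

  // The user-supplied transform function from the reproduction above.
  val userFunc: Seq[Int] => Seq[Int] = _.map(_ + 1)

  // Unsafe variant: orNull converts None to null, so userFunc dereferences null.
  def transformUnsafe(hasData: Boolean): Option[Seq[Int]] =
    Some(userFunc(compute(hasData).orNull))

  // Safer variant: userFunc is only invoked when a value is present.
  def transformSafe(hasData: Boolean): Option[Seq[Int]] =
    compute(hasData).map(userFunc)
}
```

With an empty batch, {{OrNullSketch.transformUnsafe(false)}} throws a {{NullPointerException}}, while {{OrNullSketch.transformSafe(false)}} returns {{None}}. Whether skipping the user function is the right behavior for {{transform}} is a design question this sketch does not settle.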