Please see the reason in this thread (https://github.com/apache/spark/pull/14340). It would be better to use Structured Streaming instead.
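
For what it's worth, here is a rough sketch of how the Kafka source in Structured Streaming could be set up from PySpark with SSL. It assumes the spark-sql-kafka-0-10 package is available to the job. The helper names are hypothetical (they are not part of Spark), and the broker address, topic, and truststore values are placeholders you would replace with your own.

```python
def kafka_ssl_options(bootstrap_servers, topic, truststore_path, truststore_password):
    """Build the option map for Spark's Kafka source with SSL enabled.

    Options prefixed with "kafka." are passed through to the underlying
    Kafka consumer; "subscribe" is handled by the Spark source itself.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "kafka.security.protocol": "SSL",
        "kafka.ssl.truststore.location": truststore_path,
        "kafka.ssl.truststore.password": truststore_password,
    }


def read_kafka_stream(spark, options):
    """Return a streaming DataFrame for the given Kafka options.

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    """
    return spark.readStream.format("kafka").options(**options).load()


# Placeholder values, for illustration only.
options = kafka_ssl_options(
    "broker1:9093", "events", "/path/to/truststore.jks", "changeit"
)
```

With a running session you would then call `read_kafka_stream(spark, options)` and work with the resulting DataFrame's `key` and `value` columns.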

> So I would like to -1 this patch. I think it's been a mistake to support dstream in Python -- yes it satisfies a checkbox and Spark could claim there's support for streaming in Python. However, the tooling and maturity for working with streaming data (both in Spark and the more broad ecosystem) is simply not there. It is a big baggage to maintain, and creates the wrong impression that production streaming jobs can be written in Python.

On Tue, Jul 4, 2017 at 10:53 PM, Daniel van der Ende <daniel.vandere...@gmail.com> wrote:

> Hi,
>
> I'm working on integrating some pyspark code with Kafka. We'd like to use SSL/TLS, and so want to use Kafka 0.10. Because structured streaming is still marked alpha, we'd like to use Spark streaming. On this page, however, it indicates that the Kafka 0.10 integration in Spark does not support Python (https://spark.apache.org/docs/latest/streaming-kafka-integration.html). I've been trying to figure out why, but have not been able to find anything. Is there any particular reason for this?
>
> Thanks,
>
> Daniel