I remembered an actual case from a developer who implements a custom data source.
https://lists.apache.org/thread.html/c1a210510b48bb1fea89828c8e2f5db8c27eba635e0079a97b0c7faf@%3Cdev.spark.apache.org%3E

Quoting here:

We started implementing DSv2 in the 2.4 branch, but quickly discovered that the DSv2 in 3.0 was a complete breaking change (to the point where it could have been named DSv3 and it wouldn’t have come as a surprise). Since the DSv2 in 3.0 has a compatibility layer for DSv1 datasources, we decided to fall back into DSv1 in order to ease the future transition to Spark 3.

Given that DSv2 for Spark 2.x and DSv2 for Spark 3.x have diverged a lot, a realistic way to deal with the DSv2 breaking change is to keep DSv1 as a temporary solution, even once DSv2 for 3.x is available. Developers need some time to make the transition.

I will file an issue to support streaming data sources on DSv1 and submit a patch unless someone objects.

On Wed, Oct 2, 2019 at 4:08 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Jungtaek,
>
> Thanks a lot for your very prompt response!
>
> > Looks like it's missing, or intended to force custom streaming source
> implemented as DSv2.
>
> That's exactly my understanding = no more DSv1 data sources. That however
> is not consistent with the official message, is it? Spark 2.4.4 does not
> actually say "we're abandoning DSv1", and people could not really want to
> jump on DSv2 since it's not recommended (unless I missed that).
>
> I love surprises (as that's where people pay more for consulting :)), but
> not necessarily before public talks (with one at SparkAISummit in two
> weeks!) Gonna be challenging! Hope I won't spread a wrong word.
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> The Internals of Spark SQL https://bit.ly/spark-sql-internals
> The Internals of Spark Structured Streaming
> https://bit.ly/spark-structured-streaming
> The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
> Follow me at https://twitter.com/jaceklaskowski
>
> On Wed, Oct 2, 2019 at 6:16 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Looks like it's missing, or intended to force custom streaming source
>> implemented as DSv2.
>>
>> I'm not sure Spark community wants to expand DSv1 API: I could propose
>> the change if we get some supports here.
>>
>> To Spark community: given we bring major changes on DSv2, someone would
>> want to rely on DSv1 while transition from old DSv2 to new DSv2 happens and
>> new DSv2 gets stabilized. Would we like to provide necessary changes on
>> DSv1?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Wed, Oct 2, 2019 at 4:27 AM Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> I think I've got stuck and without your help I won't move any further.
>>> Please help.
>>>
>>> I'm with Spark 2.4.4 and am developing a streaming Source (DSv1,
>>> MicroBatch) and in getBatch phase when requested for a DataFrame, there is
>>> this assert [1] I can't seem to go past with any DataFrame I managed to
>>> create as it's not streaming.
>>>
>>> assert(batch.isStreaming,
>>> s"DataFrame returned by getBatch from $source did not have
>>> isStreaming=true\n" +
>>> s"${batch.queryExecution.logical}")
>>>
>>> [1]
>>> https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L439-L441
>>>
>>> All I could find is private[sql],
>>> e.g.
>>> SQLContext.internalCreateDataFrame(..., isStreaming = true) [2] or [3]
>>>
>>> [2]
>>> https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L422-L428
>>> [3]
>>> https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L62-L81
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://about.me/JacekLaskowski
>>> The Internals of Spark SQL https://bit.ly/spark-sql-internals
>>> The Internals of Spark Structured Streaming
>>> https://bit.ly/spark-structured-streaming
>>> The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
>>> Follow me at https://twitter.com/jaceklaskowski