[
https://issues.apache.org/jira/browse/SPARK-50451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902145#comment-17902145
]
Jungtaek Lim edited comment on SPARK-50451 at 12/2/24 12:53 AM:
----------------------------------------------------------------
structured streaming is designed to restrict the source and sink, which is not
satisfied if the query is going under RDD API. It's probably a bad error
message for users, but the restriction by itself is by intention.
You can do most of things you could do with RDD without RDD in DataFrame API -
mapPartitions without calling rdd should give iterator to you and you can do
with iterator most of the same you do with RDD.
was (Author: kabhwan):
structured streaming is designed to restrict the source and sink, which is not
satisfied if the query is going under RDD API. It's probably a bad error
message for users, but it's by intention.
You can do most of things you could do with RDD without RDD in DataFrame API -
mapPartitions without calling rdd should give iterator to you and you can do
with iterator most of the same you do with RDD.
> mapPartitions fails when called in streaming mode from Python
> -------------------------------------------------------------
>
> Key: SPARK-50451
> URL: https://issues.apache.org/jira/browse/SPARK-50451
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Environment: OS: AL2023
> JVM: openjdk 22.0.2 2024-07-16
> OpenJDK Runtime Environment Corretto-22.0.2.9.1 (build 22.0.2+9-FR)
> OpenJDK 64-Bit Server VM Corretto-22.0.2.9.1 (build 22.0.2+9-FR, mixed mode,
> sharing)
> Reporter: Alberto Andreotti
> Priority: Major
>
> Calling the mapPartitions API in Python like this,
>
> mapped_rdd = dataset.rdd.mapPartitions(process_partitions)
>
> in Streaming mode results in the following error,
>
> AnalysisException: Queries with streaming sources must be executed with
> writeStream.start();
> the same API works if I call it from Scala, or if I execute outside Streaming
> Mode.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]