[ 
https://issues.apache.org/jira/browse/SPARK-50451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902145#comment-17902145
 ] 

Jungtaek Lim edited comment on SPARK-50451 at 12/2/24 12:53 AM:
----------------------------------------------------------------

structured streaming is designed to restrict the source and sink, which is not 
satisfied if the query is going under RDD API. It's probably a bad error 
message for users, but the restriction by itself is by intention.

You can do most of things you could do with RDD without RDD in DataFrame API - 
mapPartitions without calling rdd should give iterator to you and you can do 
with iterator most of the same you do with RDD.


was (Author: kabhwan):
structured streaming is designed to restrict the source and sink, which is not 
satisfied if the query is going under RDD API. It's probably a bad error 
message for users, but it's by intention.

You can do most of things you could do with RDD without RDD in DataFrame API - 
mapPartitions without calling rdd should give iterator to you and you can do 
with iterator most of the same you do with RDD.

> mapPartitions fails when called in streaming mode from Python
> -------------------------------------------------------------
>
>                 Key: SPARK-50451
>                 URL: https://issues.apache.org/jira/browse/SPARK-50451
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>         Environment: OS: AL2023
> JVM: openjdk 22.0.2 2024-07-16
> OpenJDK Runtime Environment Corretto-22.0.2.9.1 (build 22.0.2+9-FR)
> OpenJDK 64-Bit Server VM Corretto-22.0.2.9.1 (build 22.0.2+9-FR, mixed mode, 
> sharing)
>            Reporter: Alberto Andreotti
>            Priority: Major
>
> Calling the mapPartitions API in Python like this,
>  
> mapped_rdd = dataset.rdd.mapPartitions(process_partitions)
>  
> in Streaming mode results in the following error,
>  
> AnalysisException: Queries with streaming sources must be executed with 
> writeStream.start();
> the same API works if I call it from Scala, or if I execute outside Streaming 
> Mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to