[ 
https://issues.apache.org/jira/browse/SPARK-53890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anish Mahto updated SPARK-53890:
--------------------------------
    Description: 
Add tests to verify that read/readstream options are actually respected by the 
flow that executes the read/readstream DataFrame.

A trivial test example might be:
{code:python}
@materialized_view
def mv_from_csv():
    # The "|" delimiter is the read option the test should see applied.
    return spark.read.option("delimiter", "|").csv("/my/table.csv")
{code}
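I suspect that today the read/readstream options are not respected 
([1|https://github.com/apache/spark/blob/master/sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/FlowAnalysis.scala#L120],
 [2|https://github.com/apache/spark/blob/master/sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/FlowAnalysis.scala#L131]).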

If so, one solution might be to copy the options from the 
`UnresolvedRelation` into either the `DataFrameReader` that is constructed or the 
`streamingReadOptions`/`batchReadOptions` argument.
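
A minimal sketch of the suspected behavior and the proposed fix, in plain Python. It does not use Spark: `RelationStandIn`, `ReaderStandIn`, `read_dropping_options` and `read_copying_options` are hypothetical stand-ins invented for illustration, not Spark classes or the actual code in `FlowAnalysis.scala`. It only shows what a test could assert: a pipe-delimited file parses into two columns only if the options carried by the relation reach the reader.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelationStandIn:
    """Hypothetical stand-in for an unresolved relation that carries read options."""
    path: str
    options: dict = field(default_factory=dict)


class ReaderStandIn:
    """Hypothetical stand-in for a reader; this is not Spark's DataFrameReader."""

    def __init__(self, files):
        self._files = files      # maps a path to its text contents
        self._options = {}

    def options(self, **opts):
        self._options.update(opts)
        return self

    def csv(self, path):
        # Fall back to a comma when no delimiter option was supplied.
        delimiter = self._options.get("delimiter", ",")
        return list(csv.reader(io.StringIO(self._files[path]), delimiter=delimiter))


def read_dropping_options(relation, files):
    # Models the suspected behavior: the reader never sees the relation's options.
    return ReaderStandIn(files).csv(relation.path)


def read_copying_options(relation, files):
    # Models the proposed fix: copy the relation's options into the reader first.
    return ReaderStandIn(files).options(**relation.options).csv(relation.path)


files = {"/my/table.csv": "id|name\n1|a\n"}
relation = RelationStandIn("/my/table.csv", {"delimiter": "|"})

rows_dropped = read_dropping_options(relation, files)
rows_copied = read_copying_options(relation, files)
```

With the options dropped, each line comes back as a single unsplit column; with the options copied, each line splits into `id` and `name`. A real test would make the same kind of assertion against the DataFrame produced by the pipeline flow.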

 


> [SDP] Test (and fix) read/readstream options are respected for pipelines
> ------------------------------------------------------------------------
>
>                 Key: SPARK-53890
>                 URL: https://issues.apache.org/jira/browse/SPARK-53890
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Declarative Pipelines
>    Affects Versions: 4.1.0
>            Reporter: Anish Mahto
>            Priority: Major
>
> Add tests to verify that read/readstream options are actually respected by the 
> flow that executes the read/readstream DataFrame.
> A trivial test example might be:
> {code:python}
> @materialized_view
> def mv_from_csv():
>     return spark.read.option("delimiter", "|").csv("/my/table.csv")
> {code}
> I suspect that today the read/readstream options are not respected 
> ([1|https://github.com/apache/spark/blob/master/sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/FlowAnalysis.scala#L120],
>  [2|https://github.com/apache/spark/blob/master/sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/FlowAnalysis.scala#L131]).
> If so, one solution might be to copy the options from the 
> `UnresolvedRelation` into either the `DataFrameReader` that is constructed or 
> the `streamingReadOptions`/`batchReadOptions` argument.
>  


