Anish Mahto created SPARK-53890:
-----------------------------------
Summary: [SDP] Test (and fix) read/readstream options are
respected for pipelines
Key: SPARK-53890
URL: https://issues.apache.org/jira/browse/SPARK-53890
Project: Spark
Issue Type: Sub-task
Components: Declarative Pipelines
Affects Versions: 4.1.0
Reporter: Anish Mahto
Add tests to verify read/readstream options are actually respected by the flow
that executes the read/readstream dataframe.
A trivial test example might be:
```
@materialized_view
def mv_from_csv():
    return spark.read.option("delimiter", "|").csv("/my/table.csv")
```
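To make the assertion such a test needs more concrete, here is a minimal sketch. It uses Python's standard `csv` module as a stand-in for the Spark reader, purely so the snippet runs without a Spark session; `read_rows` and the sample data are hypothetical and not part of any Spark API. The point is only that a dropped `delimiter` option is easy to detect, since every row collapses into a single column.

```python
import csv
import io

# Sample pipe-delimited content, standing in for /my/table.csv (hypothetical data).
PIPE_DELIMITED = "id|name\n1|alice\n2|bob\n"


def read_rows(text, options=None):
    """Parse CSV text, honoring a "delimiter" option when one is supplied.

    Hypothetical stand-in for a reader; this is not Spark's API.
    """
    options = options or {}
    delimiter = options.get("delimiter", ",")
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Option respected: each row splits into two columns.
respected = read_rows(PIPE_DELIMITED, {"delimiter": "|"})

# Option dropped (the suspected behavior): each row stays a single column.
dropped = read_rows(PIPE_DELIMITED)
```

A real test would presumably run the pipeline and make the equivalent check on the materialized view's schema or contents.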
I suspect that today, the read/readstream options will not be respected
([1|https://github.com/apache/spark/blob/master/sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/FlowAnalysis.scala#L120],
[2|https://github.com/apache/spark/blob/master/sql/pipelines/src/main/scala/org/apache/spark/sql/pipelines/graph/FlowAnalysis.scala#L131]).
If true, a solution might be to copy the options from the
`UnresolvedRelation` into either the DataFrameReader that is constructed or the
`streamingReadOptions`/`batchReadOptions` argument.
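
The option-copying idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RelationStub` and `effective_read_options` are hypothetical names rather than Spark classes or functions, and the precedence rule (options set on the relation win over flow-level read options) is an assumption that the actual fix would need to decide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class RelationStub:
    """Hypothetical stand-in for an unresolved relation that carries reader options."""

    identifier: str
    options: Dict[str, str] = field(default_factory=dict)


def effective_read_options(
    relation: RelationStub,
    flow_read_options: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge flow-level read options with the options attached to the relation.

    Assumption: on a key conflict, the relation's own options take precedence.
    """
    merged = dict(flow_read_options or {})
    merged.update(relation.options)
    return merged


relation = RelationStub("/my/table.csv", {"delimiter": "|"})
opts = effective_read_options(relation, {"header": "true", "delimiter": ","})
# opts now holds both keys, with "delimiter" taken from the relation.
```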