I guess the question is partly about the semantics of
DataFrameReader.schema. If it's supposed to mean "the loaded DataFrame will
definitely have exactly this schema", that doesn't quite match the behavior
of the customSchema option, which only overrides the types of the columns
it names. If it's only meant to be arbitrary schema input that each source
can interpret however it wants, it'd be fine.

The second semantic is IMO more useful, so I'm in favor here.
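For concreteness, a rough sketch of the two ways the type override could be expressed (the JDBC URL and table name are placeholders, and this assumes a reachable database, so it's illustrative rather than runnable as-is):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("jdbc-schema").getOrCreate()

// Today: DataFrameReader.schema(...) is rejected for the jdbc source
// (assertNoSpecifiedSchema), so type overrides go through customSchema:
val viaOption = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/testdb")
  .option("dbtable", "people")
  .option("customSchema", "id DECIMAL(38, 0), name STRING")
  .load()

// The proposal: accept the same overrides via .schema(), as other
// sources do (this currently throws for jdbc):
val viaSchema = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/testdb")
  .option("dbtable", "people")
  .schema(StructType(Seq(
    StructField("id", DecimalType(38, 0)),
    StructField("name", StringType))))
  .load()
```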

On Mon, Jul 16, 2018 at 3:43 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I think there is a sort of inconsistency in how DataFrameReader.jdbc deals
> with a user-defined schema: it makes sure that there's no user-specified
> schema [1][2], yet allows for setting one using the customSchema option
> [3]. Why is that so? Has this simply been overlooked, or something similar?
>
> I think assertNoSpecifiedSchema should be removed from
> DataFrameReader.jdbc and support for DataFrameReader.schema for jdbc should
> be added (with the customSchema option marked as deprecated to be removed
> in 2.4 or 3.0).
>
> Should I file an issue in Spark JIRA and do the changes? WDYT?
>
> [1] https://github.com/apache/spark/blob/v2.3.1/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala?utf8=%E2%9C%93#L249
> [2] https://github.com/apache/spark/blob/v2.3.1/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala?utf8=%E2%9C%93#L320
> [3] https://github.com/apache/spark/blob/v2.3.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala#L167
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
>
