[
https://issues.apache.org/jira/browse/SPARK-36996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428141#comment-17428141
]
Senthil Kumar commented on SPARK-36996:
---------------------------------------
Sample output after these changes:

SQL:
mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName
varchar(255), Age int);
mysql> desc Persons;
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| Id | int | NO | | NULL | |
| FirstName | varchar(255) | YES | | NULL | |
| LastName | varchar(255) | YES | | NULL | |
| Age | int | YES | | NULL | |
+-----------+--------------+------+-----+---------+-------+
Spark:
scala> val df = spark.read.format("jdbc").
         option("database", "Test_DB").
         option("user", "root").
         option("password", "").
         option("driver", "com.mysql.cj.jdbc.Driver").
         option("url", "jdbc:mysql://localhost:3306/Test_DB").
         option("dbtable", "Persons").
         load()
df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more fields]
scala> df.printSchema()
root
|-- Id: integer (nullable = false)
|-- FirstName: string (nullable = true)
|-- LastName: string (nullable = true)
|-- Age: integer (nullable = true)
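For context, the JDBC reader can derive each field's nullability from java.sql.ResultSetMetaData.isNullable. A minimal sketch of that mapping, assuming the helper name isColumnNullable (hypothetical, not an actual Spark method):

```scala
import java.sql.ResultSetMetaData

// Hypothetical helper illustrating the mapping: isNullable returns
// columnNoNulls (0), columnNullable (1), or columnNullableUnknown (2);
// only columnNoNulls should yield nullable = false in the Spark schema.
def isColumnNullable(flag: Int): Boolean =
  flag != ResultSetMetaData.columnNoNulls

// A NOT NULL column such as Id reports columnNoNulls:
assert(!isColumnNullable(ResultSetMetaData.columnNoNulls))
// Ordinary columns such as FirstName report columnNullable:
assert(isColumnNullable(ResultSetMetaData.columnNullable))
```

When the driver reports columnNullableUnknown, keeping nullable = true is the safe default, since a false negative there would let nulls slip past Spark's schema.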
And for TIMESTAMP columns:

SQL:
mysql> create table timestamp_test(id int(11), time_stamp timestamp not null default current_timestamp);
Spark:
scala> val df = spark.read.format("jdbc").
         option("database", "Test_DB").
         option("user", "root").
         option("password", "").
         option("driver", "com.mysql.cj.jdbc.Driver").
         option("url", "jdbc:mysql://localhost:3306/Test_DB").
         option("dbtable", "timestamp_test").
         load()
df: org.apache.spark.sql.DataFrame = [id: int, time_stamp: timestamp]
scala> df.printSchema()
root
|-- id: integer (nullable = true)
|-- time_stamp: timestamp (nullable = true)
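On Spark versions without this fix, one workaround is to rebuild the DataFrame with the nullability the source table actually declares. A self-contained sketch; the local DataFrame here is a stand-in for the JDBC read above (over JDBC, every column would arrive nullable before this fix), and the column list in notNullCols is taken from the Persons example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().master("local[1]").appName("nullable-fix").getOrCreate()
import spark.implicits._

// Stand-in for spark.read.format("jdbc")...load() from the example above.
val df = Seq((1, "Tom", "Hanks", 60)).toDF("Id", "FirstName", "LastName", "Age")

// Flip nullability for the columns the source table declares NOT NULL.
val notNullCols = Set("Id")
val corrected = StructType(df.schema.map(f =>
  if (notNullCols.contains(f.name)) f.copy(nullable = false) else f))

// createDataFrame with an explicit schema preserves its nullability flags.
val fixed = spark.createDataFrame(df.rdd, corrected)
fixed.printSchema()
```

Note this only relabels the schema; it does not validate the data, so it is safe only when the source column genuinely forbids nulls.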
> fixing "SQL column nullable setting not retained as part of spark read" issue
> -----------------------------------------------------------------------------
>
> Key: SPARK-36996
> URL: https://issues.apache.org/jira/browse/SPARK-36996
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0, 3.1.0, 3.1.1, 3.1.2
> Reporter: Senthil Kumar
> Priority: Major
>
> Columns declared NOT NULL in SQL do not retain their 'nullable' setting when
> the table is read through Spark's jdbc format.
>
> SQL :
> ------------
>
> mysql> CREATE TABLE Persons(Id int NOT NULL, FirstName varchar(255), LastName
> varchar(255), Age int);
>
> mysql> desc Persons;
> +-----------+--------------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +-----------+--------------+------+-----+---------+-------+
> | Id | int | NO | | NULL | |
> | FirstName | varchar(255) | YES | | NULL | |
> | LastName | varchar(255) | YES | | NULL | |
> | Age | int | YES | | NULL | |
> +-----------+--------------+------+-----+---------+-------+
>
> But in Spark we get all the columns as "Nullable":
> =============
> scala> val df = spark.read.format("jdbc").
>          option("database", "Test_DB").
>          option("user", "root").
>          option("password", "").
>          option("driver", "com.mysql.cj.jdbc.Driver").
>          option("url", "jdbc:mysql://localhost:3306/Test_DB").
>          option("dbtable", "Persons").
>          load()
> df: org.apache.spark.sql.DataFrame = [Id: int, FirstName: string ... 2 more fields]
> scala> df.printSchema()
> root
> |-- Id: integer (nullable = true)
> |-- FirstName: string (nullable = true)
> |-- LastName: string (nullable = true)
> |-- Age: integer (nullable = true)
> =============
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]