This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 931eb938c70d [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver
931eb938c70d is described below
commit 931eb938c70d6c6707e93f164255ac3c7dba63b0
Author: Kent Yao <[email protected]>
AuthorDate: Wed Jan 17 10:58:14 2024 +0800
[SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver
Hi, thanks for checking the PR. This is a small bug fix that makes Scala Spark work with ClickHouse's array type. Let me know if this could cause problems with other database types.
(Please help trigger CI if possible. I was unable to get the build pipeline to run; any help is appreciated.)
### Why are the changes needed?
This PR fixes the issue described at:
https://github.com/ClickHouse/clickhouse-java/issues/1505
When using Spark to write an array of strings to ClickHouse, the ClickHouse JDBC driver throws a `java.lang.IllegalArgumentException: Unknown data type: string` exception.
The exception occurs because Spark's JDBC utilities pass an invalid type name, `string` (it should be `String`). The type name retrieved from the ClickHouse JDBC driver is correct, but Spark's JDBC utilities convert it to lower case before use:
https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639
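The old and new type-name computations can be sketched in standalone Java as follows (the `before`/`after` helpers are simplified stand-ins for the real `JdbcUtils` code, which calls `getJdbcType(et, dialect).databaseTypeDefinition`):

```java
import java.util.Locale;

public class ArrayTypeNameDemo {
    // Old behavior: lower-case the driver-reported type name, then strip
    // any length parameter such as "(255)".
    static String before(String databaseTypeDefinition) {
        return databaseTypeDefinition.toLowerCase(Locale.ROOT).split("\\(")[0];
    }

    // New behavior: pass the driver-reported type name through unchanged,
    // still stripping only the length parameter.
    static String after(String databaseTypeDefinition) {
        return databaseTypeDefinition.split("\\(")[0];
    }

    public static void main(String[] args) {
        System.out.println(before("String"));      // "string" - rejected by ClickHouse
        System.out.println(after("String"));       // "String" - accepted
        System.out.println(after("VARCHAR(255)")); // "VARCHAR" - length still stripped
    }
}
```

ClickHouse type names are case-sensitive, so the lower-cased `string` no longer matches any type the driver knows about, while databases with case-insensitive type names never noticed the conversion.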
### What changes were proposed in this pull request?
- Remove the lower-case conversion. The type name retrieved from the JDBC driver implementation should be passed through as-is; Spark should not modify the value.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Followed the reproduction steps at https://github.com/ClickHouse/clickhouse-java/issues/1505:
  - Created a ClickHouse table with an array-of-string column
  - Wrote a Scala Spark job that writes to the ClickHouse table
  - Verified that the change fixes the issue
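The failure mode exercised by those steps can be illustrated with a toy stand-in for a case-sensitive driver's type lookup (the `KNOWN_TYPES` set and `createArrayOf` helper below are hypothetical mocks, not the real ClickHouse driver internals):

```java
import java.util.Set;

public class CaseSensitiveDriverDemo {
    // Mock of a driver that, like ClickHouse, treats type names case-sensitively.
    static final Set<String> KNOWN_TYPES = Set.of("String", "Int32", "Float64");

    // Mimics the validation a driver performs inside Connection.createArrayOf.
    static void createArrayOf(String typeName) {
        if (!KNOWN_TYPES.contains(typeName)) {
            throw new IllegalArgumentException("Unknown data type: " + typeName);
        }
    }

    public static void main(String[] args) {
        createArrayOf("String"); // fixed path: name passed through unchanged, accepted
        try {
            createArrayOf("string"); // old path: lower-cased name is rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unknown data type: string
        }
    }
}
```

This reproduces the shape of the exception from the linked issue without needing a running ClickHouse server.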
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44459 from phanhuyn/do-not-lower-case-jdbc-array-type.
Lead-authored-by: Kent Yao <[email protected]>
Co-authored-by: Nguyen Phan Huy <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
.../org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
index 467c489a50fd..51daea76abc5 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
@@ -20,7 +20,6 @@ package org.apache.spark.sql.execution.datasources.jdbc
import java.sql.{Connection, JDBCType, PreparedStatement, ResultSet, ResultSetMetaData, SQLException}
import java.time.{Instant, LocalDate}
import java.util
-import java.util.Locale
import java.util.concurrent.TimeUnit
import scala.collection.mutable.ArrayBuffer
@@ -635,8 +634,7 @@ object JdbcUtils extends Logging with SQLConfHelper {
case ArrayType(et, _) =>
// remove type length parameters from end of type name
- val typeName = getJdbcType(et, dialect).databaseTypeDefinition
-   .toLowerCase(Locale.ROOT).split("\\(")(0)
+ val typeName = getJdbcType(et, dialect).databaseTypeDefinition.split("\\(")(0)
(stmt: PreparedStatement, row: Row, pos: Int) =>
val array = conn.createArrayOf(
typeName,
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]