Nguyen Phan Huy created SPARK-46612:
---------------------------------------

             Summary: Clickhouse's JDBC throws `java.lang.IllegalArgumentException: Unknown data type: string` when write array string with Apache Spark scala
                 Key: SPARK-46612
                 URL: https://issues.apache.org/jira/browse/SPARK-46612
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Nguyen Phan Huy


h3. Bug description

When using Spark with Scala to write an array of strings to ClickHouse through the ClickHouse JDBC driver, the driver throws a {{java.lang.IllegalArgumentException: Unknown data type: string}} exception.

The exception is thrown by:
[https://github.com/ClickHouse/clickhouse-java/blob/aa3870eadb1a2d3675fd5119714c85851800f076/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseDataType.java#L238]

This is caused by Spark's {{JdbcUtils}}, which converts the array element type name to lower case ({{String}} -> {{string}}) before passing it to {{Connection.createArrayOf}}; the driver does not recognize the lower-cased name:
[https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639]
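The mismatch can be illustrated without Spark or a ClickHouse server. The sketch below is a simplified model, not the real driver or Spark code: {{knownTypes}} and {{lookupType}} are hypothetical stand-ins for the driver's case-sensitive type lookup, and {{sparkStyleArrayTypeName}} paraphrases, as we read the linked line, the normalization {{JdbcUtils}} applies to the array element type name (lower-casing it and dropping any {{(...)}} suffix).

{code:scala}
import java.util.Locale

object TypeNameCaseSketch {
  // Hypothetical stand-in for the driver's type registry. The real ClickHouseDataType
  // knows many more names and aliases; only the case-sensitive behaviour is modelled here.
  private val knownTypes: Set[String] = Set("String", "Int32", "Float64")

  // Case-sensitive lookup that rejects names it does not know.
  def lookupType(name: String): String =
    if (knownTypes.contains(name)) name
    else throw new IllegalArgumentException(s"Unknown data type: $name")

  // Paraphrase of the element-type normalization in JdbcUtils: lower-case the
  // type definition and keep only the part before any "(" (length/precision suffix).
  def sparkStyleArrayTypeName(databaseTypeDefinition: String): String =
    databaseTypeDefinition.toLowerCase(Locale.ROOT).split("\\(")(0)

  def main(args: Array[String]): Unit = {
    val original = "String"
    println(lookupType(original)) // prints "String"

    val normalized = sparkStyleArrayTypeName(original)
    try {
      lookupType(normalized)
      println("lookup succeeded")
    } catch {
      case e: IllegalArgumentException => println(e.getMessage) // prints "Unknown data type: string"
    }
  }
}
{code}

The point of the model is that the name is only rejected after Spark's lower-casing step; the original {{String}} resolves fine.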
h3. Steps to reproduce

 # Create a ClickHouse table with an {{Array(String)}} column ([https://clickhouse.com/]).
 # Write data to the table with Spark in Scala, via the ClickHouse JDBC driver ([https://github.com/ClickHouse/clickhouse-java]):
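{code:scala}
// Code extract: run as part of a Scala Spark job with the ClickHouse JDBC driver on the classpath
import java.util.Properties

import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

val spark = SparkSession.builder().appName("SPARK-46612-repro").getOrCreate()

// A single column holding an array of strings (Array(String) on the ClickHouse side)
val clickHouseSchema = StructType(
  Seq(
    StructField("str_array", ArrayType(StringType))
  )
)
val data = Seq(
  Row(Seq("a", "b"))
)

val clickHouseDf = spark.createDataFrame(spark.sparkContext.parallelize(data), clickHouseSchema)

val props = new Properties
props.put("user", "default")

// Fails with java.lang.IllegalArgumentException: Unknown data type: string
clickHouseDf.write
  .mode(SaveMode.Append)
  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver") // the driver class name is passed as a String
  .jdbc("jdbc:clickhouse://localhost:8123/foo", "bar", props)
{code}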
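Step 1 does not include the table definition. A table like the one created below should be enough to reproduce the error; this is an assumed setup, not the reporter's actual DDL. The database {{foo}}, table {{bar}}, and column {{str_array}} match the names used in the reproduction, while the {{MergeTree}} engine and the {{CreateReproTable}} / {{createTableDdl}} names are our own choices.

{code:scala}
import java.sql.DriverManager
import java.util.Properties

object CreateReproTable {
  // ClickHouse DDL for a table with a single Array(String) column.
  // ORDER BY tuple() declares a MergeTree table without a sorting key.
  def createTableDdl(database: String, table: String): String =
    s"CREATE TABLE IF NOT EXISTS $database.$table (str_array Array(String)) " +
      "ENGINE = MergeTree ORDER BY tuple()"

  def main(args: Array[String]): Unit = {
    val props = new Properties
    props.put("user", "default")
    // Requires a running ClickHouse server and the ClickHouse JDBC driver on the classpath.
    val conn = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/foo", props)
    try {
      val stmt = conn.createStatement()
      try stmt.execute(createTableDdl("foo", "bar"))
      finally stmt.close()
    } finally conn.close()
  }
}
{code}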
h3. Fix
- [https://github.com/apache/spark/pull/44459]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)