[
https://issues.apache.org/jira/browse/SPARK-34133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erik Krogen updated SPARK-34133:
--------------------------------
Summary: [AVRO] Respect case sensitivity when performing Catalyst-to-Avro
field matching (was: [AVRO] Respect case sensitivity when performing
Catalyst-to-Avro field matching and enhance error messages)
> [AVRO] Respect case sensitivity when performing Catalyst-to-Avro field
> matching
> -------------------------------------------------------------------------------
>
> Key: SPARK-34133
> URL: https://issues.apache.org/jira/browse/SPARK-34133
> Project: Spark
> Issue Type: Bug
> Components: Input/Output, SQL
> Affects Versions: 2.4.0, 3.2.0
> Reporter: Erik Krogen
> Priority: Major
>
> Spark SQL is case-insensitive by default, but currently when {{AvroSerializer}}
> and {{AvroDeserializer}} match a Catalyst schema against an Avro schema, the
> field name matching is always case-sensitive. For example, the following write
> will fail:
> {code}
> val avroSchema =
>   """
>     |{
>     |  "type" : "record",
>     |  "name" : "test_schema",
>     |  "fields" : [
>     |    {"name": "foo", "type": "int"},
>     |    {"name": "BAR", "type": "int"}
>     |  ]
>     |}
>   """.stripMargin
> val df = Seq((1, 3), (2, 4)).toDF("FOO", "bar")
> df.write.option("avroSchema", avroSchema).format("avro").save(savePath)
> {code}
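> While this issue is about behavior rather than a particular fix, a minimal sketch
> of resolver-based matching may help illustrate the intent. The helper below is
> hypothetical (it is not the actual Spark change) and assumes only the existing
> {{SQLConf.get.resolver}}, which is case-insensitive unless
> {{spark.sql.caseSensitive=true}}:
> {code}
> import scala.collection.JavaConverters._
>
> import org.apache.avro.Schema
> import org.apache.spark.sql.internal.SQLConf
>
> // Hypothetical helper, for illustration only: look up the Avro field matching a
> // Catalyst field name using the session resolver instead of exact string equality.
> def findAvroField(avroSchema: Schema, catalystName: String): Option[Schema.Field] = {
>   val resolver = SQLConf.get.resolver
>   val matched = avroSchema.getFields.asScala
>     .filter(f => resolver(f.name(), catalystName))
>     .toList
>   if (matched.size > 1) {
>     // Case-insensitive matching can be ambiguous (e.g. "foo" and "FOO" both
>     // match "Foo"); that should be reported clearly rather than picked silently.
>     throw new IllegalArgumentException(
>       s"Searching for '$catalystName' in Avro schema gave ${matched.size} matches: " +
>         matched.map(_.name()).mkString(", "))
>   }
>   matched.headOption
> }
> {code}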
> The same is true on the read path: assuming {{testAvro}} has been written with
> the schema above, the following read will fail to match the fields:
> {code}
> spark.read
>   .schema(new StructType().add("FOO", IntegerType).add("bar", IntegerType))
>   .format("avro").load(testAvro)
> {code}
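> To make the expected behavior concrete, here is a hypothetical usage of the
> sketch above against the example schema (again illustrative, not the actual
> implementation):
> {code}
> // With the default spark.sql.caseSensitive=false, Catalyst column "FOO" should
> // resolve to Avro field "foo", and "bar" to "BAR".
> val parsed = new org.apache.avro.Schema.Parser().parse(avroSchema)
> findAvroField(parsed, "FOO").map(_.name())  // expected: Some("foo")
> findAvroField(parsed, "bar").map(_.name())  // expected: Some("BAR")
>
> // With spark.sql.caseSensitive=true both lookups should return None, which is
> // what the serializer and deserializer currently do unconditionally.
> {code}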
> In addition, the error messages produced in this type of failure scenario are
> not very informative, especially on the write path ({{AvroSerializer}}); they
> can be made much more helpful for users debugging schema mismatches.
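> As a rough illustration of the kind of message that would help (the wording and
> helper are hypothetical, not what Spark currently emits):
> {code}
> // Hypothetical message builder for the write path when a Catalyst field has no
> // Avro counterpart: include the full field path and the candidate Avro fields.
> def missingFieldMessage(
>     catalystPath: Seq[String],
>     avroRecordName: String,
>     avroFieldNames: Seq[String]): String = {
>   s"Cannot find field '${catalystPath.mkString(".")}' of the Catalyst schema " +
>     s"in Avro record '$avroRecordName'; available Avro fields: " +
>     avroFieldNames.mkString(", ")
> }
> {code}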
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]