[ https://issues.apache.org/jira/browse/SPARK-32431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164499#comment-17164499 ]
Maxim Gekk commented on SPARK-32431:
------------------------------------

I am working on this.

> The .schema() API behaves incorrectly for nested schemas that have column
> duplicates in case-insensitive mode
> -------------------------------------------------------------------------
>
>                 Key: SPARK-32431
>                 URL: https://issues.apache.org/jira/browse/SPARK-32431
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.6, 3.0.0
>            Reporter: Michał Świtakowski
>            Priority: Major
>
> The code below throws org.apache.spark.sql.AnalysisException: Found duplicate
> column(s) in the data schema: `camelcase`; for multiple file formats due to a
> duplicate column in the requested schema.
> {code:java}
> import org.apache.spark.sql.types._
> spark.conf.set("spark.sql.caseSensitive", "false")
> val formats = Seq("parquet", "orc", "avro", "json")
> val caseInsensitiveSchema = new StructType()
>   .add("LowerCase", LongType)
>   .add("camelcase", LongType)
>   .add("CamelCase", LongType)
> formats.map{ format =>
>   val path = s"/tmp/$format"
>   spark
>     .range(1L)
>     .selectExpr("id AS lowercase", "id + 1 AS camelCase")
>     .write.mode("overwrite").format(format).save(path)
>   spark.read.schema(caseInsensitiveSchema).format(format).load(path).show
> }
> {code}
> Similar code with nested schema behaves inconsistently across file formats
> and sometimes returns incorrect results:
> {code:java}
> import org.apache.spark.sql.types._
> spark.conf.set("spark.sql.caseSensitive", "false")
> val formats = Seq("parquet", "orc", "avro", "json")
> val caseInsensitiveSchema = new StructType().add("StructColumn", new StructType()
>   .add("LowerCase", LongType)
>   .add("camelcase", LongType)
>   .add("CamelCase", LongType))
> formats.map{ format =>
>   val path = s"/tmp/$format"
>   spark
>     .range(1L)
>     .selectExpr("NAMED_STRUCT('lowercase', id, 'camelCase', id + 1) AS StructColumn")
>     .write.mode("overwrite").format(format).save(path)
>   spark.read.schema(caseInsensitiveSchema).format(format).load(path).show
> }
> {code}
> The desired behavior likely should be returning an exception just like in the
> flat schema scenario.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
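
The behavior the report asks for (rejecting case-insensitive duplicates inside nested structs, not only at the top level) could be checked on the user side roughly as follows. This is only a sketch to paste into spark-shell: `findDuplicateFields` is a hypothetical helper, not a Spark API and not the fix being worked on for this issue. It assumes only the public `org.apache.spark.sql.types` classes.

```scala
import java.util.Locale
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): returns the dotted paths of struct
// fields whose names collide when compared case-insensitively, at any depth.
def findDuplicateFields(dt: DataType, path: String = ""): Seq[String] = dt match {
  case st: StructType =>
    // Duplicates among the fields of this struct.
    val here = st.fields.toSeq
      .groupBy(_.name.toLowerCase(Locale.ROOT))
      .collect { case (lower, fs) if fs.size > 1 => path + lower }
      .toSeq
    // Duplicates inside each field's own type.
    val nested = st.fields.toSeq.flatMap { f =>
      findDuplicateFields(f.dataType, path + f.name + ".")
    }
    here ++ nested
  case ArrayType(elementType, _) =>
    findDuplicateFields(elementType, path)
  case MapType(keyType, valueType, _) =>
    findDuplicateFields(keyType, path) ++ findDuplicateFields(valueType, path)
  case _ =>
    Seq.empty
}

// The nested schema from the report above.
val nestedSchema = new StructType().add("StructColumn", new StructType()
  .add("LowerCase", LongType)
  .add("camelcase", LongType)
  .add("CamelCase", LongType))

val duplicates = findDuplicateFields(nestedSchema)
// duplicates == Seq("StructColumn.camelcase")
```

A caller could fail fast with something like `require(duplicates.isEmpty, ...)` before passing the schema to `spark.read.schema(...)`, which would mimic the exception already raised in the flat schema case.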