[ https://issues.apache.org/jira/browse/SPARK-26990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce Robbins updated SPARK-26990:
----------------------------------
    Summary: Difference in handling of mixed-case partition column names after SPARK-26188  (was: Difference in handling of mixed-case partition columns after SPARK-26188)

> Difference in handling of mixed-case partition column names after SPARK-26188
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-26990
>                 URL: https://issues.apache.org/jira/browse/SPARK-26990
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Bruce Robbins
>            Priority: Major
>
> I noticed that the [PR for SPARK-26188|https://github.com/apache/spark/pull/23165] changed how mixed-case partition column names are handled when the user provides a schema.
> Say I have this file structure (note that each instance of `pS` is mixed case):
> {noformat}
> bash-3.2$ find partitioned5 -type d
> partitioned5
> partitioned5/pi=2
> partitioned5/pi=2/pS=foo
> partitioned5/pi=2/pS=bar
> partitioned5/pi=1
> partitioned5/pi=1/pS=foo
> partitioned5/pi=1/pS=bar
> bash-3.2$
> {noformat}
> If I load the files with a user-provided schema in 2.4 (before the PR was committed) or 2.3, I see:
> {noformat}
> scala> val df = spark.read.schema("intField int, pi int, ps string").parquet("partitioned5")
> df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]
> scala> df.printSchema
> root
>  |-- intField: integer (nullable = true)
>  |-- pi: integer (nullable = true)
>  |-- ps: string (nullable = true)
> scala>
> {noformat}
> However, using 2.4 after the PR was committed, I see:
> {noformat}
> scala> val df = spark.read.schema("intField int, pi int, ps string").parquet("partitioned5")
> df: org.apache.spark.sql.DataFrame = [intField: int, pi: int ... 1 more field]
> scala> df.printSchema
> root
>  |-- intField: integer (nullable = true)
>  |-- pi: integer (nullable = true)
>  |-- pS: string (nullable = true)
> scala>
> {noformat}
> Spark is picking up the mixed-case column name {{pS}} from the directory name, not the lower-case {{ps}} from my specified schema.
> In all tests, {{spark.sql.caseSensitive}} is set to the default (false).
> Not sure if this is a bug, but it is a difference.
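
The difference reported above can be illustrated with a small, Spark-free sketch. This is not Spark's implementation: `PartitionNameSketch`, `resolveName`, and the `preferUserSchema` flag are hypothetical names introduced only to model the two observed behaviors (reporting the name from the user-provided schema versus the name from the directory) when names are matched case-insensitively.

```scala
// Hypothetical model of the two behaviors described in this report.
// It does not call Spark and is not taken from Spark's source.
object PartitionNameSketch {

  /** Returns the column name that would be reported for a partition
    * column whose name was inferred from a directory such as `pS=foo`.
    *
    * @param inferred         name taken from the directory (e.g. "pS")
    * @param userSchemaNames  field names from the user-provided schema
    * @param caseSensitive    mirrors the idea of spark.sql.caseSensitive
    * @param preferUserSchema true models the earlier behavior (user's
    *                         spelling wins), false models the later one
    */
  def resolveName(
      inferred: String,
      userSchemaNames: Seq[String],
      caseSensitive: Boolean = false,
      preferUserSchema: Boolean = true): String = {

    // Name comparison that honors the case-sensitivity setting.
    def sameName(a: String, b: String): Boolean =
      if (caseSensitive) a == b else a.equalsIgnoreCase(b)

    userSchemaNames.find(sameName(_, inferred)) match {
      case Some(userName) if preferUserSchema => userName // e.g. "ps"
      case _                                  => inferred // e.g. "pS"
    }
  }
}
```

With the schema from the report (`intField`, `pi`, `ps`) and the directory name `pS`, the sketch yields `ps` when the user's spelling is preferred and `pS` otherwise, which matches the two `printSchema` outputs shown in the description.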