[ https://issues.apache.org/jira/browse/SPARK-32147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shankar Koirala updated SPARK-32147:
------------------------------------
    Description: 
When a DataFrame is written as Parquet or CSV with partitionBy, partition column values made of digits followed by 'f' or 'd' (for example "6f" or "7d") come back changed when the data is read. An example:
{code:java}
scala> import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SaveMode

scala> val df = Seq(
     |   ("9q", 1),
     |   ("3k", 2),
     |   ("6f", 3),
     |   ("7f", 4),
     |   ("7d", 5)
     | ).toDF("value", "id")
df: org.apache.spark.sql.DataFrame = [value: string, id: int]

scala> df.show(false)
+-----+---+
|value|id |
+-----+---+
|9q   |1  |
|3k   |2  |
|6f   |3  |
|7f   |4  |
|7d   |5  |
+-----+---+

scala> df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")

scala> spark.read.parquet("tmp_parquet").show(false)
+---+-----+
|id |value|
+---+-----+
|5  |7.0  |
|3  |6.0  |
|2  |3k   |
|4  |7.0  |
|1  |9q   |
+---+-----+
{code}
The same happens with the other formats. Is this a bug, or is this expected behavior?

Taken from [SO|https://stackoverflow.com/questions/62671684/spark-incorrectly-intepret-partition-name-ending-with-d-or-f-as-number-when]

> Spark: PartitionBy changing the columns value
> ----------------------------------------------
>
>                 Key: SPARK-32147
>                 URL: https://issues.apache.org/jira/browse/SPARK-32147
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 3.0.0
>            Reporter: Shankar Koirala
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
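One possible explanation for the output above, offered as an assumption rather than a confirmed diagnosis: partition values are stored as directory names such as value=6f, and on read the column type is inferred from those strings. The JVM's Double.parseDouble accepts a trailing 'f' or 'd' type suffix, so "6f" and "7d" are valid doubles while "9q" and "3k" are not. The sketch below uses only the standard library to show that parsing behavior. The object and method names are hypothetical, and it does not reproduce Spark's actual inference code.

```scala
import scala.util.Try

// Hypothetical helper for illustration only; this is not Spark code.
object PartitionValueParsing {

  // Returns Some(d) when the JVM accepts the raw string as a double, else None.
  // Double.parseDouble allows an optional trailing f/F/d/D type suffix.
  def parsesAsDouble(raw: String): Option[Double] =
    Try(java.lang.Double.parseDouble(raw)).toOption

  def main(args: Array[String]): Unit = {
    // The partition values from the report above.
    val values = Seq("9q", "3k", "6f", "7f", "7d")
    values.foreach { raw =>
      println(s"$raw -> ${parsesAsDouble(raw)}")
    }
  }
}
```

If this is indeed the cause, setting spark.sql.sources.partitionColumnTypeInference.enabled to false before reading, or passing an explicit schema to the reader, should keep the value column as a string.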