[
https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan resolved SPARK-34314.
---------------------------------
Fix Version/s: 3.2.0
Resolution: Fixed
Issue resolved by pull request 31549
[https://github.com/apache/spark/pull/31549]
> Wrong discovered partition value
> --------------------------------
>
> Key: SPARK-34314
> URL: https://issues.apache.org/jira/browse/SPARK-34314
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
> Reporter: Maxim Gekk
> Assignee: Maxim Gekk
> Priority: Major
> Fix For: 3.2.0
>
>
> The example below illustrates the issue:
> {code:scala}
> // `path` is assumed to be a writable, empty directory.
> val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
> df.write
>   .partitionBy("part")
>   .format("parquet")
>   .save(path)
> val readback = spark.read.parquet(path)
> readback.printSchema()
> readback.show(false)
> {code}
> It writes the partition values as strings:
> {code}
> /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
> ├── _SUCCESS
> ├── part=-0
> │   └── part-00001-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> └── part=AA
>     └── part-00000-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> {code}
> The partition values on disk are *"-0"* and "AA", but when Spark reads the data back, it transforms "-0" to "0":
> {code}
> root
> |-- id: integer (nullable = true)
> |-- part: string (nullable = true)
> +---+----+
> |id |part|
> +---+----+
> |0  |AA  |
> |1  |0   |
> +---+----+
> {code}
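The read-back result above is consistent with a partition value being parsed as a number and then rendered as a string again. The snippet below is only a minimal plain-Scala sketch of that kind of round trip, not Spark's actual partition discovery code; `inferPartitionValue` and `readBack` are hypothetical names introduced for illustration. It can be pasted into a Scala REPL.

```scala
import scala.util.Try

// Hypothetical helper (an assumption, not Spark's implementation):
// try to read the directory value as an integer, else keep the raw string.
def inferPartitionValue(raw: String): Any =
  Try(raw.toInt).getOrElse(raw)

// Render the inferred value as a string, as would happen if the column
// ends up typed as string after the values were inferred individually.
def readBack(raw: String): String =
  inferPartitionValue(raw).toString

// "AA" is not a number, so it survives unchanged.
// "-0" parses to the integer 0, so the sign is lost.
val values = Seq("AA", "-0").map(readBack)
println(values) // prints List(AA, 0)
```

The point of the sketch is that once "-0" has passed through an integer, the original text cannot be recovered, so a string partition value has to be kept in its raw form to round-trip correctly.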