[ 
https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276612#comment-17276612
 ] 

Apache Spark commented on SPARK-34314:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/31423

> Wrong discovered partition value
> --------------------------------
>
>                 Key: SPARK-34314
>                 URL: https://issues.apache.org/jira/browse/SPARK-34314
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> The example below illustrates the issue:
> {code:scala}
>       val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
>       df.write
>         .partitionBy("part")
>         .format("parquet")
>         .save(path)
>       val readback = spark.read.parquet(path)
>       readback.printSchema()
>       readback.show(false)
> {code}
> It writes the partition values as strings:
> {code}
> /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
> ├── _SUCCESS
> ├── part=-0
> │   └── part-00001-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> └── part=AA
>     └── part-00000-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> {code}
> The directory names keep the values *"-0"* and "AA",
> but when Spark reads the data back, it transforms "-0" to "0":
> {code}
> root
>  |-- id: integer (nullable = true)
>  |-- part: string (nullable = true)
> +---+----+
> |id |part|
> +---+----+
> |0  |AA  |
> |1  |0   |
> +---+----+
> {code}
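
The report above does not say why "-0" turns into "0". One plausible reading, sketched below in plain Scala without Spark, is that each directory value is first parsed on its own ("-0" parses as the integer 0) and is converted back to a string only after the column as a whole is found to be a string column. All names here (PartitionInferenceSketch, infer, readLossy, readKeepingRaw) are hypothetical and only model the idea; this is not Spark's code and not necessarily what the pull request changes.

```scala
import scala.util.Try

// Hypothetical, simplified model of partition value discovery (not Spark's code).
object PartitionInferenceSketch {
  sealed trait Inferred
  final case class IntValue(value: Int) extends Inferred
  final case class StringValue(value: String) extends Inferred

  // Guess a type for one raw directory value, independently of the others.
  def infer(raw: String): Inferred =
    Try(raw.toInt).toOption match {
      case Some(v) => IntValue(v)   // "-0" parses successfully as the integer 0
      case None    => StringValue(raw)
    }

  // Lossy variant: render every value from its already-parsed form,
  // so the original text of an integer-looking value is lost.
  def readLossy(raws: Seq[String]): Seq[String] =
    raws.map(infer).map {
      case IntValue(v)    => v.toString
      case StringValue(s) => s
    }

  // Alternative: decide the column type first, and if the column is a
  // string column, return the raw directory text unchanged.
  def readKeepingRaw(raws: Seq[String]): Seq[String] = {
    val columnIsString = raws.map(infer).exists(_.isInstanceOf[StringValue])
    if (columnIsString) raws else raws.map(_.toInt.toString)
  }

  def main(args: Array[String]): Unit = {
    val dirs = Seq("AA", "-0")
    println(readLossy(dirs).mkString(", "))      // prints: AA, 0
    println(readKeepingRaw(dirs).mkString(", ")) // prints: AA, -0
  }
}
```

Under this model, the lossy variant reproduces the output shown in the report, while the second variant returns the values exactly as they appear in the directory names.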



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
