[ https://issues.apache.org/jira/browse/SPARK-48965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun closed SPARK-48965.
---------------------------------
> toJSON produces wrong values if DecimalType information is lost in as[Product]
> ------------------------------------------------------------------------------
>
> Key: SPARK-48965
> URL: https://issues.apache.org/jira/browse/SPARK-48965
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.1, 3.5.1
> Reporter: Dmitry Lapshin
> Assignee: Bruce Robbins
> Priority: Major
> Labels: correctness, pull-request-available
> Fix For: 3.4.4, 3.5.3, 4.0.0
>
>
> Consider this example:
> {code:scala}
> package com.jetbrains.jetstat.etl
>
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.types.DecimalType
>
> object A {
>   case class Example(x: BigDecimal)
>
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder()
>       .master("local[1]")
>       .getOrCreate()
>     import spark.implicits._
>
>     val originalRaw = BigDecimal("123.456")
>     val original = Example(originalRaw)
>
>     val ds1 = spark.createDataset(Seq(original))
>     val ds2 = ds1.withColumn("x", $"x" cast DecimalType(12, 6))
>     val ds3 = ds2.as[Example]
>
>     println(s"DS1: schema=${ds1.schema}, encoder.schema=${ds1.encoder.schema}")
>     println(s"DS2: schema=${ds2.schema}, encoder.schema=${ds2.encoder.schema}")
>     println(s"DS3: schema=${ds3.schema}, encoder.schema=${ds3.encoder.schema}")
>
>     val json1 = ds1.toJSON.collect().head
>     val json2 = ds2.toJSON.collect().head
>     val json3 = ds3.toJSON.collect().head
>
>     val collect1 = ds1.collect().head
>     val collect2Row = ds2.collect().head
>     val collect2 = collect2Row.getDecimal(collect2Row.fieldIndex("x"))
>     val collect3 = ds3.collect().head
>
>     println(s"Original: $original (scale = ${original.x.scale}, precision = ${original.x.precision})")
>     println(s"Collect1: $collect1 (scale = ${collect1.x.scale}, precision = ${collect1.x.precision})")
>     println(s"Collect2: $collect2 (scale = ${collect2.scale}, precision = ${collect2.precision})")
>     println(s"Collect3: $collect3 (scale = ${collect3.x.scale}, precision = ${collect3.x.precision})")
>     println(s"json1: $json1")
>     println(s"json2: $json2")
>     println(s"json3: $json3")
>   }
> }
> {code}
> Running it, you'll see that json3 contains badly wrong data. After a bit of debugging (apologies, I'm not well versed in Spark internals), I found the following:
> * The in-memory representation of the data in this example is {{UnsafeRow}}, which stores small decimal values compactly as longs; its {{.getDecimal}} therefore has to be told the precision and scale, because they are not recorded with the value (see the sketch after this list).
> * However, there are at least two sources for the precision and scale to pass to that method: {{Dataset.schema}} (which is based on the query execution) and {{Dataset.encoder.schema}} (which is updated to 12,6 in {{ds2}} but then reset back to 38,18 in {{ds3}}). There is also a {{Dataset.deserializer}} that seems to combine the two non-trivially.
> * This doesn't seem to affect the {{Dataset.collect()}} methods, since they use the {{deserializer}}, but {{Dataset.toJSON}} appears to use only one of the two schemas, and for {{ds3}} it is the wrong one.
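> To make the first point concrete, here is a minimal sketch of that mechanism as I understand it. It uses Spark's internal {{UnsafeProjection}}, {{InternalRow}} and {{Decimal}} APIs directly, and deliberately picks a mismatched scale that stays within the compact long range, so it illustrates the root cause rather than the exact {{toJSON}} code path; the object name and the chosen scales are just for illustration:
> {code:scala}
> import org.apache.spark.sql.catalyst.InternalRow
> import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
> import org.apache.spark.sql.types.{Decimal, DecimalType, StructField, StructType}
>
> object DecimalSketch {
>   def main(args: Array[String]): Unit = {
>     // UnsafeRow stores a decimal with precision <= 18 as its bare unscaled
>     // long; the precision and scale are NOT stored alongside the value.
>     val schema = StructType(Seq(StructField("x", DecimalType(12, 6))))
>     val project = UnsafeProjection.create(schema)
>     val row = project(InternalRow(Decimal(BigDecimal("123.456"), 12, 6)))
>
>     // getDecimal trusts whatever precision and scale the caller passes in,
>     // so the same stored long decodes to different values:
>     println(row.getDecimal(0, 12, 6)) // 123.456000 -- matches what was written
>     println(row.getDecimal(0, 18, 9)) // 0.123456000 -- silently wrong
>   }
> }
> {code}
> So whatever code reads the row must take the precision and scale from the schema the row was written with; reading with the wrong schema produces garbage without any error.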
> It seems to me that either {{.toJSON}} should be more aware of what's going on, or {{.as[]}} should be doing something else.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]