Tarique Anwer created SPARK-51945:
-------------------------------------

             Summary: Precision Increase from Decimal(28,20) to Decimal(29,20) 
When Rounding to 20 Decimal Places in Spark 3.5
                 Key: SPARK-51945
                 URL: https://issues.apache.org/jira/browse/SPARK-51945
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Tarique Anwer


In Apache Spark 3.5 (and Databricks Runtime 15.4 LTS), the round function 
increases the precision of a Decimal(28,20) column to Decimal(29,20) when 
rounding to 20 decimal places. This behavior differs from Spark 3.2 (Databricks 
Runtime 10.4 LTS), where the output remains Decimal(28,20). The precision 
increase appears unnecessary and is potentially a bug: a Decimal(28,20) value 
has no 21st decimal digit, so rounding to 20 decimal places can never trigger 
the carry-over (e.g., 99999999.99999999999999999999 → 
100000000.00000000000000000000) that would require the extra digit of precision.
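The no-carry argument can be illustrated with Python's decimal module alone, independent of Spark (a minimal sketch; the specific value shown is the maximum a Decimal(28,20) column can hold):

{code:python}
from decimal import Decimal, ROUND_HALF_UP

# Largest Decimal(28,20) value: 8 integer digits, 20 fractional digits.
x = Decimal("99999999.99999999999999999999")

# Rounding to 20 decimal places is the identity here -- there is no
# 21st fractional digit that could force a carry into a 9th integer digit.
rounded = x.quantize(Decimal("1E-20"), rounding=ROUND_HALF_UP)

print(rounded == x)  # True: no carry, so no extra precision is needed
{code}

Since even the maximum value round-trips unchanged, Decimal(28,20) should be sufficient for the result of round(col, 20).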

*Steps to Reproduce*
 * Create a DataFrame with Decimal(28,20) values, e.g., 99999999.9 to 
99999999.99999999999999999999 (scales 1 to 20).
 * Apply round(col, 20) to the column.
 * Check the output schema and values.

*Example Code*:
{code:python}
from pyspark.sql.functions import col, round
from pyspark.sql.types import DecimalType, IntegerType, StructType, StructField
from decimal import Decimal

schema = StructType(
    [
        StructField("input", DecimalType(28, 20), True),
        StructField("scale", IntegerType(), True),
    ]
)

df = spark.createDataFrame(
    [
        (Decimal("99999999.9"), 1),
        (Decimal("99999999.99999999999999999990"), 20),
        (Decimal("99999999.99999999999999999994"), 20),
        (Decimal("99999999.99999999999999999995"), 20),
        (Decimal("99999999.99999999999999999996"), 20),
        (Decimal("99999999.99999999999999999999"), 20),
    ],
    schema,
)

df.printSchema()
# root
#  |-- input: decimal(28,20) (nullable = true)
#  |-- scale: integer (nullable = true)

df_1 = df.select(round(col("input"), 20).alias("input"))
df_1.printSchema()
# root
#  |-- input: decimal(29,20) (nullable = true)

df_2 = df.withColumn("input", round(col("input"), 20))
df_2.printSchema()
# root
#  |-- input: decimal(29,20) (nullable = true)
#  |-- scale: integer (nullable = true)
{code}

I'm not entirely sure, but could this be related to 
https://issues.apache.org/jira/browse/SPARK-39226?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
