Tarique Anwer created SPARK-51945:
-------------------------------------

             Summary: Precision Increase from Decimal(28,20) to Decimal(29,20) When Rounding to 20 Decimal Places in Spark 3.5
                 Key: SPARK-51945
                 URL: https://issues.apache.org/jira/browse/SPARK-51945
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Tarique Anwer

In Apache Spark 3.5 (and Databricks Runtime 15.4 LTS), the round function widens a Decimal(28,20) column to Decimal(29,20) when rounding to 20 decimal places. In Spark 3.2 (Databricks Runtime 10.4 LTS), the output type remains Decimal(28,20).

The precision increase appears unnecessary and may be a bug. A Decimal(28,20) value has no 21st decimal digit, so rounding to 20 places cannot trigger a carry-over that would need an extra integer digit (e.g., 99999999.99999999999999999999 → 100000000.00000000000000000000).

*Steps to Reproduce*
 * Create a DataFrame with Decimal(28,20) values, e.g., 99999999.9 to 99999999.99999999999999999999 (scales 1 to 20).
 * Apply round(col, 20) to the column.
 * Check the output schema and values.

{*}Example Code{*}:
{code:python}
from decimal import Decimal

# Note: this import shadows Python's built-in round with the Spark SQL function.
from pyspark.sql.functions import col, round
from pyspark.sql.types import DecimalType, IntegerType, StructType, StructField

schema = StructType(
    [
        StructField("input", DecimalType(28, 20), True),
        StructField("scale", IntegerType(), True),
    ]
)

df = spark.createDataFrame(
    [
        (Decimal("99999999.9"), 1),
        (Decimal("99999999.99999999999999999990"), 20),
        (Decimal("99999999.99999999999999999994"), 20),
        (Decimal("99999999.99999999999999999995"), 20),
        (Decimal("99999999.99999999999999999996"), 20),
        (Decimal("99999999.99999999999999999999"), 20),
    ],
    schema,
)

df.printSchema()
# root
#  |-- input: decimal(28,20) (nullable = true)
#  |-- scale: integer (nullable = true)

df_1 = df.select(round(col("input"), 20).alias("input"))
df_1.printSchema()
# root
#  |-- input: decimal(29,20) (nullable = true)

df_2 = df.withColumn("input", round(col("input"), 20))
df_2.printSchema()
# root
#  |-- input: decimal(29,20) (nullable = true)
#  |-- scale: integer (nullable = true)
{code}

I'm not entirely sure, but is this possibly related to https://issues.apache.org/jira/browse/SPARK-39226?

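The carry-over argument above can be checked without Spark. The following is a minimal sketch using Python's standard decimal module, assuming HALF_UP rounding as a stand-in for Spark's round; the helper name integer_digits_after_round is hypothetical and only illustrates the arithmetic, not Spark's type inference.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext


def integer_digits_after_round(value: Decimal, scale: int) -> int:
    """Round `value` HALF_UP to `scale` decimal places and return the
    number of digits left of the decimal point in the result."""
    with localcontext() as ctx:
        ctx.prec = 50  # enough headroom so quantize never overflows the context
        quantum = Decimal(1).scaleb(-scale)  # e.g. 1E-20 for scale=20
        rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    sign, digits, exponent = rounded.as_tuple()
    return len(digits) + exponent


# Largest value representable as decimal(28,20): 8 integer digits, 20 fractional.
largest = Decimal("99999999.99999999999999999999")

# Rounding to the column's own scale leaves the value unchanged: still 8 integer digits.
same_scale = integer_digits_after_round(largest, 20)

# Rounding to a smaller scale is where a carry can occur: 9 integer digits.
smaller_scale = integer_digits_after_round(largest, 19)

print(same_scale, smaller_scale)
```

Under these assumptions, an extra integer digit is only needed when the target scale is smaller than the input scale, which is why the widening to Decimal(29,20) at scale 20 looks unnecessary.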