[
https://issues.apache.org/jira/browse/SPARK-37191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-37191:
------------------------------------
Assignee: Apache Spark
> Allow merging DecimalTypes with different precision values
> -----------------------------------------------------------
>
> Key: SPARK-37191
> URL: https://issues.apache.org/jira/browse/SPARK-37191
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.0, 3.1.1, 3.2.0
> Reporter: Ivan
> Assignee: Apache Spark
> Priority: Major
> Fix For: 3.3.0
>
>
> Merging DecimalTypes that have the same scale but different precision fails
> with the following error:
> {code:java}
> Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 17 and 12
> at org.apache.spark.sql.types.StructType$.$anonfun$merge$2(StructType.scala:652)
> at scala.Option.map(Option.scala:230)
> at org.apache.spark.sql.types.StructType$.$anonfun$merge$1(StructType.scala:644)
> at org.apache.spark.sql.types.StructType$.$anonfun$merge$1$adapted(StructType.scala:641)
> at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
> at org.apache.spark.sql.types.StructType$.merge(StructType.scala:641)
> at org.apache.spark.sql.types.StructType.merge(StructType.scala:550)
> {code}
>
> We could allow merging DecimalType values with different precision when the
> scale is the same for both types. This should not cause data correctness
> issues, because the type with the smaller precision is simply widened, for
> example DECIMAL(12, 2) -> DECIMAL(17, 2). The same does not hold when the
> scales differ: whether such an upcast is safe depends on the actual values.
>
> Repro code:
> {code:java}
> import org.apache.spark.sql.types._
> val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
> val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
> schema1.merge(schema2)
> {code}
>
> This also affects Parquet schema merge which is where this issue was
> discovered originally:
> {code:java}
> import java.math.BigDecimal
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val data1 = sc.parallelize(Row(new BigDecimal("1234567890000.11")) :: Nil, 1)
> val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
> val data2 = sc.parallelize(Row(new BigDecimal("123456789.11")) :: Nil, 1)
> val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
> spark.createDataFrame(data2, schema2).write.parquet("/tmp/decimal-test.parquet")
> spark.createDataFrame(data1, schema1).write.mode("append").parquet("/tmp/decimal-test.parquet")
> // Reading the DataFrame fails
> spark.read.option("mergeSchema", "true").parquet("/tmp/decimal-test.parquet").show()
> >>>
> Failed merging schema:
> root
> |-- col: decimal(17,2) (nullable = true)
> Caused by: Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 12 and 17
> {code}
>
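The merge rule proposed in the description can be sketched in plain Scala,
without a Spark dependency. This is only an illustration and not the Spark
implementation: DecimalSpec, DecimalMergeSketch and mergeDecimal are
hypothetical names standing in for DecimalType and the decimal branch of
StructType.merge, and the error text for differing scales is made up for the
example.

```scala
object DecimalMergeSketch {
  // Hypothetical stand-in for Spark's DecimalType(precision, scale).
  final case class DecimalSpec(precision: Int, scale: Int)

  // Proposed rule: if the scales match, keep the wider precision;
  // if the scales differ, refuse to merge, because safety would
  // depend on the actual values stored in the column.
  def mergeDecimal(left: DecimalSpec, right: DecimalSpec): Either[String, DecimalSpec] =
    if (left.scale == right.scale) {
      Right(DecimalSpec(math.max(left.precision, right.precision), left.scale))
    } else {
      // Assumed message wording, not taken from Spark.
      Left(s"Cannot merge decimal types with different scale ${left.scale} and ${right.scale}")
    }
}
```

Under this sketch, merging DecimalSpec(17, 2) with DecimalSpec(12, 2) yields
Right(DecimalSpec(17, 2)) regardless of argument order, while merging
DecimalSpec(12, 2) with DecimalSpec(12, 4) yields a Left.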
--
This message was sent by Atlassian Jira
(v8.3.4#803005)