[ 
https://issues.apache.org/jira/browse/SPARK-37191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan updated SPARK-37191:
-------------------------
    Description: 
Merging DecimalTypes that have different precision but the same scale fails 
with the following error:
{code:java}
Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 17 and 12
        at org.apache.spark.sql.types.StructType$.$anonfun$merge$2(StructType.scala:652)
        at scala.Option.map(Option.scala:230)
        at org.apache.spark.sql.types.StructType$.$anonfun$merge$1(StructType.scala:644)
        at org.apache.spark.sql.types.StructType$.$anonfun$merge$1$adapted(StructType.scala:641)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
        at org.apache.spark.sql.types.StructType$.merge(StructType.scala:641)
        at org.apache.spark.sql.types.StructType.merge(StructType.scala:550)
{code}
 

We could allow merging DecimalTypes with different precision when the scale is 
the same for both types. This should not cause any data correctness issues 
because the narrower type is simply widened, for example, DECIMAL(12, 2) -> 
DECIMAL(17, 2). The same does not hold for upcasting when the scales differ: 
whether that is safe would depend on the actual values.
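
The rule proposed above can be sketched in plain Scala, without Spark on the 
classpath. This is only an illustration under stated assumptions: Dec, 
mergeDecimals and fits are hypothetical names standing in for DecimalType and 
the merge logic; they are not Spark APIs and not the actual fix.

```scala
object DecimalMergeSketch {
  // Hypothetical stand-in for Spark's DecimalType(precision, scale).
  final case class Dec(precision: Int, scale: Int)

  // Assumed rule: with equal scale, widen to the larger precision;
  // with different scales, refuse to merge.
  def mergeDecimals(left: Dec, right: Dec): Either[String, Dec] =
    if (left.scale == right.scale)
      Right(Dec(math.max(left.precision, right.precision), left.scale))
    else
      Left(s"incompatible scale ${left.scale} and ${right.scale}")

  // True if `value` can be stored in `t` without losing digits:
  // both its integer digits and its fractional digits must fit.
  def fits(value: BigDecimal, t: Dec): Boolean = {
    val integerDigits = value.precision - value.scale
    integerDigits <= t.precision - t.scale && value.scale <= t.scale
  }

  def main(args: Array[String]): Unit = {
    // Same scale: every DECIMAL(12, 2) value also fits DECIMAL(17, 2).
    println(mergeDecimals(Dec(17, 2), Dec(12, 2)))

    // Different scale: whether a DECIMAL(12, 2) value fits DECIMAL(12, 4)
    // depends on the value itself.
    println(fits(BigDecimal("12345678.12"), Dec(12, 4)))   // 8 integer digits
    println(fits(BigDecimal("1234567890.12"), Dec(12, 4))) // 10 integer digits
  }
}
```

With the same scale, widening the precision only adds room for integer digits, 
so no stored value can overflow. Changing the scale trades integer digits for 
fractional digits, which is why the second case depends on the data.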

 

Repro code:
{code:java}
import org.apache.spark.sql.types._

val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
schema1.merge(schema2)
{code}
 

This also affects Parquet schema merge which is where this issue was discovered 
originally:
{code:java}
import java.math.BigDecimal
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val data1 = sc.parallelize(Row(new BigDecimal("1234567890000.11")) :: Nil, 1)
val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)

val data2 = sc.parallelize(Row(new BigDecimal("123456789.11")) :: Nil, 1)
val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)

spark.createDataFrame(data2, schema2).write.parquet("/tmp/decimal-test.parquet")
spark.createDataFrame(data1, schema1).write.mode("append").parquet("/tmp/decimal-test.parquet")

// Reading the DataFrame back from the same path fails
spark.read.option("mergeSchema", "true").parquet("/tmp/decimal-test.parquet").show()

>>>
Failed merging schema:
root
 |-- col: decimal(17,2) (nullable = true)

Caused by: Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 12 and 17
{code}
 

> Allow merging DecimalTypes with different precision values 
> -----------------------------------------------------------
>
>                 Key: SPARK-37191
>                 URL: https://issues.apache.org/jira/browse/SPARK-37191
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.3, 3.1.0, 3.1.1, 3.2.0
>            Reporter: Ivan
>            Priority: Major
>             Fix For: 3.3.0


