[
https://issues.apache.org/jira/browse/SPARK-37191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan updated SPARK-37191:
-------------------------
Description:
Merging two DecimalTypes that have different precision but the same scale
fails with the following error:
{code:java}
Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 17 and 12
  at org.apache.spark.sql.types.StructType$.$anonfun$merge$2(StructType.scala:652)
  at scala.Option.map(Option.scala:230)
  at org.apache.spark.sql.types.StructType$.$anonfun$merge$1(StructType.scala:644)
  at org.apache.spark.sql.types.StructType$.$anonfun$merge$1$adapted(StructType.scala:641)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at org.apache.spark.sql.types.StructType$.merge(StructType.scala:641)
  at org.apache.spark.sql.types.StructType.merge(StructType.scala:550)
{code}
We could allow merging DecimalTypes with different precision when both types
have the same scale. This should not cause any data correctness issues, because
the narrower type is simply widened, for example DECIMAL(12, 2) ->
DECIMAL(17, 2). The same does not hold when the scales differ: whether such an
upcast is safe would depend on the actual values.
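The proposed rule can be sketched in plain Scala without a Spark dependency.
This is an illustration only, not Spark's implementation: Dec, merge and fits
are hypothetical names introduced here, and Dec is a stand-in for DecimalType.
It shows that widening the precision at a fixed scale keeps every value
representable, while changing the scale can make an existing value overflow.
{code:scala}
object DecimalMergeSketch {
  // Hypothetical stand-in for Spark's DecimalType(precision, scale).
  final case class Dec(precision: Int, scale: Int)

  // Proposed rule: with equal scales keep the wider precision, otherwise refuse.
  def merge(a: Dec, b: Dec): Either[String, Dec] =
    if (a.scale == b.scale) Right(Dec(math.max(a.precision, b.precision), a.scale))
    else Left(s"incompatible scale ${a.scale} and ${b.scale}")

  // True when `v` can be stored in `t` without dropping any digits.
  def fits(v: java.math.BigDecimal, t: Dec): Boolean = {
    val s = v.stripTrailingZeros()
    val fractionDigits = math.max(s.scale, 0)
    val integerDigits = s.precision - s.scale
    fractionDigits <= t.scale && integerDigits <= t.precision - t.scale
  }

  def main(args: Array[String]): Unit = {
    val v = new java.math.BigDecimal("123456789.11") // valid for Dec(12, 2)

    println(merge(Dec(17, 2), Dec(12, 2)))        // Right(Dec(17,2))
    println(merge(Dec(12, 2), Dec(12, 4)).isLeft) // true

    println(fits(v, Dec(17, 2))) // true: same scale, more integer digits
    println(fits(v, Dec(12, 4))) // false: only 8 integer digits remain
  }
}
{code}
With the scale fixed, a larger precision only adds integer digits, so the check
never needs to look at the data. Once the scale changes, the answer depends on
each value, which is why the sketch rejects that case outright.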
Repro code:
{code:scala}
import org.apache.spark.sql.types._
val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
schema1.merge(schema2)
{code}
This also affects Parquet schema merging, which is where the issue was
originally discovered:
{code:scala}
import java.math.BigDecimal
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
val data1 = sc.parallelize(Row(new BigDecimal("1234567890000.11")) :: Nil, 1)
val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
val data2 = sc.parallelize(Row(new BigDecimal("123456789.11")) :: Nil, 1)
val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
// Write the narrower decimal(12,2) data first, then append decimal(17,2) data
spark.createDataFrame(data2, schema2).write.parquet("/tmp/decimal-test.parquet")
spark.createDataFrame(data1, schema1).write.mode("append").parquet("/tmp/decimal-test.parquet")
// Reading the DataFrame fails
spark.read.option("mergeSchema", "true").parquet("/tmp/decimal-test.parquet").show()
>>>
Failed merging schema:
root
|-- col: decimal(17,2) (nullable = true)
Caused by: Failed to merge fields 'col' and 'col'. Failed to merge decimal
types with incompatible precision 12 and 17
{code}
> Allow merging DecimalTypes with different precision values
> -----------------------------------------------------------
>
> Key: SPARK-37191
> URL: https://issues.apache.org/jira/browse/SPARK-37191
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.0, 3.1.1, 3.2.0
> Reporter: Ivan
> Priority: Major
> Fix For: 3.3.0
>