[
https://issues.apache.org/jira/browse/SPARK-40439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605594#comment-17605594
]
Hyukjin Kwon commented on SPARK-40439:
--------------------------------------
Setting {{spark.sql.storeAssignmentPolicy}} to {{LEGACY}} should work.
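For example, a minimal sketch against the table from the report
({{spark.sql.storeAssignmentPolicy}} is a runtime SQL conf, so it can be set on
a live session; {{LEGACY}} falls back to the pre-3.0 {{Cast}} behavior, which
stores {{NULL}} on overflow instead of throwing):
{code:java}
// Hedged sketch: assumes the Hive-backed ORC table from the report exists.
// With LEGACY store assignment the overflowing value is stored as NULL
// (matching the DataFrame behavior) instead of raising ArithmeticException.
spark.conf.set("spark.sql.storeAssignmentPolicy", "LEGACY")
spark.sql("insert into decimal_extra_precision select 33333333333.2222222222")
spark.sql("select * from decimal_extra_precision").show() // expect: null
{code}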
> DECIMAL value with more precision than what is defined in the schema raises
> exception in SparkSQL but evaluates to NULL for DataFrame
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-40439
> URL: https://issues.apache.org/jira/browse/SPARK-40439
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Reporter: xsys
> Priority: Major
>
> h3. Describe the bug
> We are trying to store a DECIMAL value {{33333333333.2222222222}} with more
> precision than what is defined in the schema: {{{}DECIMAL(20,10){}}}. This
> leads to a {{NULL}} value being stored if the table is created using
> DataFrames via {{{}spark-shell{}}}. However, it leads to the following
> exception if the table is created via {{{}spark-sql{}}}:
> {code:java}
> Failed in [insert into decimal_extra_precision select 33333333333.2222222222]
> java.lang.ArithmeticException: Decimal(expanded,33333333333.2222222222,21,10}) cannot be represented as Decimal(20, 10){code}
> h3. Steps to reproduce
> On Spark 3.2.1 (commit {{{}4f25b3f712{}}}), using {{{}spark-sql{}}}, execute
> the following:
> {code:java}
> create table decimal_extra_precision(c1 DECIMAL(20,10)) STORED AS ORC;
> insert into decimal_extra_precision select 33333333333.2222222222;{code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{{}spark-sql{}}} & {{{}spark-shell{}}})
> to behave consistently for the same data type & input combination
> ({{{}DECIMAL(20,10){}}} and {{{}33333333333.2222222222{}}}).
> Here is a simplified example in {{{}spark-shell{}}}, where insertion of the
> aforementioned decimal value evaluates to {{{}NULL{}}}:
> {code:java}
> scala> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.{Row, SparkSession}
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
> scala> val rdd = sc.parallelize(Seq(Row(BigDecimal("33333333333.2222222222"))))
> rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:27
> scala> val schema = new StructType().add(StructField("c1", DecimalType(20, 10), true))
> schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,DecimalType(20,10),true))
> scala> val df = spark.createDataFrame(rdd, schema)
> df: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
> scala> df.show()
> +----+
> | c1|
> +----+
> |null|
> +----+
> scala> df.write.mode("overwrite").format("orc").saveAsTable("decimal_extra_precision")
> 22/08/29 10:33:47 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> scala> spark.sql("select * from decimal_extra_precision;")
> res2: org.apache.spark.sql.DataFrame = [c1: decimal(20,10)]
> {code}
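> A note on the last step: {{res2}} is only the DataFrame handle, so an
> explicit {{show()}} is needed to display the stored value, which is expected
> to be {{NULL}}, matching {{df.show()}} above:
> {code:java}
> scala> spark.sql("select * from decimal_extra_precision").show() // expect: null
> {code}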
> h3. Root Cause
> The exception is raised from
> [Decimal|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala#L358-L373]
> ({{{}nullOnOverflow{}}} is controlled by {{spark.sql.ansi.enabled}} in
> [SQLConf|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2542-L2551]):
> {code:java}
>   private[sql] def toPrecision(
>       precision: Int,
>       scale: Int,
>       roundMode: BigDecimal.RoundingMode.Value = ROUND_HALF_UP,
>       nullOnOverflow: Boolean = true,
>       context: SQLQueryContext = null): Decimal = {
>     val copy = clone()
>     if (copy.changePrecision(precision, scale, roundMode)) {
>       copy
>     } else {
>       if (nullOnOverflow) {
>         null
>       } else {
>         throw QueryExecutionErrors.cannotChangeDecimalPrecisionError(
>           this, precision, scale, context)
>       }
>     }
>   }{code}
> The above function is invoked from its call site in
> [Cast.scala|https://github.com/apache/spark/blob/v3.2.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L754-L756].
> However, our attempt to insert {{33333333333.2222222222}} after setting
> {{spark.sql.ansi.enabled}} to {{false}} failed as well (which may be an
> independent issue).
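> A hedged illustration of the {{nullOnOverflow}} switch using a plain cast in
> {{{}spark-shell{}}} (with ANSI mode off the cast yields {{NULL}}; with it on,
> the same {{ArithmeticException}} is raised):
> {code:java}
> scala> spark.conf.set("spark.sql.ansi.enabled", "false")
> scala> spark.sql("select cast(33333333333.2222222222 as decimal(20,10))").show() // null
> scala> spark.conf.set("spark.sql.ansi.enabled", "true")
> scala> spark.sql("select cast(33333333333.2222222222 as decimal(20,10))").show() // throws
> {code}
> The INSERT failure under {{spark.sql.ansi.enabled=false}} is likely explained
> by the insert path being governed by {{{}spark.sql.storeAssignmentPolicy{}}},
> which defaults to {{ANSI}} since Spark 3.0 and is independent of
> {{{}spark.sql.ansi.enabled{}}} (hence the {{LEGACY}} workaround suggested in
> the comment above).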