[
https://issues.apache.org/jira/browse/SPARK-40624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
xsys updated SPARK-40624:
-------------------------
Description:
h3. Describe the bug
Storing an invalid value (e.g. {{BigDecimal("1.0/0")}}) via {{spark-shell}}
throws a {{NumberFormatException}} during RDD creation. However, {{1.0/0}}
evaluates to {{NULL}} when the value is inserted into a {{DECIMAL(20,10)}}
column of a table via {{spark-sql}}.
h3. To Reproduce
On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}:
{code:java}
$SPARK_HOME/bin/spark-sql{code}
Execute the following (with the default {{spark.sql.ansi.enabled=false}}, the division by zero evaluates to {{NULL}}):
{code:java}
spark-sql> create table decimal_vals(c1 DECIMAL(20,10)) stored as ORC;
spark-sql> insert into decimal_vals select 1.0/0;
spark-sql> select * from decimal_vals;
NULL{code}
Using {{spark-shell}}:
{code:java}
$SPARK_HOME/bin/spark-shell{code}
Execute the following (the {{BigDecimal}} string constructor throws a {{NumberFormatException}} during RDD creation):
{code:java}
scala> import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.{Row, SparkSession}
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._
scala> val rdd = sc.parallelize(Seq(Row(BigDecimal("1.0/0"))))
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:497)
at java.math.BigDecimal.<init>(BigDecimal.java:383)
at java.math.BigDecimal.<init>(BigDecimal.java:809)
at scala.math.BigDecimal$.exact(BigDecimal.scala:126)
at scala.math.BigDecimal$.apply(BigDecimal.scala:284)
... 49 elided{code}
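The stack trace points at the {{java.math.BigDecimal}} string constructor, which Scala's {{BigDecimal.apply}} delegates to: {{"1.0/0"}} is an arithmetic expression, not a decimal literal, so parsing fails before any Spark code runs. A minimal plain-Java sketch of the same failure (no Spark required):

```java
import java.math.BigDecimal;

public class BigDecimalParseDemo {
    public static void main(String[] args) {
        // A valid decimal literal parses fine and preserves its scale.
        System.out.println(new BigDecimal("1.0"));

        // "1.0/0" is an expression, not a literal: the constructor
        // throws NumberFormatException instead of evaluating it.
        try {
            new BigDecimal("1.0/0");
            System.out.println("parsed");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException");
        }
    }
}
```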
h3. Expected behavior
We expect the two Spark interfaces ({{spark-sql}} and {{spark-shell}}) to
behave consistently for the same data type and input combination
({{BigDecimal}}/{{DECIMAL(20,10)}} with {{1.0/0}}).
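If the desired resolution is to match {{spark-sql}}'s lenient behavior on the shell side, one option (a sketch of a workaround, not Spark's API) is to catch the parse failure and store {{null}}, which is what the {{NULL}} in the table amounts to. {{parseOrNull}} below is a hypothetical helper name:

```java
import java.math.BigDecimal;

public class LenientDecimalParse {
    // Hypothetical helper: returns null instead of throwing when the
    // input is not a valid decimal literal, mirroring the NULL that
    // spark-sql stores for 1.0/0 in non-ANSI mode.
    static BigDecimal parseOrNull(String s) {
        try {
            return new BigDecimal(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("1.2345")); // a valid literal
        System.out.println(parseOrNull("1.0/0")); // null
    }
}
```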
> A DECIMAL value with division by 0 errors in DataFrame but evaluates to NULL
> in SparkSQL
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-40624
> URL: https://issues.apache.org/jira/browse/SPARK-40624
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Affects Versions: 3.2.1
> Reporter: xsys
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)