[ https://issues.apache.org/jira/browse/SPARK-40525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693721#comment-17693721 ]
Pablo Langa Blanco edited comment on SPARK-40525 at 2/26/23 10:57 PM:
----------------------------------------------------------------------
Hi [~x/sys],

When you are working with the Spark SQL interface you can configure this behavior: there are three policies for the type coercion rules applied on insertion ([https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html]). If you set spark.sql.storeAssignmentPolicy to "strict", you will get the behavior you expect, but it is not the default policy. I hope it helps.

> Floating-point value with an INT/BYTE/SHORT/LONG type errors out in DataFrame but evaluates to a rounded value in SparkSQL
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40525
>                 URL: https://issues.apache.org/jira/browse/SPARK-40525
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: xsys
>            Priority: Major
>
> h3. Describe the bug
> Storing an invalid INT value {{1.1}} using DataFrames via {{spark-shell}} expectedly errors out. However, it is evaluated to a rounded value {{1}} if the value is inserted into the table via {{spark-sql}}.
> h3. Steps to reproduce:
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}:
> {code:java}
> $SPARK_HOME/bin/spark-sql {code}
> Execute the following:
> {code:java}
> spark-sql> create table int_floating_point_vals(c1 INT) stored as ORC;
> 22/09/19 16:49:11 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> Time taken: 0.216 seconds
> spark-sql> insert into int_floating_point_vals select 1.1;
> Time taken: 1.747 seconds
> spark-sql> select * from int_floating_point_vals;
> 1
> Time taken: 0.518 seconds, Fetched 1 row(s){code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type & input combination ({{INT}} and {{1.1}}).
> h4. Here is a simplified example in {{spark-shell}}, where insertion of the aforementioned value correctly raises an exception:
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-shell}}:
> {code:java}
> $SPARK_HOME/bin/spark-shell{code}
> Execute the following:
> {code:java}
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types._
> val rdd = sc.parallelize(Seq(Row(1.1)))
> val schema = new StructType().add(StructField("c1", IntegerType, true))
> val df = spark.createDataFrame(rdd, schema)
> df.write.mode("overwrite").format("orc").saveAsTable("int_floating_point_vals")
> {code}
> The following exception is raised:
> {code:java}
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Double is not a valid external type for schema of int{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
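For reference, the suggestion in the comment boils down to flipping one session config before the insert. A minimal sketch of the same {{spark-sql}} session with the strict policy enabled (the table name reuses the reproduction above; the exact error message can vary between Spark versions):

{code:java}
-- The three supported values of spark.sql.storeAssignmentPolicy are
-- ANSI (the default since Spark 3.0), LEGACY, and STRICT.
SET spark.sql.storeAssignmentPolicy=STRICT;

-- Under STRICT, the implicit decimal-to-int cast is rejected at analysis time,
-- so this insert fails instead of silently storing the rounded value 1.
insert into int_floating_point_vals select 1.1;
{code}

Under the default ANSI policy the same insert succeeds and stores {{1}}, which is the {{spark-sql}} behavior reported in this issue; the DataFrame writer path performs its own external-type check, which is why {{spark-shell}} errors out regardless of the policy.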