[ https://issues.apache.org/jira/browse/SPARK-40525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693721#comment-17693721 ]
Pablo Langa Blanco edited comment on SPARK-40525 at 2/26/23 10:57 PM:
----------------------------------------------------------------------
Hi [~x/sys],

When you are working with the Spark SQL interface you can configure this behavior: there are three policies for the type coercion rules applied on insertion ([https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html]). If you set spark.sql.storeAssignmentPolicy to "strict", you will get the behavior you expect, but it is not the default policy. I hope it helps.

> Floating-point value with an INT/BYTE/SHORT/LONG type errors out in DataFrame but evaluates to a rounded value in SparkSQL
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40525
>                 URL: https://issues.apache.org/jira/browse/SPARK-40525
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: xsys
>            Priority: Major
>
> h3. Describe the bug
> Storing an invalid INT value {{1.1}} using DataFrames via {{spark-shell}} expectedly errors out. However, it is evaluated to a rounded value {{1}} if the value is inserted into the table via {{spark-sql}}.
> h3. Steps to reproduce:
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}:
> {code:java}
> $SPARK_HOME/bin/spark-sql {code}
> Execute the following:
> {code:java}
> spark-sql> create table int_floating_point_vals(c1 INT) stored as ORC;
> 22/09/19 16:49:11 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> Time taken: 0.216 seconds
> spark-sql> insert into int_floating_point_vals select 1.1;
> Time taken: 1.747 seconds
> spark-sql> select * from int_floating_point_vals;
> 1
> Time taken: 0.518 seconds, Fetched 1 row(s){code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type & input combination ({{INT}} and {{1.1}}).
> h4. Here is a simplified example in {{spark-shell}}, where insertion of the aforementioned value correctly raises an exception:
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-shell}}:
> {code:java}
> $SPARK_HOME/bin/spark-shell{code}
> Execute the following:
> {code:java}
> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.types._
> val rdd = sc.parallelize(Seq(Row(1.1)))
> val schema = new StructType().add(StructField("c1", IntegerType, true))
> val df = spark.createDataFrame(rdd, schema)
> df.write.mode("overwrite").format("orc").saveAsTable("int_floating_point_vals")
> {code}
> The following exception is raised:
> {code:java}
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Double is not a valid external type for schema of int{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
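For reference, the suggestion in the comment boils down to flipping one session config before the insert. A minimal sketch of the same {{spark-sql}} session with the strict policy enabled (the table name reuses the reproduction above; the exact error message can vary between Spark versions):

{code:java}
-- The three supported values of spark.sql.storeAssignmentPolicy are
-- ANSI (the default since Spark 3.0), LEGACY, and STRICT.
SET spark.sql.storeAssignmentPolicy=STRICT;

-- Under STRICT, the implicit decimal-to-int cast is rejected at analysis time,
-- so this insert fails instead of silently storing the rounded value 1.
insert into int_floating_point_vals select 1.1;
{code}

Under the default ANSI policy the same insert succeeds and stores {{1}}, which is the {{spark-sql}} behavior reported in this issue; the DataFrame writer path performs its own external-type check, which is why {{spark-shell}} errors out regardless of the policy.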