[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112368#comment-17112368 ]
Sandeep Katta commented on SPARK-31761:
---------------------------------------

[~sowen] [~hyukjin.kwon] I have executed the same query in spark-2.4.4 and it works as expected. As you can see from the 2.4.4 plan, the columns are cast to double before the division, so there is no *integer overflow*. In Spark 3.0, {{div}} operates directly on the int columns, and since +2147483648 does not fit in a 32-bit int, the result wraps around to -2147483648 (a minimal standalone reproduction is sketched at the end of this message).

Spark-2.4.4:

{code}
== Parsed Logical Plan ==
'Project [cast(('col0 / 'col1) as bigint) AS CAST((col0 / col1) AS BIGINT)#4]
+- Relation[col0#0,col1#1] csv

== Analyzed Logical Plan ==
CAST((col0 / col1) AS BIGINT): bigint
Project [cast((cast(col0#0 as double) / cast(col1#1 as double)) as bigint) AS CAST((col0 / col1) AS BIGINT)#4L]
+- Relation[col0#0,col1#1] csv

== Optimized Logical Plan ==
Project [cast((cast(col0#0 as double) / cast(col1#1 as double)) as bigint) AS CAST((col0 / col1) AS BIGINT)#4L]
+- Relation[col0#0,col1#1] csv

== Physical Plan ==
*(1) Project [cast((cast(col0#0 as double) / cast(col1#1 as double)) as bigint) AS CAST((col0 / col1) AS BIGINT)#4L]
+- *(1) FileScan csv [col0#0,col1#1] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/opt/fordebug/divTest.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<col0:int,col1:int>
{code}

Spark-3.0:

{code}
== Parsed Logical Plan ==
'Project [('col0 div 'col1) AS (col0 div col1)#4]
+- RelationV2[col0#0, col1#1] csv file:/opt/fordebug/divTest.csv

== Analyzed Logical Plan ==
(col0 div col1): int
Project [(col0#0 div col1#1) AS (col0 div col1)#4]
+- RelationV2[col0#0, col1#1] csv file:/opt/fordebug/divTest.csv

== Optimized Logical Plan ==
Project [(col0#0 div col1#1) AS (col0 div col1)#4]
+- RelationV2[col0#0, col1#1] csv file:/opt/fordebug/divTest.csv

== Physical Plan ==
*(1) Project [(col0#0 div col1#1) AS (col0 div col1)#4]
+- BatchScan[col0#0, col1#1] CSVScan Location: InMemoryFileIndex[file:/opt/fordebug/divTest.csv], ReadSchema: struct<col0:int,col1:int>
{code}

In Spark 3, should {{div}} cast the columns as spark-2.4 did, or should the user manually add a cast to the query, as in the example below?

{code}
val schema = "col0 int,col1 int"
val df = spark.read.schema(schema).csv("file:/opt/fordebug/divTest.csv")

// without a cast: wraps around for (-2147483648, -1)
val resInt = df.selectExpr("col0 div col1")

// with an explicit cast: widens the arithmetic before dividing
val resDec = df.selectExpr("cast(col0 as decimal) div col1")
resDec.collect
{code}

Please let us know your opinion.

> Sql Div operator can result in incorrect output for int_min
> -----------------------------------------------------------
>
>                 Key: SPARK-31761
>                 URL: https://issues.apache.org/jira/browse/SPARK-31761
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Kuhu Shukla
>            Priority: Major
>
> Input in csv: -2147483648,-1 --> (_c0, _c1)
> {code}
> val res = df.selectExpr("_c0 div _c1")
> res.collect
> res1: Array[org.apache.spark.sql.Row] = Array([-2147483648])
> {code}
> The result should be 2147483648 instead.
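For reference, a minimal standalone Scala sketch (plain JVM arithmetic, no Spark required; the object name is illustrative only) of the overflow that integral {{div}} hits, and of how widening the operands avoids it:

{code}
// Plain Scala/JVM demonstration of the wrap-around behind SPARK-31761.
// The JVM defines Int.MinValue / -1 to overflow back to Int.MinValue,
// because +2147483648 is not representable in a 32-bit two's-complement int.
object DivOverflowSketch {
  def main(args: Array[String]): Unit = {
    val dividend = Int.MinValue // -2147483648
    val divisor  = -1

    // 32-bit division wraps: prints -2147483648, matching the JIRA output.
    println(dividend / divisor)

    // Widening to Long first (what an explicit cast in the SQL query
    // achieves) yields the mathematically correct quotient: 2147483648.
    println(dividend.toLong / divisor)
  }
}
{code}

Casting a column to long or decimal before {{div}} widens the arithmetic in the same way, which is why the workaround in the example above avoids the wrap-around.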