[spark] branch branch-3.3 updated: [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float

gurwls223 Sun, 28 Aug 2022 18:57:55 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new e3f6b6d1e15 [SPARK-40212][SQL] SparkSQL castPartValue does not 
properly handle byte, short, or float
e3f6b6d1e15 is described below

commit e3f6b6d1e15378860b5e30fb4c40168215b16eea
Author: Brennan Stein <[email protected]>
AuthorDate: Mon Aug 29 10:55:30 2022 +0900

    [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, 
short, or float
    
    The `castPartValueToDesiredType` function now returns byte for ByteType and 
short for ShortType, rather than ints; also floats for FloatType rather than 
double.
    
    Previously, attempting to read back in a file partitioned on one of these 
column types would result in a ClassCastException at runtime (for Byte, 
`java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.Byte`). I can't think this is anything but a bug, as returning the 
correct data type prevents the crash.
    
    Yes: it changes the observed behavior when reading in a 
byte/short/float-partitioned file.
    
    Added unit test. Without the `castPartValueToDesiredType` updates, the test 
fails with the stated exception.
    
    ===
    I'll note that I'm not familiar enough with the spark repo to know if this 
will have ripple effects elsewhere, but tests pass on my fork and since the 
very similar https://github.com/apache/spark/pull/36344/files only needed to 
touch these two files I expect this change is self-contained as well.
    
    Closes #37659 from BrennanStein/spark40212.
    
    Authored-by: Brennan Stein <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
    (cherry picked from commit 146f187342140635b83bfe775b6c327755edfbe1)
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 .../sql/execution/datasources/PartitioningUtils.scala   |  7 +++++--
 .../parquet/ParquetPartitionDiscoverySuite.scala        | 17 +++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
index e856bb5b9c2..2b9c6e724b6 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
@@ -530,9 +530,12 @@ object PartitioningUtils extends SQLConfHelper{
     case _ if value == DEFAULT_PARTITION_NAME => null
     case NullType => null
     case StringType => UTF8String.fromString(unescapePathName(value))
-    case ByteType | ShortType | IntegerType => Integer.parseInt(value)
+    case ByteType => Integer.parseInt(value).toByte
+    case ShortType => Integer.parseInt(value).toShort
+    case IntegerType => Integer.parseInt(value)
     case LongType => JLong.parseLong(value)
-    case FloatType | DoubleType => JDouble.parseDouble(value)
+    case FloatType => JDouble.parseDouble(value).toFloat
+    case DoubleType => JDouble.parseDouble(value)
     case _: DecimalType => Literal(new JBigDecimal(value)).value
     case DateType =>
       Cast(Literal(value), DateType, Some(zoneId.getId)).eval()
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
index bd908a36401..d87e0841dfe 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
@@ -1095,6 +1095,23 @@ abstract class ParquetPartitionDiscoverySuite
       checkAnswer(readback, Row(0, "AA") :: Row(1, "-0") :: Nil)
     }
   }
+
+  test("SPARK-40212: SparkSQL castPartValue does not properly handle byte, 
short, float") {
+    withTempDir { dir =>
+      val data = Seq[(Int, Byte, Short, Float)](
+        (1, 2, 3, 4.0f)
+      )
+      data.toDF("a", "b", "c", "d")
+        .write
+        .mode("overwrite")
+        .partitionBy("b", "c", "d")
+        .parquet(dir.getCanonicalPath)
+      val res = spark.read
+        .schema("a INT, b BYTE, c SHORT, d FLOAT")
+        .parquet(dir.getCanonicalPath)
+      checkAnswer(res, Seq(Row(1, 2, 3, 4.0f)))
+    }
+  }
 }
 
 class ParquetV1PartitionDiscoverySuite extends ParquetPartitionDiscoverySuite {


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[spark] branch branch-3.3 updated: [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float

Reply via email to