This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.1 by this push:
     new e00cd4e5ca8e [SPARK-54206][CONNECT][FOLLOWUP] Use VARBINARY type and reasonable max length for BinaryType
e00cd4e5ca8e is described below

commit e00cd4e5ca8e8245a0d73b5baf3f21f926b0107e
Author: vinodkc <[email protected]>
AuthorDate: Fri Nov 28 14:02:19 2025 -0800

    [SPARK-54206][CONNECT][FOLLOWUP] Use VARBINARY type and reasonable max length for BinaryType
    
    ### What changes were proposed in this pull request?
    
    This PR improves the JDBC type mapping in the Spark Connect JDBC client: BinaryType is now reported as Types.VARBINARY instead of Types.BINARY, and the reported StringType precision is raised from 255 to Int.MaxValue. A condensed sketch of both changes follows.
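    A minimal sketch of the two changed mappings, based on the diff below; the method names and the surrounding cases are illustrative, not the exact shape of `JdbcTypeUtils`:
    
    ```scala
    import java.sql.{SQLFeatureNotSupportedException, Types}
    import org.apache.spark.sql.types._
    
    // BinaryType now maps to the variable-length JDBC binary type.
    def toJdbcType(dt: DataType): Int = dt match {
      case BinaryType => Types.VARBINARY // previously Types.BINARY
      case other =>
        throw new SQLFeatureNotSupportedException(s"DataType $other is not supported yet.")
    }
    
    // StringType now reports an effectively unbounded precision.
    def getPrecision(dt: DataType): Int = dt match {
      case StringType => Int.MaxValue // previously 255
      case d: DecimalType => d.precision
      case _ => 0 // remaining cases elided in this sketch
    }
    ```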
    
    ### Why are the changes needed?
    
    - **Semantic correctness**: Types.VARBINARY (variable-length) better matches Spark's BinaryType semantics.
    
    - **Industry alignment**:
      - The SQL Server dialect already uses VARBINARY(MAX) for BinaryType.
      - The Trino JDBC driver uses VARBINARY with a maximum of 1 GB.
      - The MariaDB JDBC driver uses VARBINARY/LONGVARBINARY for blob types.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, but with minimal impact: both BINARY and VARBINARY map to byte array types, and the precision change stays within reasonable bounds, as the sketch below illustrates.
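    From a client's perspective, the change is visible only through `ResultSet` metadata. A minimal sketch using plain JDBC (the connection URL is a placeholder for an actual Spark Connect endpoint):
    
    ```scala
    import java.sql.{DriverManager, Types}
    
    // Placeholder endpoint; adjust to your Spark Connect server.
    val conn = DriverManager.getConnection("jdbc:sc://localhost:15002")
    val rs = conn.createStatement().executeQuery("SELECT CAST('ab' AS BINARY)")
    val md = rs.getMetaData
    
    // The reported JDBC type is now VARBINARY rather than BINARY...
    assert(md.getColumnType(1) == Types.VARBINARY)
    
    // ...but the value is still read back as a byte array either way.
    rs.next()
    val bytes: Array[Byte] = rs.getBytes(1)
    conn.close()
    ```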
    
    ### How was this patch tested?
    
    Existing tests: All tests in `SparkConnectJdbcDataTypeSuite` pass.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #53252 from vinodkc/br_SPARK-54206_followup_fix.
    
    Authored-by: vinodkc <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 87a8b5629d358f365973679555acfe502b1651ac)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../apache/spark/sql/connect/client/jdbc/util/JdbcTypeUtils.scala | 4 ++--
 .../sql/connect/client/jdbc/SparkConnectJdbcDataTypeSuite.scala   | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/sql/connect/client/jdbc/src/main/scala/org/apache/spark/sql/connect/client/jdbc/util/JdbcTypeUtils.scala b/sql/connect/client/jdbc/src/main/scala/org/apache/spark/sql/connect/client/jdbc/util/JdbcTypeUtils.scala
index a3adf2b180d8..458f94c51f89 100644
--- a/sql/connect/client/jdbc/src/main/scala/org/apache/spark/sql/connect/client/jdbc/util/JdbcTypeUtils.scala
+++ b/sql/connect/client/jdbc/src/main/scala/org/apache/spark/sql/connect/client/jdbc/util/JdbcTypeUtils.scala
@@ -39,7 +39,7 @@ private[jdbc] object JdbcTypeUtils {
     case DateType => Types.DATE
     case TimestampType => Types.TIMESTAMP
     case TimestampNTZType => Types.TIMESTAMP
-    case BinaryType => Types.BINARY
+    case BinaryType => Types.VARBINARY
     case _: TimeType => Types.TIME
     case other =>
       throw new SQLFeatureNotSupportedException(s"DataType $other is not supported yet.")
@@ -83,7 +83,7 @@ private[jdbc] object JdbcTypeUtils {
     case LongType => 19
     case FloatType => 7
     case DoubleType => 15
-    case StringType => 255
+    case StringType => Int.MaxValue
     case DecimalType.Fixed(p, _) => p
     case DateType => 10
     case TimestampType => 29
diff --git a/sql/connect/client/jdbc/src/test/scala/org/apache/spark/sql/connect/client/jdbc/SparkConnectJdbcDataTypeSuite.scala b/sql/connect/client/jdbc/src/test/scala/org/apache/spark/sql/connect/client/jdbc/SparkConnectJdbcDataTypeSuite.scala
index 3a02f78c4383..eb3afcc1bcf2 100644
--- a/sql/connect/client/jdbc/src/test/scala/org/apache/spark/sql/connect/client/jdbc/SparkConnectJdbcDataTypeSuite.scala
+++ b/sql/connect/client/jdbc/src/test/scala/org/apache/spark/sql/connect/client/jdbc/SparkConnectJdbcDataTypeSuite.scala
@@ -216,9 +216,9 @@ class SparkConnectJdbcDataTypeSuite extends ConnectFunSuite with RemoteSparkSess
       assert(metaData.getColumnTypeName(1) === "STRING")
       assert(metaData.getColumnClassName(1) === "java.lang.String")
       assert(metaData.isSigned(1) === false)
-      assert(metaData.getPrecision(1) === 255)
+      assert(metaData.getPrecision(1) === Int.MaxValue)
       assert(metaData.getScale(1) === 0)
-      assert(metaData.getColumnDisplaySize(1) === 255)
+      assert(metaData.getColumnDisplaySize(1) === Int.MaxValue)
     }
   }
 
@@ -389,7 +389,7 @@ class SparkConnectJdbcDataTypeSuite extends ConnectFunSuite with RemoteSparkSess
 
       val metaData = rs.getMetaData
       assert(metaData.getColumnCount === 1)
-      assert(metaData.getColumnType(1) === Types.BINARY)
+      assert(metaData.getColumnType(1) === Types.VARBINARY)
       assert(metaData.getColumnTypeName(1) === "BINARY")
       assert(metaData.getColumnClassName(1) === "[B")
       assert(metaData.isSigned(1) === false)
@@ -405,7 +405,7 @@ class SparkConnectJdbcDataTypeSuite extends ConnectFunSuite with RemoteSparkSess
 
       val metaData = rs.getMetaData
       assert(metaData.getColumnCount === 1)
-      assert(metaData.getColumnType(1) === Types.BINARY)
+      assert(metaData.getColumnType(1) === Types.VARBINARY)
       assert(metaData.getColumnTypeName(1) === "BINARY")
       assert(metaData.getColumnClassName(1) === "[B")
     }


