This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new 286c469ad13 [SPARK-46443][SQL] Decimal precision and scale should
decided by H2 dialect
286c469ad13 is described below
commit 286c469ad1305f91ea796fd453ae896617fb3883
Author: Jiaan Geng <[email protected]>
AuthorDate: Fri Dec 22 09:55:00 2023 +0800
[SPARK-46443][SQL] Decimal precision and scale should decided by H2 dialect
### What changes were proposed in this pull request?
This PR fix a but by make JDBC dialect decide the decimal precision and
scale.
**How to reproduce the bug?**
https://github.com/apache/spark/pull/44397 proposed DS V2 push down
`PERCENTILE_CONT` and `PERCENTILE_DISC`.
The bug fired when pushdown the below SQL to H2 JDBC.
`SELECT "DEPT",PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "SALARY" ASC
NULLS FIRST) FROM "test"."employee" WHERE 1=0 GROUP BY "DEPT"`
**The root cause**
`getQueryOutputSchema` used to get the output schema of query by call
`JdbcUtils.getSchema`.
The query for database H2 show below.
`SELECT "DEPT",PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "SALARY" ASC
NULLS FIRST) FROM "test"."employee" WHERE 1=0 GROUP BY "DEPT"`
We can get the five variables from `ResultSetMetaData`, please refer:
```
columnName = "PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY SALARY NULLS
FIRST)"
dataType = 2
typeName = "NUMERIC"
fieldSize = 100000
fieldScale = 50000
```
Then we get the catalyst schema with `JdbcUtils.getCatalystType`, it calls
`DecimalType.bounded(precision, scale)` actually.
The `DecimalType.bounded(100000, 50000)` returns `DecimalType(38, 38)`.
At finally, `makeGetter` throws exception.
```
Caused by: org.apache.spark.SparkArithmeticException:
[DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION] Decimal precision 42 exceeds max
precision 38. SQLSTATE: 22003
at
org.apache.spark.sql.errors.DataTypeErrors$.decimalPrecisionExceedsMaxPrecisionError(DataTypeErrors.scala:48)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:124)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:577)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$4(JdbcUtils.scala:408)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:552)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$3(JdbcUtils.scala:408)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$3$adapted(JdbcUtils.scala:406)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:358)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:339)
```
### Why are the changes needed?
This PR fix the bug that `JdbcUtils` can't get the correct decimal type.
### Does this PR introduce _any_ user-facing change?
'Yes'.
Fix a bug.
### How was this patch tested?
Manual tests in https://github.com/apache/spark/pull/44397
### Was this patch authored or co-authored using generative AI tooling?
'No'.
Closes #44398 from beliefer/SPARK-46443.
Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a921da8509a19b2d23c30ad657725f760932236c)
Signed-off-by: Wenchen Fan <[email protected]>
---
.../main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
index 8471a49153f..3f56eb035f5 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
@@ -35,7 +35,7 @@ import
org.apache.spark.sql.connector.catalog.functions.UnboundFunction
import org.apache.spark.sql.connector.catalog.index.TableIndex
import org.apache.spark.sql.connector.expressions.{Expression, FieldReference,
NamedReference}
import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JdbcUtils}
-import org.apache.spark.sql.types.{BooleanType, ByteType, DataType,
DecimalType, ShortType, StringType}
+import org.apache.spark.sql.types.{BooleanType, ByteType, DataType,
DecimalType, MetadataBuilder, ShortType, StringType}
private[sql] object H2Dialect extends JdbcDialect {
override def canHandle(url: String): Boolean =
@@ -57,6 +57,20 @@ private[sql] object H2Dialect extends JdbcDialect {
override def isSupportedFunction(funcName: String): Boolean =
supportedFunctions.contains(funcName)
+ override def getCatalystType(
+ sqlType: Int, typeName: String, size: Int, md: MetadataBuilder):
Option[DataType] = {
+ sqlType match {
+ case Types.NUMERIC if size > 38 =>
+ // H2 supports very large decimal precision like 100000. The max
precision in Spark is only
+ // 38. Here we shrink both the precision and scale of H2 decimal to
fit Spark, and still
+ // keep the ratio between them.
+ val scale = if (null != md) md.build().getLong("scale") else 0L
+ val selectedScale = (DecimalType.MAX_PRECISION * (scale.toDouble /
size.toDouble)).toInt
+ Option(DecimalType(DecimalType.MAX_PRECISION, selectedScale))
+ case _ => None
+ }
+ }
+
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
case StringType => Option(JdbcType("CLOB", Types.CLOB))
case BooleanType => Some(JdbcType("BOOLEAN", Types.BOOLEAN))
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]