This is an automated email from the ASF dual-hosted git repository.

chengpan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kyuubi.git
The following commit(s) were added to refs/heads/master by this push:
     new bcaff5a3f1 [KYUUBI #7077] Spark 3.5: Enhance MaxScanStrategy for DSv2
bcaff5a3f1 is described below

commit bcaff5a3f1232945cf7d029b0db13feca51b3d9b
Author: zhaohehuhu <luoyedeyi...@163.com>
AuthorDate: Thu May 29 13:25:55 2025 +0800

    [KYUUBI #7077] Spark 3.5: Enhance MaxScanStrategy for DSv2

    ### Why are the changes needed?

    To enhance MaxScanStrategy in Spark's DSv2 so that it only applies to
    relations that support statistics reporting. This prevents Spark from
    returning the default value of Long.MaxValue, which leads to some queries
    failing or behaving unexpectedly.

    ### How was this patch tested?

    It was tested locally.

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #7077 from zhaohehuhu/dev-0527.

    Closes #7077

    64001c94e [zhaohehuhu] fix MaxScanStrategy for datasource v2

    Authored-by: zhaohehuhu <luoyedeyi...@163.com>
    Signed-off-by: Cheng Pan <cheng...@apache.org>
---
 .../main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala b/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
index e647ad3250..e8144f25ae 100644
--- a/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
@@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.SQLConfHelper
 import org.apache.spark.sql.catalyst.catalog.{CatalogTable, HiveTableRelation}
 import org.apache.spark.sql.catalyst.planning.ScanOperation
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connector.read.SupportsReportStatistics
 import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.{CatalogFileIndex, HadoopFsRelation, InMemoryFileIndex, LogicalRelation}
 import org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanRelation
@@ -237,7 +238,7 @@ case class MaxScanStrategy(session: SparkSession)
         _,
         _,
         _,
-        relation @ DataSourceV2ScanRelation(_, _, _, _, _)) =>
+        relation @ DataSourceV2ScanRelation(_, _: SupportsReportStatistics, _, _, _)) =>
       val table = relation.relation.table
       if (table.partitioning().nonEmpty) {
         val partitionColumnNames = table.partitioning().map(_.describe())
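For readers skimming the patch, the idea behind the added type guard can be sketched in a dependency-free way. The `Table`, `SupportsReportStatistics`, and `maxScanCheck` names below are hypothetical stand-ins for Spark's actual interfaces, not the real Kyuubi/Spark code: the point is that matching only on relations that implement statistics reporting means the watchdog never compares a scan limit against a meaningless fallback value.

```scala
// Hypothetical stand-in types sketching the pattern the patch uses:
// only relations that can actually report statistics are checked, so
// a Long.MaxValue-style default for stat-less tables is never consulted.
trait Table { def name: String }
trait SupportsReportStatistics extends Table { def estimatedRowCount: Long }

case class PlainTable(name: String) extends Table
case class StatsTable(name: String, estimatedRowCount: Long)
  extends SupportsReportStatistics

def maxScanCheck(table: Table, maxRows: Long): Option[String] = table match {
  // Guard on the statistics-reporting subtype, mirroring the patch's
  // `_: SupportsReportStatistics` pattern in DataSourceV2ScanRelation.
  case t: SupportsReportStatistics if t.estimatedRowCount > maxRows =>
    Some(s"${t.name} exceeds the scan limit")
  // Tables without statistics are skipped rather than compared
  // against a default estimate.
  case _ => None
}
```

With this guard, `maxScanCheck(StatsTable("t", 100L), 10L)` flags the oversized scan, while `maxScanCheck(PlainTable("p"), 10L)` returns `None` instead of acting on an unknown row count.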