This is an automated email from the ASF dual-hosted git repository.

chengpan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kyuubi.git


The following commit(s) were added to refs/heads/master by this push:
     new bcaff5a3f1 [KYUUBI #7077] Spark 3.5: Enhance MaxScanStrategy for DSv2
bcaff5a3f1 is described below

commit bcaff5a3f1232945cf7d029b0db13feca51b3d9b
Author: zhaohehuhu <luoyedeyi...@163.com>
AuthorDate: Thu May 29 13:25:55 2025 +0800

    [KYUUBI #7077] Spark 3.5: Enhance MaxScanStrategy for DSv2
    
    ### Why are the changes needed?
    
    To enhance MaxScanStrategy for Spark's DSv2 so that it only applies to
    relations whose scans support statistics reporting. Without this check,
    Spark falls back to a default estimate of Long.MaxValue, which causes
    some queries to fail or behave unexpectedly.

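    The idea behind the guard can be sketched as follows. This is an
    illustrative sketch, not the exact patch: the helper name is
    hypothetical, and it assumes Spark 3.5's DSv2 statistics API
    (`SupportsReportStatistics.estimateStatistics()` returning a
    `Statistics` with `OptionalLong` row/byte counts).

    ```scala
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.connector.read.SupportsReportStatistics
    import org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanRelation

    // Hypothetical helper: return the scan's reported row count only when the
    // underlying Scan actually implements SupportsReportStatistics. Scans that
    // do not report statistics yield None rather than the Long.MaxValue
    // fallback, so a max-scan threshold check cannot misfire on them.
    def reportedRowCount(plan: LogicalPlan): Option[Long] = plan match {
      case DataSourceV2ScanRelation(_, scan: SupportsReportStatistics, _, _, _) =>
        val stats = scan.estimateStatistics()
        if (stats.numRows().isPresent) Some(stats.numRows().getAsLong) else None
      case _ => None // no statistics reported; leave the relation alone
    }
    ```

    Matching on the `Scan` type in the pattern (rather than checking after the
    match) keeps relations without statistics out of this strategy branch
    entirely, which is the behavior the patch below restores.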
    ### How was this patch tested?
    
    It was tested locally.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #7077 from zhaohehuhu/dev-0527.
    
    Closes #7077
    
    64001c94e [zhaohehuhu] fix MaxScanStrategy for datasource v2
    
    Authored-by: zhaohehuhu <luoyedeyi...@163.com>
    Signed-off-by: Cheng Pan <cheng...@apache.org>
---
 .../main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala    | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala b/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
index e647ad3250..e8144f25ae 100644
--- a/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
+++ b/extensions/spark/kyuubi-extension-spark-3-5/src/main/scala/org/apache/kyuubi/sql/watchdog/MaxScanStrategy.scala
@@ -23,6 +23,7 @@ import org.apache.spark.sql.catalyst.SQLConfHelper
 import org.apache.spark.sql.catalyst.catalog.{CatalogTable, HiveTableRelation}
 import org.apache.spark.sql.catalyst.planning.ScanOperation
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.connector.read.SupportsReportStatistics
 import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.{CatalogFileIndex, HadoopFsRelation, InMemoryFileIndex, LogicalRelation}
 import org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanRelation
@@ -237,7 +238,7 @@ case class MaxScanStrategy(session: SparkSession)
             _,
             _,
             _,
-            relation @ DataSourceV2ScanRelation(_, _, _, _, _)) =>
+            relation @ DataSourceV2ScanRelation(_, _: SupportsReportStatistics, _, _, _)) =>
         val table = relation.relation.table
         if (table.partitioning().nonEmpty) {
           val partitionColumnNames = table.partitioning().map(_.describe())
