Никита Соколов created SPARK-46445:
--------------------------------------

             Summary: V2ExpressionUtils.toCatalystTransformOpt throws when 
resolving the bucket/sorted_bucket functions
                 Key: SPARK-46445
                 URL: https://issues.apache.org/jira/browse/SPARK-46445
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Никита Соколов


I am trying to build a DataSource V2 source that exposes its partitioning and
bucketing properties, so that Spark can execute partition-wise joins. Currently,
Spark cannot transform a DataSourceV2Relation into a DataSourceV2ScanRelation
because of this error:

 
{code:java}
org.apache.spark.sql.AnalysisException: [REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got .
    at org.apache.spark.sql.errors.QueryCompilationErrors$.requiresSinglePartNamespaceError(QueryCompilationErrors.scala:1336)
    at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog$TableIdentifierHelper.asFunctionIdentifier(V2SessionCatalog.scala:254)
    at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadFunction(V2SessionCatalog.scala:351)
    at org.apache.spark.sql.catalyst.expressions.V2ExpressionUtils$.loadV2FunctionOpt(V2ExpressionUtils.scala:128)
    at org.apache.spark.sql.catalyst.expressions.V2ExpressionUtils$.$anonfun$toCatalystTransformOpt$6(V2ExpressionUtils.scala:114)
    at scala.Option.flatMap(Option.scala:271)
    at org.apache.spark.sql.catalyst.expressions.V2ExpressionUtils$.toCatalystTransformOpt(V2ExpressionUtils.scala:113)
    at org.apache.spark.sql.catalyst.expressions.V2ExpressionUtils$.toCatalystOpt(V2ExpressionUtils.scala:82)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.$anonfun$applyOrElse$1(V2ScanPartitioningAndOrdering.scala:47)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:47)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:42)
 {code}
 

 
As far as I can tell, the library code cannot succeed on this path:
Here is where it tries to load the bucket function: 
[https://github.com/apache/spark/blob/e359318c4493e16a7546d70c9340ffc5015aacff/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala#L108]
Here an Identifier with an empty namespace is constructed: 
[https://github.com/apache/spark/blob/bfafad4d47b4f60e93d17ccc3a8dcc8bae03cf9a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala#L132]
Here the code tries to convert it to a FunctionIdentifier: 
[https://github.com/apache/spark/blob/bfafad4d47b4f60e93d17ccc3a8dcc8bae03cf9a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala#L437]
But this conversion is impossible for an Identifier without a namespace: 
[https://github.com/apache/spark/blob/bfafad4d47b4f60e93d17ccc3a8dcc8bae03cf9a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala#L340]
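
To make the failure mode concrete, here is a minimal, dependency-free Java model of the path traced above. It is a sketch under stated assumptions, not Spark code: BucketLookupSketch, the stand-in Identifier and FunctionIdentifier records, asFunctionIdentifier and loadFunctionOpt are hypothetical simplifications whose names only mirror the linked sources. It assumes just what the stack trace and links show: the function name is wrapped in an Identifier with an empty namespace, and the session catalog accepts only a single-part namespace. It needs Java 16+ (records).

{code:java}
import java.util.Optional;

// Hypothetical, dependency-free model of the failing lookup path. The names
// mirror Spark's (Identifier, FunctionIdentifier, asFunctionIdentifier,
// loadV2FunctionOpt), but this is a simplification, not Spark code.
final class BucketLookupSketch {

    // Stand-in for the connector Identifier: a namespace plus a name.
    record Identifier(String[] namespace, String name) {
        static Identifier of(String[] namespace, String name) {
            return new Identifier(namespace, name);
        }
    }

    // Stand-in for the catalog's FunctionIdentifier (name qualified by one database).
    record FunctionIdentifier(String name, String database) {}

    // Models the session catalog's conversion: only an identifier with exactly
    // one namespace part can become a FunctionIdentifier.
    static FunctionIdentifier asFunctionIdentifier(Identifier ident) {
        String[] ns = ident.namespace();
        if (ns.length == 1) {
            return new FunctionIdentifier(ident.name(), ns[0]);
        }
        // An empty namespace joins to "", which yields the "but got ." text
        // seen in the stack trace.
        throw new IllegalArgumentException(
            "[REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got "
                + String.join(".", ns) + ".");
    }

    // Models loadV2FunctionOpt: the function name is wrapped in an identifier
    // with an empty namespace before it reaches the catalog. Because the
    // namespace check runs first, the exception escapes to the caller instead
    // of producing an empty Optional.
    static Optional<FunctionIdentifier> loadFunctionOpt(String name) {
        Identifier ident = Identifier.of(new String[0], name);
        return Optional.of(asFunctionIdentifier(ident));
    }

    public static void main(String[] args) {
        // Same shape as resolving the "bucket" transform: always fails.
        try {
            loadFunctionOpt("bucket");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // With a one-part namespace the conversion succeeds.
        System.out.println(asFunctionIdentifier(Identifier.of(new String[] {"default"}, "bucket")));
    }
}
{code}

Running main prints the same REQUIRES_SINGLE_PART_NAMESPACE message for the empty-namespace lookup, then shows that an identifier with the one-part namespace "default" converts without error.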
 
I have managed to work around this by constructing the
DataSourceV2ScanRelation myself, but I suspect this is not an intended use, and
it forces me to push the filter predicates down myself.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
