Никита Соколов created SPARK-46445:
--------------------------------------
Summary: V2ExpressionUtils.toCatalystTransformOpt throws when
resolving the bucket/sorted_bucket functions
Key: SPARK-46445
URL: https://issues.apache.org/jira/browse/SPARK-46445
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.5.0
Reporter: Никита Соколов
I am trying to build a V2 data source that exposes its partitioning and
bucketing properties so that Spark can execute partition-wise joins. At the
moment Spark cannot transform a DataSourceV2Relation into a
DataSourceV2ScanRelation because of this error:
{code:java}
org.apache.spark.sql.AnalysisException: [REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got .
    at org.apache.spark.sql.errors.QueryCompilationErrors$.requiresSinglePartNamespaceError(QueryCompilationErrors.scala:1336)
    at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog$TableIdentifierHelper.asFunctionIdentifier(V2SessionCatalog.scala:254)
    at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadFunction(V2SessionCatalog.scala:351)
    at org.apache.spark.sql.catalyst.expressions.V2ExpressionUtils$.loadV2FunctionOpt(V2ExpressionUtils.scala:128)
    at org.apache.spark.sql.catalyst.expressions.V2ExpressionUtils$.$anonfun$toCatalystTransformOpt$6(V2ExpressionUtils.scala:114)
    at scala.Option.flatMap(Option.scala:271)
    at org.apache.spark.sql.catalyst.expressions.V2ExpressionUtils$.toCatalystTransformOpt(V2ExpressionUtils.scala:113)
    at org.apache.spark.sql.catalyst.expressions.V2ExpressionUtils$.toCatalystOpt(V2ExpressionUtils.scala:82)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.$anonfun$applyOrElse$1(V2ScanPartitioningAndOrdering.scala:47)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:47)
    at org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:42)
{code}
It looks like the library code cannot succeed on this path:
V2ExpressionUtils tries to load the bucket function here:
[https://github.com/apache/spark/blob/e359318c4493e16a7546d70c9340ffc5015aacff/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala#L108]
An Identifier with no namespace is constructed here:
[https://github.com/apache/spark/blob/bfafad4d47b4f60e93d17ccc3a8dcc8bae03cf9a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala#L132]
V2SessionCatalog then tries to convert it to a FunctionIdentifier here:
[https://github.com/apache/spark/blob/bfafad4d47b4f60e93d17ccc3a8dcc8bae03cf9a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala#L437]
That conversion cannot succeed for an Identifier with no namespace:
[https://github.com/apache/spark/blob/bfafad4d47b4f60e93d17ccc3a8dcc8bae03cf9a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala#L340]
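The chain described above can be sketched in isolation. This is a simplified standalone model, not Spark's code: the class EmptyNamespaceLookup, its nested Identifier class, and both methods are hypothetical stand-ins that only assume the behavior reported here, namely that the lookup builds an identifier with an empty namespace and that the conversion accepts exactly one namespace part.

```java
public class EmptyNamespaceLookup {

    // Hypothetical stand-in for a catalog identifier: a namespace plus a name.
    static final class Identifier {
        final String[] namespace;
        final String name;

        Identifier(String[] namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }
    }

    // Models the single-part namespace check (assumed behavior): exactly one
    // namespace element is accepted, any other length is rejected.
    static String asFunctionIdentifier(Identifier ident) {
        if (ident.namespace.length != 1) {
            throw new IllegalArgumentException(
                "[REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a "
                    + "single-part namespace, but got "
                    + String.join(".", ident.namespace) + ".");
        }
        return ident.namespace[0] + "." + ident.name;
    }

    // Models the lookup of a transform function such as "bucket": the
    // identifier is always built with an empty namespace, so the check
    // above can never pass on this path.
    static String loadTransformFunction(String name) {
        return asFunctionIdentifier(new Identifier(new String[0], name));
    }

    public static void main(String[] args) {
        // An identifier with one namespace part converts without error.
        System.out.println(
            asFunctionIdentifier(new Identifier(new String[] {"default"}, "bucket")));

        // The empty-namespace lookup fails regardless of the function name.
        try {
            loadTransformFunction("bucket");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the failure does not depend on the data source at all: any caller of loadTransformFunction hits the exception, which matches the observation that the error message ends with an empty namespace ("but got .").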
I have managed to work around this by constructing the
DataSourceV2ScanRelation myself, but I assume this is not the intended usage,
and it forces me to push the filter predicates down myself.