This is an automated email from the ASF dual-hosted git repository.
chengpan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kyuubi.git
The following commit(s) were added to refs/heads/master by this push:
new 650ba5e32 [KYUUBI #5884] PVM should inherit MultiInstance and wrap with new exprId
650ba5e32 is described below
commit 650ba5e323c852dac1dd9d31e007712cd63dd463
Author: Angerszhuuuu <[email protected]>
AuthorDate: Fri Dec 22 15:48:36 2023 +0800
[KYUUBI #5884] PVM should inherit MultiInstance and wrap with new exprId
# :mag: Description
## Issue References 🔗
This pull request fixes #5884
## Describe Your Solution 🔧
We met a case where we created a temp view from a PVM (`PermanentViewMarker`) with `spark.sql.legacy.storeAnalyzedPlanForView=true` enabled:
```
CREATE OR REPLACE TEMPORARY VIEW tmp_view AS
SELECT * FROM persist_view
```
Then we created two new views based on `tmp_view` and joined them; this causes a column exprId conflict between the join's left and right sides.
This happens because, to prevent views from causing exprId conflicts, Spark wraps a view's child in a Project with new exprIds:
<img width="977" alt="ๆชๅฑ2023-12-20 20 45 59"
src="https://github.com/apache/kyuubi/assets/46485123/00bab655-a10c-4e61-b3d9-51c9208dec73">
Previously PVM was a UnaryNode, so this behavior still worked on PVM's child; now that PVM is a LeafNode, it no longer does.
Like `HiveTableRelation`, PVM also needs to inherit `MultiInstanceRelation` and wrap its child with new exprIds to avoid the issue below.
This change works fine in our production environment.
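For reference, a condensed repro sketch (assumes a Hive-enabled `SparkSession` named `spark`; table and view names match the unit test below):
```
// Repro sketch; without this fix the final self-join fails to analyze,
// because both sides resolve perm_view's columns to the same exprIds.
spark.sql("SET spark.sql.legacy.storeAnalyzedPlanForView=true")
spark.sql("CREATE TABLE IF NOT EXISTS default.table1(id int, scope int)")
spark.sql("CREATE VIEW default.perm_view AS SELECT * FROM default.table1")
spark.sql("CREATE OR REPLACE TEMPORARY VIEW view1 AS SELECT * FROM default.perm_view WHERE id > 10")
spark.sql("CREATE OR REPLACE TEMPORARY VIEW view2 AS SELECT * FROM view1 WHERE scope < 10")
spark.sql("CREATE OR REPLACE TEMPORARY VIEW view3 AS SELECT * FROM view1 WHERE scope IS NOT NULL")
spark.sql("SELECT a.*, b.scope AS new_scope FROM view2 a JOIN view3 b ON a.id = b.id").show()
```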
## Types of changes :bookmark:
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
Added UT
#### Behavior Without This Pull Request :coffin:
Failed, since the views' exprIds conflict:
```
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve '`a.id`' given input columns: [a.id, a.scope]; line 5 pos 3;
'Project [ArrayBuffer(a).*, 'b.scope AS new_scope#22]
+- 'Join Inner, ('a.id = 'b.id)
   :- SubqueryAlias a
   :  +- SubqueryAlias view2
   :     +- Project [id#18, scope#19]
   :        +- Filter (scope#19 < 10)
   :           +- SubqueryAlias view1
   :              +- Project [id#18, scope#19]
   :                 +- Filter (id#18 > 10)
   :                    +- SubqueryAlias spark_catalog.default.perm_view
   :                       +- PermanentViewMarker
   :                          +- View (`default`.`perm_view`, [id#18,scope#19])
   :                             +- Project [id#20, scope#21]
   :                                +- SubqueryAlias spark_catalog.default.table1
   :                                   +- HiveTableRelation [`default`.`table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#20, scope#21], Partition Cols: []]
   +- SubqueryAlias b
      +- SubqueryAlias view3
         +- Project [id#18, scope#19]
            +- Filter isnotnull(scope#19)
               +- SubqueryAlias view1
                  +- Project [id#18, scope#19]
                     +- Filter (id#18 > 10)
                        +- SubqueryAlias spark_catalog.default.perm_view
                           +- PermanentViewMarker
                              +- View (`default`.`perm_view`, [id#18,scope#19])
                                 +- Project [id#20, scope#21]
                                    +- SubqueryAlias spark_catalog.default.table1
                                       +- HiveTableRelation [`default`.`table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#20, scope#21], Partition Cols: []]
```
#### Behavior With This Pull Request :tada:
Works as expected: `newInstance()` wraps the re-used view in a `Project` of `Cast`s with fresh exprIds (`id#23`, `scope#24`), so the two sides of the join no longer conflict:
```
Project [id#18, scope#19, scope#24 AS new_scope#22]
+- Join Inner, (id#18 = id#23)
   :- Filter ((id#18 > 10) AND (scope#19 < 10))
   :  +- PermanentViewMarker
   :     +- View (`default`.`perm_view`, [id#18,scope#19])
   :        +- Project [id#20, scope#21]
   :           +- SubqueryAlias spark_catalog.default.table1
   :              +- HiveTableRelation [`default`.`table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#20, scope#21], Partition Cols: []]
   +- Filter ((id#23 > 10) AND isnotnull(scope#24))
      +- PermanentViewMarker
         +- Project [cast(id#18 as int) AS id#23, cast(scope#19 as int) AS scope#24]
            +- View (`default`.`perm_view`, [id#18,scope#19])
               +- Project [id#20, scope#21]
                  +- SubqueryAlias spark_catalog.default.table1
                     +- HiveTableRelation [`default`.`table1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#20, scope#21], Partition Cols: []]
```
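Both plan dumps can be reproduced with the standard `queryExecution` API (using the repro session sketched earlier):
```
// Print the analyzed plan of the join; exprIds show up as #N suffixes.
val df = spark.sql(
  "SELECT a.*, b.scope AS new_scope FROM view2 a JOIN view3 b ON a.id = b.id")
println(df.queryExecution.analyzed.treeString)
```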
#### Related Unit Tests
---
# Checklists
## 📝 Author Self Checklist
- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
## 📝 Committer Pre-Merge Checklist
- [x] Pull request title is okay.
- [x] No license issues.
- [x] Milestone correctly set?
- [x] Test coverage is ok
- [x] Assignees are selected.
- [x] Minimum number of approvals
- [x] No changes are requested
**Be nice. Be informative.**
Closes #5885 from AngersZhuuuu/KYUUBI-5884.
Closes #5884
759afc140 [Cheng Pan] Update extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RangerSparkExtensionSuite.scala
e005b6745 [Angerszhuuuu] Update RangerSparkExtensionSuite.scala
826e7a4db [Angerszhuuuu] Update RangerSparkExtensionSuite.scala
6814a833c [Angerszhuuuu] [KYUUBI #5884][Bug] PVM should inherit MultiInstance and wrap with new exprId
Lead-authored-by: Angerszhuuuu <[email protected]>
Co-authored-by: Cheng Pan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
---
.../rule/permanentview/PermanentViewMarker.scala | 15 ++++--
.../authz/ranger/RangerSparkExtensionSuite.scala | 58 ++++++++++++++++++++++
2 files changed, 70 insertions(+), 3 deletions(-)
diff --git a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/rule/permanentview/PermanentViewMarker.scala b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/rule/permanentview/PermanentViewMarker.scala
index b84599e81..c5cd1b7cb 100644
--- a/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/rule/permanentview/PermanentViewMarker.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/rule/permanentview/PermanentViewMarker.scala
@@ -17,16 +17,25 @@
 package org.apache.kyuubi.plugin.spark.authz.rule.permanentview
 
+import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
 import org.apache.spark.sql.catalyst.catalog.CatalogTable
-import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, Cast}
 import org.apache.spark.sql.catalyst.plans.QueryPlan
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan}
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Project}
 
-case class PermanentViewMarker(child: LogicalPlan, catalogTable: CatalogTable) extends LeafNode {
+case class PermanentViewMarker(child: LogicalPlan, catalogTable: CatalogTable)
+  extends LeafNode with MultiInstanceRelation {
 
   override def output: Seq[Attribute] = child.output
 
   override def argString(maxFields: Int): String = ""
 
   override def innerChildren: Seq[QueryPlan[_]] = child :: Nil
+
+  override def newInstance(): LogicalPlan = {
+    val projectList = child.output.map { case attr =>
+      Alias(Cast(attr, attr.dataType), attr.name)(explicitMetadata = Some(attr.metadata))
+    }
+    this.copy(child = Project(projectList, child), catalogTable = catalogTable)
+  }
 }
diff --git a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RangerSparkExtensionSuite.scala b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RangerSparkExtensionSuite.scala
index b2c23f4ef..9dd9613d8 100644
--- a/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RangerSparkExtensionSuite.scala
+++ b/extensions/spark/kyuubi-spark-authz/src/test/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RangerSparkExtensionSuite.scala
@@ -1414,4 +1414,62 @@ class HiveCatalogRangerSparkExtensionSuite extends RangerSparkExtensionSuite {
       }
     }
   }
+
+  test("[KYUUBI #5884] PVM should inherit MultiInstance and wrap with new exprId") {
+    val db1 = defaultDb
+    val table1 = "table1"
+    val perm_view = "perm_view"
+    val view1 = "view1"
+    val view2 = "view2"
+    val view3 = "view3"
+    withSingleCallEnabled {
+      withCleanTmpResources(Seq.empty) {
+        sql("set spark.sql.legacy.storeAnalyzedPlanForView=true")
+        doAs(admin, sql(s"CREATE TABLE IF NOT EXISTS $db1.$table1(id int, scope int)"))
+        doAs(admin, sql(s"CREATE VIEW $db1.$perm_view AS SELECT * FROM $db1.$table1"))
+
+        doAs(
+          admin,
+          sql(
+            s"""
+               |CREATE OR REPLACE TEMPORARY VIEW $view1 AS
+               |SELECT *
+               |FROM $db1.$perm_view
+               |WHERE id > 10
+               |""".stripMargin))
+
+        doAs(
+          admin,
+          sql(
+            s"""
+               |CREATE OR REPLACE TEMPORARY VIEW $view2 AS
+               |SELECT *
+               |FROM $view1
+               |WHERE scope < 10
+               |""".stripMargin))
+
+        doAs(
+          admin,
+          sql(
+            s"""
+               |CREATE OR REPLACE TEMPORARY VIEW $view3 AS
+               |SELECT *
+               |FROM $view1
+               |WHERE scope is not null
+               |""".stripMargin))
+
+        interceptContains[AccessControlException](
+          doAs(
+            someone,
+            sql(
+              s"""
+                 |SELECT a.*, b.scope as new_scope
+                 |FROM $view2 a
+                 |JOIN $view3 b
+                 |ON a.id == b.id
+                 |""".stripMargin).collect()))(s"does not have [select] privilege on " +
+          s"[$db1/$perm_view/id,$db1/$perm_view/scope,$db1/$perm_view/scope,$db1/$perm_view/id]")
+      }
+    }
+  }
 }