[ 
https://issues.apache.org/jira/browse/SPARK-37932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478542#comment-17478542
 ] 

Feng Zhu commented on SPARK-37932:
----------------------------------

test 

{code:scala}

test("SPARK-37932: view join view self with having filter") {
withTable("t") {
withView("v1") {
Seq((2, "test2"), (3, "test3"), (1, "test1")).toDF("id", "name")
.write.format("parquet").saveAsTable("t")
sql("CREATE VIEW v1 (id, name) AS SELECT id, name FROM t")

sql("""
|SELECT l1.id
| FROM v1 l1
| INNER JOIN (
| SELECT id
| FROM v1
| GROUP BY id
| HAVING COUNT(DISTINCT name) > 1
| ) l2
| ON l1.id = l2.id
| GROUP BY l1.name, l1.id;
""".stripMargin)

}
}
}

{code}

 

exception

{code}

org.apache.spark.sql.AnalysisException: Resolved attribute(s) name#25 missing 
from id#29,name#30 in operator !Aggregate [id#29], [id#29, count(distinct 
name#25) AS count(distinct name#25)#31L]. Attribute(s) with the same name 
appear in the operation: name. Please check if the right attribute(s) are used.;
Aggregate [name#25, id#24], [id#24]
+- Join Inner, (id#24 = id#29)
   :- SubqueryAlias l1
   :  +- SubqueryAlias spark_catalog.default.v1
   :     +- View (`default`.`v1`, [id#24,name#25])
   :        +- Project [cast(id#20 as int) AS id#24, cast(name#21 as string) AS 
name#25]
   :           +- Project [id#20, name#21]
   :              +- SubqueryAlias spark_catalog.default.t
   :                 +- Relation default.t[id#20,name#21] parquet
   +- SubqueryAlias l2
      +- Project [id#29]
         +- Filter (count(distinct name#25)#31L > cast(1 as bigint))
            +- !Aggregate [id#29], [id#29, count(distinct name#25) AS 
count(distinct name#25)#31L]
               +- SubqueryAlias spark_catalog.default.v1
                  +- View (`default`.`v1`, [id#29,name#30])
                     +- Project [cast(id#26 as int) AS id#29, cast(name#27 as 
string) AS name#30]
                        +- Project [id#26, name#27]
                           +- SubqueryAlias spark_catalog.default.t
                              +- Relation default.t[id#26,name#27] parquet

    at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:51)
    at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:50)

{code}

 

> Analyzer can fail when join left side and right side are the same view
> ----------------------------------------------------------------------
>
>                 Key: SPARK-37932
>                 URL: https://issues.apache.org/jira/browse/SPARK-37932
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Feng Zhu
>            Priority: Major
>         Attachments: sql_and_exception
>
>
> See the attachment for details, including SQL and the exception information.
>  * sql1, there is a normal filter (LO_SUPPKEY > 10) in the right side 
> subquery, Analyzer works as expected;
>  * sql2, there is a HAVING filter(HAVING COUNT(DISTINCT LO_SUPPKEY) > 1) in 
> the right side subquery, Analyzer failed with "Resolved attribute(s) 
> LO_SUPPKEY#337 missing ...".
>       From the debug info, the problem seems to be occurred after the rule 
> DeduplicateRelations is applied.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to