[ https://issues.apache.org/jira/browse/SPARK-37690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464354#comment-17464354 ]

Robin commented on SPARK-37690:
-------------------------------

Someone 
[here|https://community.databricks.com/s/question/0D53f00001Qugr7CAB/upgrading-from-spark-24-to-32-recursive-view-errors-when-using]
 has suggested this is an intentional breaking change introduced in Spark 3.1:

From [Migration Guide: SQL, Datasets and DataFrame - Spark 3.1.1 Documentation 
(apache.org)|https://spark.apache.org/docs/3.1.1/sql-migration-guide.html]

> In Spark 3.1, the temporary view will have the same behaviors as the permanent 
> view, i.e. capture and store runtime SQL configs, SQL text, catalog and 
> namespace. The captured view properties will be applied during the parsing 
> and analysis phases of the view resolution. To restore the behavior before 
> Spark 3.1, {*}you can set spark.sql.legacy.storeAnalyzedPlanForView to 
> true{*}.

 

Grateful if someone could clarify.  Worth noting that the example code works in 
Spark 3.1.2, just not 3.2.0.  It's not obvious to me that the above quote implies 
`createOrReplaceTempView` would fail in the example code posted in the issue.
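For anyone hitting this on 3.2.0 in the meantime, a possible workaround (a sketch based only on the migration guide quote above and the linked Databricks thread, not something I've confirmed is the intended fix) is to set the legacy config before building the session:

{code:python}
# Sketch of the workaround suggested by the migration guide quote above:
# restore pre-3.1 temp view behavior by storing the analyzed plan for views.
# The config name is taken verbatim from the Spark 3.1.1 migration guide.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.legacy.storeAnalyzedPlanForView", "true")
    .getOrCreate()
)

df = spark.sql("SELECT id AS col_1, rand() AS col_2 FROM RANGE(10)")
df.createOrReplaceTempView("df")

# With the legacy flag set, re-registering the same view name should no
# longer trip the recursive view check, matching the 2.x/3.1.2 behavior.
df = spark.sql("SELECT * FROM df")
df.createOrReplaceTempView("df")
df = spark.sql("SELECT * FROM df")
{code}

Note this is a legacy flag, so it presumably reverts all views to the old store-analyzed-plan semantics, not just the self-referencing case in the reproducer.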

> Recursive view `df` detected (cycle: `df` -> `df`)
> --------------------------------------------------
>
>                 Key: SPARK-37690
>                 URL: https://issues.apache.org/jira/browse/SPARK-37690
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Robin
>            Priority: Major
>
> In Spark 3.2.0, you can no longer reuse the same name for a temporary view.  
> This change is backwards incompatible, and means a common way of running 
> pipelines of SQL queries no longer works.   The following is a simple 
> reproducible example that works in Spark 2.x and 3.1.2, but not in 3.2.0: 
> {code:python}
> from pyspark.context import SparkContext
> from pyspark.sql import SparkSession
>
> sc = SparkContext.getOrCreate()
> spark = SparkSession(sc)
>
> sql = """ SELECT id as col_1, rand() AS col_2 FROM RANGE(10); """
> df = spark.sql(sql)
> df.createOrReplaceTempView("df")
> sql = """ SELECT * FROM df """
> df = spark.sql(sql)
> df.createOrReplaceTempView("df")
> sql = """ SELECT * FROM df """
> df = spark.sql(sql)
> {code}
> The following error is now produced:
> {code}
> AnalysisException: Recursive view `df` detected (cycle: `df` -> `df`)
> {code}
> I'm reasonably sure this change is unintentional in 3.2.0 since it breaks a 
> lot of legacy code, and the `createOrReplaceTempView` method is named 
> explicitly such that replacing an existing view should be allowed.   An 
> internet search suggests other users have run into similar problems, e.g. 
> [here|https://community.databricks.com/s/question/0D53f00001Qugr7CAB/upgrading-from-spark-24-to-32-recursive-view-errors-when-using]
>   



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
