[spark] branch branch-3.0 updated: [SPARK-29947][SQL][FOLLOWUP] ResolveRelations should return relations with fresh attribute IDs

wenchen Wed, 03 Jun 2020 12:13:26 -0700

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 22d5d03  [SPARK-29947][SQL][FOLLOWUP] ResolveRelations should return relations with fresh attribute IDs
22d5d03 is described below

commit 22d5d0368b9084884071af69106d67e50e6cc07f
Author: Wenchen Fan <wenc...@databricks.com>
AuthorDate: Wed Jun 3 19:08:36 2020 +0000

    [SPARK-29947][SQL][FOLLOWUP] ResolveRelations should return relations with fresh attribute IDs
    
    ### What changes were proposed in this pull request?
    
    This is a followup of https://github.com/apache/spark/pull/26589, which caches table relations to speed up table lookup. However, the cache has a side effect: the rule `ResolveRelations` may now return exactly the same relation for repeated lookups, whereas before it always returned relations with fresh attribute IDs.
    
    This PR eliminates that side effect.
    
    ### Why are the changes needed?
    
    There is no bug report yet, but this side effect may affect things like self-joins. It is better to restore the 2.4 behavior and always return fresh relations.
    
    ### Does this PR introduce _any_ user-facing change?
    
    no
    
    ### How was this patch tested?
    
    N/A
    
    Closes #28717 from cloud-fan/fix.
    
    Authored-by: Wenchen Fan <wenc...@databricks.com>
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
    (cherry picked from commit dc0709fa0ca75751d2de4ef95ad077f3e805a6ac)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>
---
 .../scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala  | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 654cf42..6fb103e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -1007,7 +1007,7 @@ class Analyzer(
     private def lookupRelation(identifier: Seq[String]): Option[LogicalPlan] = {
       expandRelationName(identifier) match {
         case SessionCatalogAndIdentifier(catalog, ident) =>
-          def loaded = CatalogV2Util.loadTable(catalog, ident).map {
+          lazy val loaded = CatalogV2Util.loadTable(catalog, ident).map {
             case v1Table: V1Table =>
               v1SessionCatalog.getRelation(v1Table.v1Table)
             case table =>
@@ -1016,7 +1016,12 @@ class Analyzer(
                 DataSourceV2Relation.create(table, Some(catalog), Some(ident)))
           }
           val key = catalog.name +: ident.namespace :+ ident.name
-          Option(AnalysisContext.get.relationCache.getOrElseUpdate(key, loaded.orNull))
+          AnalysisContext.get.relationCache.get(key).map(_.transform {
+            case multi: MultiInstanceRelation => multi.newInstance()
+          }).orElse {
+            loaded.foreach(AnalysisContext.get.relationCache.update(key, _))
+            loaded
+          }
         case _ => None
       }
     }
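
For readers who want to see the lookup pattern in isolation, here is a minimal sketch, not the Spark implementation. `Attr`, `Relation`, `loadTable`, `loadCount` and the `RelationCacheSketch` object are hypothetical stand-ins invented for illustration; only the overall shape (cache hit returns a new instance, cache miss loads once and stores the result) is taken from the diff above. It also suggests why `loaded` became a `lazy val`: the new code refers to it twice, and a `def` would run the load twice.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

object RelationCacheSketch {
  // Toy stand-ins for Catalyst attribute IDs and relations (hypothetical, not Spark classes).
  private val nextId = new AtomicLong(0)
  final case class Attr(name: String, id: Long)
  final case class Relation(table: String, output: Seq[Attr]) {
    // Same columns, new IDs: a toy analogue of MultiInstanceRelation.newInstance().
    def newInstance(): Relation =
      copy(output = output.map(a => a.copy(id = nextId.getAndIncrement())))
  }

  // Counts simulated catalog loads so the caching effect is observable.
  var loadCount = 0
  private val cache = mutable.HashMap.empty[Seq[String], Relation]

  // Pretend catalog load (assumption: any key ending in "missing" does not exist).
  private def loadTable(key: Seq[String]): Option[Relation] = {
    loadCount += 1
    if (key.last == "missing") None
    else Some(Relation(key.mkString("."),
      Seq("a", "b").map(n => Attr(n, nextId.getAndIncrement()))))
  }

  def lookup(key: Seq[String]): Option[Relation] = {
    // Evaluated at most once, and only when the cache misses.
    lazy val loaded = loadTable(key)
    cache.get(key).map(_.newInstance()).orElse {
      loaded.foreach(cache.update(key, _))
      loaded
    }
  }

  def main(args: Array[String]): Unit = {
    val key = Seq("spark_catalog", "default", "t")
    val left = lookup(key).get   // cache miss: loads and stores the relation
    val right = lookup(key).get  // cache hit: same columns, fresh IDs
    println(s"left IDs:  ${left.output.map(_.id)}")
    println(s"right IDs: ${right.output.map(_.id)}")
    println(s"catalog loads: $loadCount")
  }
}
```

In this sketch, two lookups of the same key yield relations whose columns have the same names but disjoint IDs, so the two sides of a self-join could be told apart, while the simulated catalog load still runs only once.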


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
