This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 9530275 [SPARK-34266][SQL][DOCS] Update comments for
`SessionCatalog.refreshTable()` and `CatalogImpl.refreshTable()`
9530275 is described below
commit 95302756f11a441f2b1a08819887ec8078e1deb3
Author: Max Gekk <[email protected]>
AuthorDate: Mon Feb 1 13:07:05 2021 +0000
[SPARK-34266][SQL][DOCS] Update comments for
`SessionCatalog.refreshTable()` and `CatalogImpl.refreshTable()`
### What changes were proposed in this pull request?
Describe `SessionCatalog.refreshTable()` and `CatalogImpl.refreshTable()`: what they do and when they are supposed to be used.
### Why are the changes needed?
To improve code maintenance.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running `./dev/scalastyle`
Closes #31364 from MaxGekk/doc-refreshTable.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
.../spark/sql/catalyst/catalog/SessionCatalog.scala | 21 ++++++++++++++++++++-
.../org/apache/spark/sql/internal/CatalogImpl.scala | 19 +++++++++++++------
2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 098ee9f..9e4da36 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -1021,7 +1021,26 @@ class SessionCatalog(
   }
 
   /**
-   * Refresh the cache entry for a metastore table, if any.
+   * Refresh table entries in structures maintained by the session catalog, such as:
+   *   - The map of temporary or global temporary view names to their logical plans
+   *   - The relation cache, which maps table identifiers to their logical plans
+   *
+   * For temp views, it refreshes their logical plans, and as a consequence it can refresh
+   * the file indexes of the base relations (`HadoopFsRelation`, for instance) used in the views.
+   * The method still keeps the views in the internal lists of the session catalog.
+   *
+   * For tables/views, it removes their entries from the relation cache.
+   *
+   * The method is supposed to be used in the following situations:
+   *   1. The logical plan of a table/view was changed, and the cached table/view data is
+   *      cleared explicitly, as in `AlterTableRenameCommand`, which re-caches the table
+   *      itself. Otherwise, if you need to refresh cached data, consider using
+   *      `CatalogImpl.refreshTable()`.
+   *   2. A table/view doesn't exist, and only its entry in the relation cache needs to be
+   *      removed since the cached data is invalidated explicitly, as in `DropTableCommand`,
+   *      which uncaches table/view data itself.
+   *   3. Meta-data (such as file indexes) of any relation used in a temporary view should
+   *      be updated.
    */
   def refreshTable(name: TableIdentifier): Unit = synchronized {
     lookupTempView(name).map(_.refresh).getOrElse {
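The behaviour described in the new `SessionCatalog.refreshTable()` comment above can be sketched with a simplified, self-contained model. This is a hypothetical illustration, not Spark's actual classes: a temp view found in the view map is refreshed in place and kept, while a table's entry is simply dropped from the relation cache.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the two structures the comment
// describes: the temp-view map and the relation cache.
object SessionCatalogSketch {
  final case class LogicalPlanStub(name: String) {
    var refreshed: Boolean = false
    def refresh(): Unit = refreshed = true // temp views refresh in place
  }

  val tempViews = mutable.Map.empty[String, LogicalPlanStub]
  val relationCache = mutable.Map.empty[String, LogicalPlanStub]

  // Mirrors the documented behaviour: a temp view is refreshed but kept
  // in the catalog; a table/view entry is removed from the relation cache.
  def refreshTable(name: String): Unit = synchronized {
    tempViews.get(name) match {
      case Some(plan) => plan.refresh()
      case None       => relationCache.remove(name)
    }
  }
}
```

After `refreshTable("v")` on a temp view, the view is still registered but its plan is marked refreshed; after `refreshTable("t")` on a cached table, the relation-cache entry is gone.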
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
index e5f02d8..d67067c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
@@ -515,14 +515,21 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   }
 
   /**
-   * Invalidates and refreshes all the cached data and metadata of the given table or view.
-   * For Hive metastore table, the metadata is refreshed. For data source tables, the schema will
-   * not be inferred and refreshed.
+   * The method fully refreshes a table or view with the given name, including:
+   *   1. The relation cache in the session catalog: the table entry is removed from the cache.
+   *   2. The file indexes of all relations used by the given view.
+   *   3. The table/view schema in the Hive Metastore if the SQL config
+   *      `spark.sql.hive.caseSensitiveInferenceMode` is set to `INFER_AND_SAVE`.
+   *   4. Cached data of the given table or view, and of all its dependents that refer to it.
+   *      Existing cached data is cleared, and the cache is lazily refilled the next time
+   *      the table/view or its dependents are accessed.
    *
-   * If this table is cached as an InMemoryRelation, re-cache the table and its dependents lazily.
+   * The method does not do:
+   *   - schema inference for file source tables
+   *   - statistics update
    *
-   * In addition, refreshing a table also clear all caches that have reference to the table
-   * in a cascading manner. This is to prevent incorrect result from the otherwise staled caches.
+   * The method is supposed to be used in all cases when you need to refresh table/view data
+   * and meta-data.
    *
    * @group cachemgmt
    * @since 2.0.0
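The "full refresh with lazy re-caching" semantics documented in the `CatalogImpl.refreshTable()` comment above can likewise be sketched with a small self-contained model. This is a hypothetical simplification, not Spark's real implementation: the relation-cache entry is evicted, cached data is cleared, and the data is re-materialised only on the next access.

```scala
import scala.collection.mutable

// Hypothetical model of the documented refresh semantics: evict the
// relation-cache entry and clear cached data, which is then lazily
// refilled when the table is next scanned.
object CatalogRefreshSketch {
  val relationCache = mutable.Map.empty[String, String]        // table id -> resolved plan
  val cachedData = mutable.Map.empty[String, Option[Seq[Int]]] // None = cleared, refill lazily

  def refreshTable(name: String): Unit = {
    relationCache.remove(name)                              // drop the relation-cache entry
    if (cachedData.contains(name)) cachedData(name) = None  // clear data, keep it marked as cached
  }

  // Data is re-materialised only when the table is accessed again.
  def scan(name: String, load: => Seq[Int]): Seq[Int] = cachedData.get(name) match {
    case Some(Some(rows)) => rows // cache hit
    case Some(None) =>            // cleared by refreshTable: refill lazily now
      val rows = load
      cachedData(name) = Some(rows)
      rows
    case None => load             // table was never cached
  }
}
```

The lazy refill mirrors the comment's point 4: refreshing does not eagerly recompute the cache; the first `scan` after the refresh repopulates it.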
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]