This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 9530275 [SPARK-34266][SQL][DOCS] Update comments for
`SessionCatalog.refreshTable()` and `CatalogImpl.refreshTable()`
9530275 is described below
commit 95302756f11a441f2b1a08819887ec8078e1deb3
Author: Max Gekk <[email protected]>
AuthorDate: Mon Feb 1 13:07:05 2021 +0000
[SPARK-34266][SQL][DOCS] Update comments for
`SessionCatalog.refreshTable()` and `CatalogImpl.refreshTable()`
### What changes were proposed in this pull request?
Describe `SessionCatalog.refreshTable()` and `CatalogImpl.refreshTable()`: what they do and when they are supposed to be used.
### Why are the changes needed?
To improve code maintenance.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running `./dev/scalastyle`
Closes #31364 from MaxGekk/doc-refreshTable.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
.../spark/sql/catalyst/catalog/SessionCatalog.scala | 21 ++++++++++++++++++++-
.../org/apache/spark/sql/internal/CatalogImpl.scala | 19 +++++++++++++------
2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 098ee9f..9e4da36 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -1021,7 +1021,26 @@ class SessionCatalog(
   }
 
   /**
-   * Refresh the cache entry for a metastore table, if any.
+   * Refresh table entries in structures maintained by the session catalog, such as:
+   *   - The map of temporary or global temporary view names to their logical plans
+   *   - The relation cache, which maps table identifiers to their logical plans
+   *
+   * For temp views, it refreshes their logical plans, and as a consequence it can refresh
+   * the file indexes of the base relations (`HadoopFsRelation`, for instance) used in the views.
+   * The method still keeps the views in the internal lists of the session catalog.
+   *
+   * For tables/views, it removes their entries from the relation cache.
+   *
+   * The method is supposed to be used in the following situations:
+   *   1. The logical plan of a table/view was changed, and the cached table/view data is
+   *      cleared explicitly, as in `AlterTableRenameCommand`, which re-caches the table
+   *      itself. Otherwise, if you need to refresh cached data, consider using
+   *      `CatalogImpl.refreshTable()`.
+   *   2. A table/view doesn't exist, and only its entry in the relation cache needs to be
+   *      removed since the cached data is invalidated explicitly, as in `DropTableCommand`,
+   *      which uncaches table/view data itself.
+   *   3. Meta-data (such as file indexes) of any relation used in a temporary view should
+   *      be updated.
    */
   def refreshTable(name: TableIdentifier): Unit = synchronized {
     lookupTempView(name).map(_.refresh).getOrElse {
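The behaviour described in the new `SessionCatalog.refreshTable()` comment above can be sketched with a simplified, self-contained model. This is a hypothetical illustration, not Spark's actual classes: a temp view found in the view map is refreshed in place and kept, while a table's entry is simply dropped from the relation cache.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the two structures the comment
// describes: the temp-view map and the relation cache.
object SessionCatalogSketch {
  final case class LogicalPlanStub(name: String) {
    var refreshed: Boolean = false
    def refresh(): Unit = refreshed = true // temp views refresh in place
  }

  val tempViews = mutable.Map.empty[String, LogicalPlanStub]
  val relationCache = mutable.Map.empty[String, LogicalPlanStub]

  // Mirrors the documented behaviour: a temp view is refreshed but kept
  // in the catalog; a table/view entry is removed from the relation cache.
  def refreshTable(name: String): Unit = synchronized {
    tempViews.get(name) match {
      case Some(plan) => plan.refresh()
      case None       => relationCache.remove(name)
    }
  }
}
```

After `refreshTable("v")` on a temp view, the view is still registered but its plan is marked refreshed; after `refreshTable("t")` on a cached table, the relation-cache entry is gone.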
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
index e5f02d8..d67067c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
@@ -515,14 +515,21 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
   }
 
   /**
-   * Invalidates and refreshes all the cached data and metadata of the given table or view.
-   * For Hive metastore table, the metadata is refreshed. For data source tables, the schema will
-   * not be inferred and refreshed.
+   * The method fully refreshes a table or view with the given name, including:
+   *   1. The relation cache in the session catalog: the table entry is removed from the cache.
+   *   2. The file indexes of all relations used by the given view.
+   *   3. The table/view schema in the Hive Metastore if the SQL config
+   *      `spark.sql.hive.caseSensitiveInferenceMode` is set to `INFER_AND_SAVE`.
+   *   4. Cached data of the given table or view, and of all its dependents that refer to it.
+   *      Existing cached data is cleared, and the cache is lazily refilled the next time
+   *      the table/view or its dependents are accessed.
    *
-   * If this table is cached as an InMemoryRelation, re-cache the table and its dependents lazily.
+   * The method does not do:
+   *   - schema inference for file source tables
+   *   - statistics update
    *
-   * In addition, refreshing a table also clear all caches that have reference to the table
-   * in a cascading manner. This is to prevent incorrect result from the otherwise staled caches.
+   * The method is supposed to be used in all cases when you need to refresh table/view data
+   * and meta-data.
    *
    * @group cachemgmt
    * @since 2.0.0
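The "full refresh with lazy re-caching" semantics documented in the `CatalogImpl.refreshTable()` comment above can likewise be sketched with a small self-contained model. This is a hypothetical simplification, not Spark's real implementation: the relation-cache entry is evicted, cached data is cleared, and the data is re-materialised only on the next access.

```scala
import scala.collection.mutable

// Hypothetical model of the documented refresh semantics: evict the
// relation-cache entry and clear cached data, which is then lazily
// refilled when the table is next scanned.
object CatalogRefreshSketch {
  val relationCache = mutable.Map.empty[String, String]        // table id -> resolved plan
  val cachedData = mutable.Map.empty[String, Option[Seq[Int]]] // None = cleared, refill lazily

  def refreshTable(name: String): Unit = {
    relationCache.remove(name)                              // drop the relation-cache entry
    if (cachedData.contains(name)) cachedData(name) = None  // clear data, keep it marked as cached
  }

  // Data is re-materialised only when the table is accessed again.
  def scan(name: String, load: => Seq[Int]): Seq[Int] = cachedData.get(name) match {
    case Some(Some(rows)) => rows // cache hit
    case Some(None) =>            // cleared by refreshTable: refill lazily now
      val rows = load
      cachedData(name) = Some(rows)
      rows
    case None => load             // table was never cached
  }
}
```

The lazy refill mirrors the comment's point 4: refreshing does not eagerly recompute the cache; the first `scan` after the refresh repopulates it.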
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]