[
https://issues.apache.org/jira/browse/SPARK-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-17030:
---------------------------------
Labels: bulk-closed (was: )
> Remove/Cleanup HiveMetastoreCatalog.scala
> -----------------------------------------
>
> Key: SPARK-17030
> URL: https://issues.apache.org/jira/browse/SPARK-17030
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Xiao Li
> Priority: Major
> Labels: bulk-closed
>
> The metadata cache is a key-value cache built on Google Guava Cache to speed up
> building logical plan nodes (`LogicalRelation`) for data source tables. The
> cache key is a unique identifier of a table. Here, the identifier is the
> fully qualified table name, including the database in which it resides. (In
> the future, it could be extended to multi-part names when a federated
> Catalog is introduced.) The value is the corresponding LogicalRelation that
> represents a specific data source table.
> The cache is session-based. In each session, the cache is managed in two
> different ways at the same time.
> 1. **Auto loading**: when Spark queries the cache for a user-defined data
> source table, the cache either returns a cached LogicalRelation, or else
> automatically builds a new one by decoding the metadata fetched from the
> external Catalog.
> 2. **Manual caching**: Hive tables are represented as `MetastoreRelation`
> logical plan nodes. For better performance, we convert Hive serde tables to
> data source tables, if convertible. The conversion is not completed at the
> stage of metadata loading. Instead, it is conducted during semantic analysis.
> If a Hive serde table is convertible, we first try to get the value (by the
> fully qualified table name) from the metadata cache. If it exists, we use it
> directly; otherwise, we build a new one and put it into the cache for
> future reuse.
> Currently, the file `HiveMetastoreCatalog.scala` contains a mix of unrelated
> entities and functions, grouped together only because all of them interact
> with the cache, called `cachedDataSourceTables`. This JIRA is to clean up
> `HiveMetastoreCatalog.scala`.
> **Proposal**: To avoid mixing everything related to the cache in the same
> file, we abstract and define the following API for cache operations. After
> the code changes, `HiveMetastoreCatalog.scala` only contains the cache API
> implementation. The file can then be renamed to `MetadataCache.scala`.
> {noformat}
> // cacheTable is a wrapper of cache.put(key, value). It associates value with
> // key in this cache.
> // If the cache previously contained a value associated with key, the old
> // value is replaced by value.
> def cacheTable(tableIdent: TableIdentifier, plan: LogicalPlan): Unit
> {noformat}
> {noformat}
> // getTableIfPresent is a wrapper of cache.getIfPresent(key) that never
> // causes values to be automatically loaded.
> def getTableIfPresent(tableIdent: TableIdentifier): Option[LogicalPlan]
> {noformat}
> {noformat}
> // getTable is a wrapper of cache.get(key). On a cache miss, a cache backed by
> // a CacheLoader calls CacheLoader.load(K) to load the new value into the
> // cache. That is, it calls the function load.
> def getTable(tableIdent: TableIdentifier): LogicalPlan
> {noformat}
> {noformat}
> // refreshTable is a wrapper of cache.invalidate. It does not eagerly reload
> // the cache.
> // It just invalidates the entry. The next time the table is used, it will be
> // populated in the cache again.
> def refreshTable(tableIdent: TableIdentifier): Unit
> {noformat}
> {noformat}
> // Discards all entries in the cache. It is a wrapper of cache.invalidateAll.
> def invalidateAll(): Unit
> {noformat}
> We should also move three Hive-specific Analyzer rules `CreateTables`,
> `OrcConversions` and `ParquetConversions` from `HiveMetastoreCatalog.scala`
> to `HiveStrategies.scala`.
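
A minimal sketch of how the proposed cache API could fit together, written against stand-in types so it runs without Spark or Guava on the classpath. `TableIdentifier` and `LogicalPlan` here are simplified hypothetical placeholders for the Spark classes, the class name `MetadataCache` is borrowed from the proposed file name, and a `TrieMap` plus a `load` function stands in for the Guava cache and its `CacheLoader`. None of this is the actual Spark implementation.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-ins for Spark's TableIdentifier and LogicalPlan.
final case class TableIdentifier(table: String, database: Option[String] = None)
final case class LogicalPlan(description: String)

// Sketch of the proposed API. `load` plays the role of CacheLoader.load;
// a TrieMap replaces the Guava cache (assumption made for self-containment).
class MetadataCache(load: TableIdentifier => LogicalPlan) {
  private val cache = TrieMap.empty[TableIdentifier, LogicalPlan]

  // Associates plan with tableIdent, replacing any previous value.
  def cacheTable(tableIdent: TableIdentifier, plan: LogicalPlan): Unit =
    cache.update(tableIdent, plan)

  // Looks up the entry without ever triggering a load.
  def getTableIfPresent(tableIdent: TableIdentifier): Option[LogicalPlan] =
    cache.get(tableIdent)

  // Returns the cached plan, calling `load` and storing the result on a miss.
  def getTable(tableIdent: TableIdentifier): LogicalPlan =
    cache.getOrElseUpdate(tableIdent, load(tableIdent))

  // Drops the entry only; it is repopulated the next time getTable is called.
  def refreshTable(tableIdent: TableIdentifier): Unit = {
    cache.remove(tableIdent)
    ()
  }

  // Discards all entries in the cache.
  def invalidateAll(): Unit = cache.clear()
}
```

With this shape, `getTable` covers the auto-loading path, while `getTableIfPresent` followed by `cacheTable` covers the manual caching path used when converting Hive serde tables.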