[https://issues.apache.org/jira/browse/FLINK-20416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271785#comment-17271785]
Jark Wu commented on FLINK-20416:
---------------------------------
Thanks [~shared_ptr] for updating the design doc. I think the key point we need
to discuss is at which level we should have the cache:
- Approach #1: a cache on HiveCatalog
- Approach #2: a cache on HiveMetastoreClient
I prefer #2 for the following reasons:
1) There is a lot of special logic in the read and write methods of
HiveCatalog, and bypassing it may be error-prone. For example, we add
{{is_generic}} to the table options if it is a non-Hive table. If we use #1,
we would forward write operations to the underlying HiveCatalog and update the
CatalogTable in the cache; however, that cached CatalogTable is not the one
actually stored in HiveCatalog.
2) #2 allows finer-grained caching. For example, {{client.getPartition}} is
used by multiple methods of HiveCatalog, so all of them benefit from caching
that one call.
What do you think, [~lirui]?
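To make Approach #2 concrete, here is a minimal Java sketch. It assumes a hypothetical {{MetastoreClient}} interface with only {{getPartition}} and {{dropPartition}}; the real Hive metastore client has a much larger API, different types, and checked exceptions. {{CachingMetastoreClient}} decorates the client, so the catalog's own read/write logic still runs unchanged while repeated {{getPartition}} lookups are answered from memory. {{MiniCatalog}} is a hypothetical stand-in for HiveCatalog showing two methods sharing one cached call. None of these names are Flink code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the metastore client that HiveCatalog calls into.
interface MetastoreClient {
    String getPartition(String db, String table, List<String> partValues);
    void dropPartition(String db, String table, List<String> partValues);
}

// Approach #2 sketch: cache below the catalog, at the client level.
final class CachingMetastoreClient implements MetastoreClient {
    private record PartitionKey(String db, String table, List<String> values) {}

    private final MetastoreClient delegate;
    private final Map<PartitionKey, String> partitionCache = new ConcurrentHashMap<>();

    CachingMetastoreClient(MetastoreClient delegate) {
        this.delegate = delegate;
    }

    @Override
    public String getPartition(String db, String table, List<String> partValues) {
        // The first call for a key reaches the metastore; later calls hit the map.
        return partitionCache.computeIfAbsent(
                new PartitionKey(db, table, List.copyOf(partValues)),
                k -> delegate.getPartition(k.db(), k.table(), k.values()));
    }

    @Override
    public void dropPartition(String db, String table, List<String> partValues) {
        // Write through to the metastore, then invalidate so the next read refetches.
        delegate.dropPartition(db, table, partValues);
        partitionCache.remove(new PartitionKey(db, table, List.copyOf(partValues)));
    }
}

// Hypothetical catalog: two different methods both rely on getPartition.
final class MiniCatalog {
    private final MetastoreClient client;

    MiniCatalog(MetastoreClient client) {
        this.client = client;
    }

    boolean partitionExists(String db, String table, List<String> values) {
        return client.getPartition(db, table, values) != null;
    }

    String describePartition(String db, String table, List<String> values) {
        return "Partition " + client.getPartition(db, table, values);
    }
}

// Hypothetical fake client that counts how often the "metastore" is queried.
final class CountingClient implements MetastoreClient {
    int getPartitionCalls = 0; // not thread-safe; for demonstration only

    @Override
    public String getPartition(String db, String table, List<String> partValues) {
        getPartitionCalls++;
        return db + "." + table + "/" + String.join("/", partValues);
    }

    @Override
    public void dropPartition(String db, String table, List<String> partValues) {
        // Nothing is stored in this fake, so there is nothing to drop.
    }
}
```

Invalidating on the write path (instead of updating the entry with the caller's object) means the next read reflects whatever the metastore actually stored. A real version would also need a size bound, expiry, and a story for writes made by other clients.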
> Need a cached catalog for HiveCatalog
> -------------------------------------
>
> Key: FLINK-20416
> URL: https://issues.apache.org/jira/browse/FLINK-20416
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Common, Connectors / Hive, Table SQL / API,
> Table SQL / Planner
> Reporter: Sebastian Liu
> Priority: Major
> Labels: pull-request-available
>
> In OLAP scenarios, there are usually analytical queries whose running time is
> relatively short, and these queries are also latency-sensitive. In the current
> Blink SQL processing, the parse, validate, and optimize stages all need metadata
> from the catalog API, but each request to the catalog re-runs the underlying
> metadata query.
>
> We may need a cached catalog that caches table schemas and statistics to avoid
> unnecessary repeated metadata requests.
> Design doc: [https://docs.google.com/document/d/1oL8HUpv2WaF6OkFvbH5iefXkOJB__Dal_bYsIZJA_Gk/edit?usp=sharing]
> I have submitted a related PR that adds a generic cached catalog, which can
> delegate to other implementations of {{AbstractCatalog}}:
> [https://github.com/apache/flink/pull/14260]
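For comparison, here is a minimal Java sketch of the catalog-level idea from the issue description: a cached catalog that delegates to another catalog and keeps table metadata for a fixed time-to-live. It is not the code in PR 14260 and not Flink's {{Catalog}} API; {{SimpleCatalog}}, {{CachedCatalog}}, {{OptionAddingCatalog}}, and the TTL policy are hypothetical. The fake delegate adds an {{is_generic}} option on write, mimicking the example in point 1) of the comment above.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical, heavily reduced catalog interface (Flink's real Catalog is much larger).
interface SimpleCatalog {
    Map<String, String> getTableOptions(String path);
    void createTable(String path, Map<String, String> options);
}

// Catalog-level cache: reads are served from memory until the entry's TTL expires.
final class CachedCatalog implements SimpleCatalog {
    private record CacheEntry(Map<String, String> options, long expiresAtNanos) {}

    private final SimpleCatalog delegate;
    private final long ttlNanos;
    private final LongSupplier nanoClock; // injectable so expiry can be tested
    private final Map<String, CacheEntry> cache = new ConcurrentHashMap<>();

    CachedCatalog(SimpleCatalog delegate, Duration ttl) {
        this(delegate, ttl, System::nanoTime);
    }

    CachedCatalog(SimpleCatalog delegate, Duration ttl, LongSupplier nanoClock) {
        this.delegate = delegate;
        this.ttlNanos = ttl.toNanos();
        this.nanoClock = nanoClock;
    }

    @Override
    public Map<String, String> getTableOptions(String path) {
        long now = nanoClock.getAsLong();
        CacheEntry entry = cache.get(path);
        // Overflow-safe expiry check in the style used with System.nanoTime().
        if (entry == null || now - entry.expiresAtNanos() >= 0) {
            entry = new CacheEntry(delegate.getTableOptions(path), now + ttlNanos);
            cache.put(path, entry);
        }
        return entry.options();
    }

    @Override
    public void createTable(String path, Map<String, String> options) {
        // Do not cache the caller's options: the delegate may store something else.
        delegate.createTable(path, options);
        cache.remove(path);
    }
}

// Hypothetical delegate that rewrites options on write, like HiveCatalog's is_generic.
final class OptionAddingCatalog implements SimpleCatalog {
    private final Map<String, Map<String, String>> tables = new ConcurrentHashMap<>();
    int reads = 0; // not thread-safe; for demonstration only

    @Override
    public Map<String, String> getTableOptions(String path) {
        reads++;
        return tables.get(path);
    }

    @Override
    public void createTable(String path, Map<String, String> options) {
        Map<String, String> stored = new HashMap<>(options);
        stored.put("is_generic", "true");
        tables.put(path, Map.copyOf(stored));
    }
}
```

Because the write path invalidates instead of storing the caller's {{CatalogTable}}-like object, the next read returns what the delegate actually stored (including {{is_generic}}), which is the divergence point 1) warns about. The trade-off is that a cache at this level still cannot share entries between different methods the way a client-level cache can.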