[ https://issues.apache.org/jira/browse/IMPALA-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vihang Karajgaonkar updated IMPALA-7533:
----------------------------------------
Labels: catalog-v2 (was: )
> Optimize fetch-from-catalog by caching partitions across table versions
> -----------------------------------------------------------------------
>
> Key: IMPALA-7533
> URL: https://issues.apache.org/jira/browse/IMPALA-7533
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Todd Lipcon
> Priority: Major
> Labels: catalog-v2
>
> Currently, the cached partition-level information in CatalogdMetaProvider is
> tied to a particular version number of its containing table. This means that
> if the table is modified in any way (e.g., even if only a comment changes), all of
> the partitions are effectively invalidated and need to be reloaded from catalogd.
> We could avoid this invalidation-and-refetch in a couple of ways:
> 1) make partitions immutable given an ID. Instead of modifying partitions in
> place, we could drop the partition and add a new one with a new ID. This is
> already done in several code paths, but not all. If we did this, then we'd
> just need to invalidate the partition _list_ for a table, and when we fetched
> the new list, we'd see which partitions changed and need to be reloaded.
> 2) add a partition-level version/sequence number which is modified whenever
> the partition is mutated in place. If we fetched that as part of the
> partition list, and used it as part of the cache key, we could avoid
> invalidating partitions when nothing changed. This would cost 4 or 8 bytes
> per partition, which is perhaps manageable considering the hundreds of bytes
> saved by recent patches.
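A minimal Java sketch of why a per-partition version in the cache key (option 2) avoids the refetch, assuming a plain map-based cache. Every name here (PartitionCacheSketch, TableVersionKey, PartitionVersionKey, PartitionMeta, keysToFetch) is hypothetical and does not come from CatalogdMetaProvider. After a table-only change such as a comment edit, keys that embed the table version all miss, while keys that embed a per-partition version still hit. Under option 1 the key could shrink to the partition ID alone, because a mutated partition would come back under a new ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative and are not Impala's actual classes.
public class PartitionCacheSketch {
  // Current behavior (simplified): the key embeds the containing table's version.
  record TableVersionKey(long tableVersion, long partitionId) {}

  // Option 2: the key embeds a per-partition version instead of the table version.
  record PartitionVersionKey(long partitionId, long partitionVersion) {}

  // Placeholder for per-partition metadata fetched from catalogd.
  record PartitionMeta(String location) {}

  // Returns the keys from the freshly fetched partition list that are not cached
  // and would therefore require a round trip to catalogd.
  static <K> List<K> keysToFetch(Map<K, PartitionMeta> cache, List<K> currentKeys) {
    List<K> toFetch = new ArrayList<>();
    for (K key : currentKeys) {
      if (!cache.containsKey(key)) {
        toFetch.add(key);
      }
    }
    return toFetch;
  }

  public static void main(String[] args) {
    Map<TableVersionKey, PartitionMeta> byTable = new HashMap<>();
    Map<PartitionVersionKey, PartitionMeta> byPartition = new HashMap<>();
    // Cache three partitions while the table is at version 10; each partition is at version 1.
    for (long id = 1; id <= 3; id++) {
      byTable.put(new TableVersionKey(10, id), new PartitionMeta("/p" + id));
      byPartition.put(new PartitionVersionKey(id, 1), new PartitionMeta("/p" + id));
    }
    // A table comment changes: the table moves to version 11, no partition is touched.
    List<TableVersionKey> tableKeys = List.of(
        new TableVersionKey(11, 1), new TableVersionKey(11, 2), new TableVersionKey(11, 3));
    List<PartitionVersionKey> partKeys = List.of(
        new PartitionVersionKey(1, 1), new PartitionVersionKey(2, 1), new PartitionVersionKey(3, 1));
    System.out.println("table-version keys to refetch: " + keysToFetch(byTable, tableKeys).size());
    System.out.println("partition-version keys to refetch: " + keysToFetch(byPartition, partKeys).size());
  }
}
```

Running main prints 3 for the table-version keys and 0 for the partition-version keys: only partitions whose own version changed would need to be reloaded.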
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)