[
https://issues.apache.org/jira/browse/HIVE-18264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vaibhav Gumashta updated HIVE-18264:
------------------------------------
Description:
Currently, partitions and partition column stats are kept in separate caches,
so some calls have to iterate through each of them to retrieve or update
entries. For example, modifying a partition column stat currently requires
locking the table, partition, and partition column stats caches, which are all
separate hashmaps. We can get better performance by organizing the caches
hierarchically: each table gets its own partition, partition column stats, and
table column stats caches. This also improves concurrency: instead of locking
the whole cache, we can lock only the affected table's cache and modify
multiple tables in parallel.
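As a rough illustration of the hierarchical layout described above, here is a
minimal sketch in which each table owns its partition and partition column
stats maps together with its own lock. All class, method, and field names are
hypothetical, the String/Long values are placeholders for real metastore
objects, and table column stats are left out for brevity; this is not the code
in the attached patches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: names and value types are hypothetical, not Hive's CachedStore API.
class TableCacheSketch {

  /** Everything cached for one table, guarded by that table's own lock. */
  static final class TableEntry {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // partition name -> partition location (placeholder for a real partition object)
    private final Map<String, String> partitions = new HashMap<>();
    // "partName/colName" -> null count (placeholder for a real column stats object)
    private final Map<String, Long> partitionColStats = new HashMap<>();

    void putPartition(String partName, String location) {
      lock.writeLock().lock();
      try {
        partitions.put(partName, location);
      } finally {
        lock.writeLock().unlock();
      }
    }

    /** Updates one stat under a single per-table lock; returns false if the partition is unknown. */
    boolean updatePartitionColStat(String partName, String colName, long numNulls) {
      lock.writeLock().lock();
      try {
        if (!partitions.containsKey(partName)) {
          return false;
        }
        partitionColStats.put(partName + "/" + colName, numNulls);
        return true;
      } finally {
        lock.writeLock().unlock();
      }
    }

    Long getPartitionColStat(String partName, String colName) {
      lock.readLock().lock();
      try {
        return partitionColStats.get(partName + "/" + colName);
      } finally {
        lock.readLock().unlock();
      }
    }
  }

  // Top level maps "db.table" to its entry; no global lock is taken for per-table updates.
  private final Map<String, TableEntry> tables = new ConcurrentHashMap<>();

  TableEntry table(String dbName, String tblName) {
    return tables.computeIfAbsent(dbName + "." + tblName, k -> new TableEntry());
  }
}
```

In this sketch, two threads updating stats for different tables take different
locks and never contend, while an update within one table takes one lock
instead of three.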
In addition, the prewarm mechanism currently populates all the caches
initially (skipping tables that do not pass the whitelist/blacklist filter) and
is a blocking call. This patch also makes prewarm non-blocking, so that calls
for tables that are already cached are served from memory and calls for tables
that are not yet cached are served from the RDBMS.
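The non-blocking prewarm behaviour can be sketched along the following lines:
a background task fills the cache for tables that pass the filter, while reads
fall back to the backing store on a cache miss. This is only a sketch under
assumptions: PrewarmSketch, prewarmAsync, getTable, isCached, and the
rdbmsLoader function are hypothetical names, and a String stands in for real
table metadata.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative only: names are hypothetical, not Hive's CachedStore API.
class PrewarmSketch {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  // Stands in for a lookup against the backing RDBMS.
  private final Function<String, String> rdbmsLoader;
  // Daemon thread so a pending prewarm never keeps the JVM alive.
  private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "prewarm");
    t.setDaemon(true);
    return t;
  });

  PrewarmSketch(Function<String, String> rdbmsLoader) {
    this.rdbmsLoader = rdbmsLoader;
  }

  /** Starts prewarm in the background and returns immediately. */
  CompletableFuture<Void> prewarmAsync(List<String> tableNames, Predicate<String> filter) {
    return CompletableFuture.runAsync(() -> {
      for (String name : tableNames) {
        if (filter.test(name)) {           // skip tables rejected by the whitelist/blacklist
          cache.computeIfAbsent(name, rdbmsLoader);
        }
      }
    }, executor);
  }

  /** Serves from memory when the table is cached, otherwise from the backing store. */
  String getTable(String name) {
    String cached = cache.get(name);
    return cached != null ? cached : rdbmsLoader.apply(name);
  }

  boolean isCached(String name) {
    return cache.containsKey(name);
  }
}
```

Callers never wait on the returned future in normal operation; it is exposed
here only so that completion can be observed, for example in a test.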
was:Currently we have a separate cache for partitions and partition col stats
which results in some calls iterating through each of these for
retrieving/updating. We can get better performance by organizing
hierarchically. We should also make prewarm non-blocking
> CachedStore: Store cached partitions/col stats within the table cache and
> make prewarm non-blocking
> ---------------------------------------------------------------------------------------------------
>
> Key: HIVE-18264
> URL: https://issues.apache.org/jira/browse/HIVE-18264
> Project: Hive
> Issue Type: Bug
> Reporter: Vaibhav Gumashta
> Assignee: Vaibhav Gumashta
> Priority: Major
> Attachments: HIVE-18264.1.patch, HIVE-18264.2.patch,
> HIVE-18264.3.patch, HIVE-18264.4.patch, HIVE-18264.5.patch
>
>