[
https://issues.apache.org/jira/browse/HIVE-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400493#comment-16400493
]
BELUGA BEHR edited comment on HIVE-16879 at 3/15/18 2:45 PM:
-------------------------------------------------------------
[[email protected]] I will see if I can't pull up some information...
_Edit:_ I submitted my comment accidentally. Re-writing now.
The idea here is that, when we're storing keys, we're storing...
||database||table||column||
|default|mytable|column1|
|default|mytable|column2|
|default|mytable|column3|
|default|mytable|column4|
|default|mytable|column5|
|default|user|first_name|
|default|user|last_name|
|default|user|age|
|default|user|creation_date|
So, what we can see here is that most of the variability in a key is in the
column name, so when we're checking for key equality, we should start by
comparing column names. If we start by comparing database names, we spend a
relatively large amount of time comparing the same string again and again.
The same reasoning applies to caching: it is not worthwhile to cache the
column name, because there are unlikely to be many duplicates. However, the
database and table names are likely to be duplicated many times and therefore
could be cached.
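To make the idea concrete, here is a minimal sketch of such a key. It is not the code from the attached patches: the class name {{CacheKey}}, its field names, the factory method {{of}}, and the helper {{sharesInternedNames}} are all hypothetical and exist only for illustration. It interns the database and table names and compares the column name first.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for the cache key discussed above; this is an
 * illustration, not the actual AggregateStatsCache key class.
 */
final class CacheKey {
    private final String dbName;   // few distinct values: interned
    private final String tblName;  // few distinct values: interned
    private final String colName;  // mostly unique: stored as-is

    private CacheKey(String dbName, String tblName, String colName) {
        this.dbName = dbName;
        this.tblName = tblName;
        this.colName = colName;
    }

    static CacheKey of(String dbName, String tblName, String colName) {
        // String.intern() returns one canonical instance per distinct value,
        // so repeated database and table names share a single object.
        return new CacheKey(dbName.intern(), tblName.intern(),
                Objects.requireNonNull(colName));
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CacheKey)) {
            return false;
        }
        CacheKey that = (CacheKey) other;
        // Compare the most variable component first so mismatches exit early.
        // For the interned fields, String.equals succeeds on its identity
        // check when both keys hold the same canonical instance.
        return colName.equals(that.colName)
                && tblName.equals(that.tblName)
                && dbName.equals(that.dbName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(dbName, tblName, colName);
    }

    /** Illustration only: true when both keys share the same interned instances. */
    boolean sharesInternedNames(CacheKey that) {
        return dbName == that.dbName && tblName == that.tblName;
    }
}
```

With the sample keys in the table above, a lookup for {{column2}} against a key for {{column1}} fails on the first comparison and never touches the table or database names.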
was (Author: belugabehr):
[[email protected]] I will see if I can't pull up some information...
The idea here is that, when we're storing keys, we're storing...
||database||table||column||
|default|mytable|column1|
|default|mytable|column1|
|default|mytable|column1|
> Improve Cache Key
> -----------------
>
> Key: HIVE-16879
> URL: https://issues.apache.org/jira/browse/HIVE-16879
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 3.0.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Trivial
> Attachments: HIVE-16879.1.patch, HIVE-16879.2.patch
>
>
> Improve the cache key for the cache implemented in
> {{org.apache.hadoop.hive.metastore.AggregateStatsCache}}.
> # Cache some of the key components themselves (db name, table name) using
> the {{String}} intern method. This conserves memory for repeated keys,
> speeds up the {{equals}} method because references can now be used for
> equality, and reuses hash codes, which each {{String}} instance caches in
> its {{hashCode}} method.
> # Upgrade _debug_ logging so that message text is not generated unless
> required
> # Change the _equals_ method to check first the item most likely to be
> different, the column name
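
As a rough illustration of the debug logging item in the description above, the sketch below builds the log message only when the level is enabled. It is an assumption-laden example, not the patch: it uses {{java.util.logging}} so that it is self-contained, which may not be the logging API the metastore uses, and the class {{LazyDebugLogging}}, the methods {{describe}} and {{lookup}}, and the counter {{messagesBuilt}} are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

/** Hypothetical example class; not part of Hive. */
final class LazyDebugLogging {
    static final Logger LOG = Logger.getLogger(LazyDebugLogging.class.getName());

    // Counts how many times the message text was actually built.
    static final AtomicInteger messagesBuilt = new AtomicInteger();

    static String describe(String db, String tbl, String col) {
        messagesBuilt.incrementAndGet();
        return "Cache lookup for " + db + "." + tbl + "." + col;
    }

    static void lookup(String db, String tbl, String col) {
        // Logger.fine(Supplier<String>) invokes the supplier only if FINE is
        // loggable, so no string concatenation happens when debug is off.
        LOG.fine(() -> describe(db, tbl, col));
    }
}
```

The same effect can be had with an explicit level check around the logging call; the supplier form just keeps the call site shorter.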
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)