[
https://issues.apache.org/jira/browse/HIVE-11294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Gates updated HIVE-11294:
------------------------------
Attachment: HIVE-11294.patch
This patch adds caching of the aggregated stats to HBase. It also
fundamentally changes how cached entries are matched: only exact matches
are taken now, rather than the partial matches done in the past. The key
for entries in the cache is an MD5 sum of the db name, table name, and
sorted list of partition names. This allows for reasonable key sizes and
fast lookups.
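For illustration, the key construction looks roughly like this (a minimal
sketch; the class and method names here are mine, not the patch's actual
code):

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AggrStatsKey {
  // Illustrative sketch, not the actual patch code.  Sorting the
  // partition names first makes the key independent of the order in
  // which partitions were requested, so the same partition set always
  // hashes to the same key.
  public static byte[] makeKey(String dbName, String tableName,
      List<String> partNames) throws NoSuchAlgorithmException {
    List<String> sorted = new ArrayList<>(partNames);
    Collections.sort(sorted);
    MessageDigest md = MessageDigest.getInstance("MD5");
    md.update(dbName.getBytes(StandardCharsets.UTF_8));
    md.update(tableName.getBytes(StandardCharsets.UTF_8));
    for (String part : sorted) {
      md.update(part.getBytes(StandardCharsets.UTF_8));
    }
    return md.digest();  // always 16 bytes, however many partitions
  }
}
{code}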
A limited number of entries are still kept in memory (10K by default) for
a limited time (1 min by default) to reduce round trips to HBase.
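A minimal sketch of such a front cache, here using Guava's CacheBuilder
(an assumption on my part; the patch may implement the bound and TTL
differently, and the Key wrapper is illustrative):

{code:java}
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class MemCacheSketch {
  // byte[] lacks value-based equals/hashCode, so wrap the MD5 key.
  static final class Key {
    private final byte[] md5;
    Key(byte[] md5) { this.md5 = md5; }
    @Override public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(md5, ((Key) o).md5);
    }
    @Override public int hashCode() { return Arrays.hashCode(md5); }
  }

  // Bounded at 10K entries, each expiring one minute after it is
  // written; misses fall through to the HBase-backed cache.  The value
  // type would be the aggregated-stats object.
  final Cache<Key, Object> memCache = CacheBuilder.newBuilder()
      .maximumSize(10_000)
      .expireAfterWrite(1, TimeUnit.MINUTES)
      .build();
}
{code}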
Entries in HBase are kept in the cache for 1 week or until a partition's
stats are updated or the partition is dropped. Determining when an
aggregate needs to be dropped is not straightforward: since the key is an
MD5 sum, we cannot tell from the key whether an entry contains the
partition that was updated or dropped. To deal with this, each entry also
contains a bloom filter of all of its partition names. When a partition is
updated or dropped it is added to a queue.
Every 5 seconds a separate thread takes all of the entries from the queue
and does a full scan of the cache, using the bloom filters to determine
whether any entry in the queue matches one of the partitions in an
aggregate. If so, it drops that aggregate entry. Because this is done with
a bloom filter there will be some false positives (entries that are
dropped when they shouldn't be), but the error rate was chosen to be very
low (0.1%). This makes the bloom filter larger, but the motivation in
choosing a bloom filter was to minimize processing time rather than to
save space.
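A simplified sketch of that invalidation loop, using Guava's BloomFilter
and a plain in-memory map standing in for the HBase table scan (all names
here are illustrative, not the patch's actual classes):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class InvalidatorSketch {
  // Each cached aggregate carries a bloom filter over its partition
  // names, built with a 0.1% false-positive rate: a larger filter, but
  // very few aggregates dropped needlessly.
  static final class Entry {
    final BloomFilter<CharSequence> parts;
    Entry(List<String> partNames) {
      parts = BloomFilter.create(
          Funnels.stringFunnel(StandardCharsets.UTF_8),
          Math.max(partNames.size(), 1), 0.001);
      for (String p : partNames) parts.put(p);
    }
  }

  // In-memory map standing in for the HBase table scan.
  final Map<String, Entry> cache = new ConcurrentHashMap<>();
  final BlockingQueue<String> changed = new LinkedBlockingQueue<>();

  // Called when a partition's stats are updated or it is dropped.
  void markChanged(String partName) { changed.add(partName); }

  void start() {
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
      List<String> batch = new ArrayList<>();
      changed.drainTo(batch);
      if (batch.isEmpty()) return;
      // Full scan: drop every aggregate whose filter may contain a
      // changed partition.  A false positive only costs a needless
      // recomputation; it never yields a stale answer.
      cache.entrySet().removeIf(e ->
          batch.stream().anyMatch(p -> e.getValue().parts.mightContain(p)));
    }, 5, 5, TimeUnit.SECONDS);
  }
}
{code}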
All of this means there will be a lag between when a partition is dropped
or updated and when the aggregate is dropped. It will be < 5 seconds if
the drop was done on the same HS2 instance, or < 65 seconds if done on
another instance. Given that these are statistics, I think that's
acceptable.
Ideally we would not drop an aggregate as soon as a single partition is
dropped or updated. Instead we should track the fraction of invalidated
partitions and only drop the aggregate once it crosses a threshold such as
5%. Doing this would require implementing the invalidation logic as a
co-processor rather than as a filter, which is why I didn't do it that way
to begin with.
> Use HBase to cache aggregated stats
> -----------------------------------
>
> Key: HIVE-11294
> URL: https://issues.apache.org/jira/browse/HIVE-11294
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: hbase-metastore-branch
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: HIVE-11294.patch
>
>
> Currently stats are cached only in the memory of the client. Given that
> HBase can easily manage the scale of caching aggregated stats we should be
> using it to do so.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)