[ https://issues.apache.org/jira/browse/IMPALA-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587707#comment-16587707 ]
ASF subversion and git services commented on IMPALA-7437:
---------------------------------------------------------

Commit 3fa05604aca2d8f65b3ded4950df8f38fffe43d5 in impala's branch refs/heads/master from [~tlipcon]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=3fa0560 ]

IMPALA-7437. LRU caching of partitions in impalad

This changes the CatalogdMetaProvider to use a Guava-based LRU cache.
The eviction strategy is currently time-based (1 hour), and it only
performs caching of some basic items like partition information, the
null-partition-key-value, and table column statistics.

It does not cache the table entries themselves, which means that we
don't need to do any invalidation propagation via the statestore quite
yet. Instead, every query will do an initial fetch of the table metadata
in order to know the current version number. That version number is then
used as part of the cache key for all further metadata, so when the
version number changes, all of the prior cache entries become
"unreachable" and effectively evicted.

Initially, I attempted to implement this by adding a new MetaProvider
implementation that would transparently wrap another MetaProvider
implementation (either catalogd-based or direct-from-source). However, I
found that I wanted to use catalogd-based implementation details like
the version number in the cache key, and trying to abstract this behind
an interface wasn't very clear. So, I elected to just embed the caching
logic into the CatalogdMetaProvider itself.

Note that this patch upgrades the Guava reference in the pom from 11.0.2
to 14.0.1. In fact, I found that Guava 14.0.1 was already leaking onto
the classpath by being included in hive-exec.jar, so it was ending up
picking one or the other in a somewhat unpredictable fashion. The
CacheBuilder class had a small API change between v11 and v14 so I
needed to ensure a specific version so that Eclipse and Maven agreed on
which version to build against.
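
The version-in-the-cache-key idea described above can be illustrated with a small, dependency-free sketch. This is not the code from the patch: it swaps the Guava cache for an access-ordered java.util.LinkedHashMap with a size bound (rather than the 1-hour time-based expiry the commit mentions), and every name in it (VersionedCacheSketch, Key, get, loads) is hypothetical. It only shows why entries keyed by an old table version stop being reachable once the version changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the Impala code base.
class VersionedCacheSketch {
  /** Cache key that embeds the table version, so a version bump changes every key. */
  static final class Key {
    final String table;
    final long version;
    final String item;

    Key(String table, long version, String item) {
      this.table = table;
      this.version = version;
      this.item = item;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) return false;
      Key k = (Key) o;
      return version == k.version && table.equals(k.table) && item.equals(k.item);
    }

    @Override
    public int hashCode() {
      return Objects.hash(table, version, item);
    }
  }

  private final Map<Key, Object> cache;
  int loads = 0; // counts cache misses, for demonstration only

  VersionedCacheSketch(final int maxEntries) {
    // Access-ordered LinkedHashMap: the least recently used entry is dropped
    // once the size bound is exceeded. This stands in for the Guava cache.
    this.cache = new LinkedHashMap<Key, Object>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Key, Object> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached value for (table, version, item), loading it on a miss. */
  Object get(String table, long version, String item, Supplier<Object> loader) {
    Key key = new Key(table, version, item);
    Object value = cache.get(key);
    if (value == null) {
      value = loader.get();
      loads++;
      cache.put(key, value);
    }
    return value;
  }

  int size() {
    return cache.size();
  }

  public static void main(String[] args) {
    VersionedCacheSketch c = new VersionedCacheSketch(100);
    c.get("t1", 1L, "partitions", () -> "partitions@v1"); // miss: loaded
    c.get("t1", 1L, "partitions", () -> "unused");        // hit: same version
    c.get("t1", 2L, "partitions", () -> "partitions@v2"); // miss: version changed
    System.out.println("loads = " + c.loads);             // prints "loads = 2"
  }
}
```

Nothing is invalidated explicitly in this sketch: the entry stored under version 1 is never looked up again after the version moves to 2, so it ages out through ordinary eviction.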

This includes some basic unit testing and I also verified that some
query tests like TPCH pass.

Change-Id: I9a57521ad851da605604a1e7c48d3d6627da5df5
Reviewed-on: http://gerrit.cloudera.org:8080/11208
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Vuk Ercegovac <vercego...@cloudera.com>

> Simple granular caching of partition metadata in impalad
> --------------------------------------------------------
>
>                 Key: IMPALA-7437
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7437
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Priority: Major
>
> This JIRA tracks adding a simple cache to the catalog implementation in the
> impalad to cache table partitions and their file metadata. The initial cut
> will not cache other objects like functions, databases, table names, etc, so
> that we can avoid having to do more complex invalidation at first.
> Additionally, a simple time-based expiration will be used, to be replaced
> later with size-based eviction.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)