[
https://issues.apache.org/jira/browse/PHOENIX-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816683#comment-16816683
]
Lars Hofhansl commented on PHOENIX-5239:
----------------------------------------
What if there are 1000 regionserver, and your query only touches 10 of them?
> Send persistent subquery cache to all regionservers
> ---------------------------------------------------
>
> Key: PHOENIX-5239
> URL: https://issues.apache.org/jira/browse/PHOENIX-5239
> Project: Phoenix
> Issue Type: Improvement
> Reporter: John Phillips
> Priority: Major
>
> PHOENIX-4666 introduced a persistent subquery cache that allowed phoenix to
> cache the results from an expensive subquery (enabled with a
> {{USE_PERSISTENT_CACHE}} query hint) to speed up subsequent queries.
> More context is available on the PHOENIX-4666 ticket, but a quick example
> would be a query like:
> {code:java}
> SELECT /*+ USE_PERSISTENT_CACHE */ *
> FROM table1
> JOIN (SELECT id_1 FROM large_table WHERE x = 10) expensive_result
> ON table1.id_1 = expensive_result.id_2
> WHERE table1.id_1 = [some_id]
> {code}
> Where lots of queries are ran, differing only by {{some_id}}. Our usage
> involves first running one query over phoenix to warm the cache (which takes
> ~20 seconds), then once complete, allowing the live query to run which
> utilize the persistent subquery cache (~100ms).
> However, we noticed that when phoenix sends the cache to the regionservers,
> it looks at {{some_id}} in the outer query to figure out which regionservers
> might contain {{table1.id_1 = [some_id]}} ([code
> here|https://github.com/apache/phoenix/blob/2084a6c/phoenix-core/src/main/java/org/apache/phoenix/cache/ServerCacheClient.java#L282-L283]).
> This means that when we first start running the query, we'll inconsistently
> hit the cache until it ends up being propagated to all the regionservers.
> Basically, we'd like to have some way to warm the subquery cache and ensure
> it's on all the regionservers so subsequent queries will always find the
> cache. I think the simplest solution might be updating the [if statement in
> ServerCacheClient#addServerCache|https://github.com/apache/phoenix/blob/2084a6c/phoenix-core/src/main/java/org/apache/phoenix/cache/ServerCacheClient.java#L282-L283]
> to simply always send the cache to all the regionservers if it's a
> persistent subquery:
> {code:java}
> - if ( ! servers.contains(entry) &&
> - keyRanges.intersectRegion(regionStartKey, regionEndKey,
> - cacheUsingTable.getIndexType() == IndexType.LOCAL)) {
> + boolean keyRangesIntersect = keyRanges.intersectRegion(regionStartKey,
> regionEndKey,
> + cacheUsingTable.getIndexType() == IndexType.LOCAL);
> + if (!servers.contains(entry) && (keyRangesIntersect || usePersistentCache))
> {
> {code}
> I tested this out, and it seems to work as expected. If it sounds like an
> acceptable solution, I'd be happy to make an actual PR. Or, if anyone has any
> other suggestions on better ways to handle this, it would be much appreciated.
> FYI [~jamestaylor], [~elserj], and [~maryannxue] since it looks like you
> three handled most of the review on the [original persistent cache
> PR|https://github.com/apache/phoenix/pull/298]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)