[ https://issues.apache.org/jira/browse/PHOENIX-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832705#comment-16832705 ]

Josh Elser commented on PHOENIX-5239:
-------------------------------------

{quote}The problem is that this leads to unpredictable response times.
{quote}
Gotcha. So, your persistent query is less effective because you still have 
variance in 'deploying' the results to the RS.
{quote} 

I would see this as the lesser of two evils when it's anticipated the subquery 
will end up being used on most of the regionservers. What's your opinion of 
adding a config option to toggle this behavior?
{quote}
I think I lean towards Lars' opinions as well: I don't like it, but I won't 
veto the change. A hint which indicates "cache on all servers" (instead of 
"cache on necessary servers") strikes me as the best middle ground. People have 
to opt in to it, but it's still usable for you without much pain. However, I 
would caution that this does _not_ work for multi-tenant installations. If you 
and I each try to cache a query on all RegionServers and together our results 
fill the available cache, we'll just stomp on each other.
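To make the multi-tenant concern concrete, here is a minimal sketch that models a single RegionServer's cache as an entry-bounded LRU map. The class and method names are hypothetical, and LRU eviction is an assumption chosen for illustration only; as far as I know Phoenix bounds its server cache by memory rather than entry count, and a cache that rejects new entries when full would fail the second tenant instead of evicting the first. Either way, the two tenants compete for the same space.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of ONE RegionServer's subquery cache, bounded by entry
// count. LRU eviction is an assumption for illustration, not Phoenix's policy.
public class SharedCacheDemo {

    static final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
        private static final long serialVersionUID = 1L;
        private final int capacity;

        BoundedCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Evict the least recently used entry once capacity is exceeded.
            return size() > capacity;
        }
    }

    // Tenant A warms its cache, then tenant B warms its own on the same server.
    // Returns how many of tenant A's entries survive.
    static long tenantAEntriesAfterBothWarm(int capacity, int entriesPerTenant) {
        BoundedCache<String, String> cache = new BoundedCache<>(capacity);
        for (int i = 0; i < entriesPerTenant; i++) {
            cache.put("tenantA-" + i, "cached subquery rows");
        }
        for (int i = 0; i < entriesPerTenant; i++) {
            cache.put("tenantB-" + i, "cached subquery rows");
        }
        return cache.keySet().stream().filter(k -> k.startsWith("tenantA-")).count();
    }

    public static void main(String[] args) {
        // Each tenant fits on its own, but together they exceed the capacity.
        System.out.println("tenant A entries left: " + tenantAEntriesAfterBothWarm(4, 4));
    }
}
```

With a capacity of 4 and 4 entries per tenant, this prints `tenant A entries left: 0`: tenant B's warm-up silently undoes tenant A's, so tenant A's "always hit" guarantee is gone on every server where both cached "on all servers".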

My bigger fear is that this is just one more feature that tries to make 
RegionServers look like a distributed memory caching layer. RegionServers are 
definitely not built for such a thing (Java isn't great at keeping large hunks 
of memory resident). If this is something you want to use long-term, we may be 
better off using Redis or Memcached instead of keeping it in the RegionServer. 
Having a distributed filesystem behind HBase is something we can use, although, 
with large numbers of RegionServers, we might find ourselves with a thundering 
herd hitting a single DataNode (when the blocks aren't yet replicated to 
multiple DataNodes). In short, I think there's likely a better long-term 
architecture to find, but it would require some experimentation to see what 
would be best (in both simplicity and efficiency) :). Good follow-on thoughts.

> Send persistent subquery cache to all regionservers
> ---------------------------------------------------
>
>                 Key: PHOENIX-5239
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5239
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: John Phillips
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PHOENIX-4666 introduced a persistent subquery cache that allowed phoenix to 
> cache the results from an expensive subquery (enabled with a 
> {{USE_PERSISTENT_CACHE}} query hint) to speed up subsequent queries.
> More context is available on the PHOENIX-4666 ticket, but a quick example 
> would be a query like:
> {code:sql}
> SELECT /*+ USE_PERSISTENT_CACHE */ *
>     FROM table1
>     JOIN (SELECT id_2 FROM large_table WHERE x = 10) expensive_result
>     ON table1.id_1 = expensive_result.id_2
> WHERE table1.id_1 = [some_id]
> {code}
> Many such queries are run, differing only by {{some_id}}. Our usage 
> involves first running one query through Phoenix to warm the cache (which 
> takes ~20 seconds), and then, once that completes, letting the live queries 
> run against the persistent subquery cache (~100ms).
> However, we noticed that when Phoenix sends the cache to the regionservers, 
> it looks at {{some_id}} in the outer query to figure out which regionservers 
> might contain {{table1.id_1 = [some_id]}} ([code here|https://github.com/apache/phoenix/blob/2084a6c/phoenix-core/src/main/java/org/apache/phoenix/cache/ServerCacheClient.java#L282-L283]). 
> This means that when we first start running the query, we'll inconsistently 
> hit the cache until it ends up being propagated to all the regionservers.
> Basically, we'd like to have some way to warm the subquery cache and ensure 
> it's on all the regionservers so subsequent queries will always find the 
> cache. I think the simplest solution might be to update the [if statement in ServerCacheClient#addServerCache|https://github.com/apache/phoenix/blob/2084a6c/phoenix-core/src/main/java/org/apache/phoenix/cache/ServerCacheClient.java#L282-L283] 
> to always send the cache to all the regionservers if it's a persistent 
> subquery:
> {code:java}
> - if ( ! servers.contains(entry) &&
> -         keyRanges.intersectRegion(regionStartKey, regionEndKey,
> -                 cacheUsingTable.getIndexType() == IndexType.LOCAL)) {
> + boolean keyRangesIntersect = keyRanges.intersectRegion(regionStartKey,
> +         regionEndKey, cacheUsingTable.getIndexType() == IndexType.LOCAL);
> + if (!servers.contains(entry) && (keyRangesIntersect || usePersistentCache)) {
> {code}
> I tested this out, and it seems to work as expected. If it sounds like an 
> acceptable solution, I'd be happy to make an actual PR. Or, if anyone has any 
> other suggestions on better ways to handle this, it would be much appreciated.
> FYI [~jamestaylor], [~elserj], and [~maryannxue] since it looks like you 
> three handled most of the review on the [original persistent cache 
> PR|https://github.com/apache/phoenix/pull/298]
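For illustration, the proposed condition can be exercised outside Phoenix with a toy model. Everything here (the `Region` class, `serversToCache`, String keys instead of `byte[]`, and the three-server layout) is hypothetical; `contains` stands in for `keyRanges.intersectRegion(...)` only in the simple case where the outer query is a point lookup on one key.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of picking which RegionServers receive the
// subquery cache. Keys are Strings instead of byte[]; an empty end key means
// "unbounded", as with the last region of an HBase table.
public class CacheTargetDemo {

    static final class Region {
        final String server;
        final String startKey;
        final String endKey;

        Region(String server, String startKey, String endKey) {
            this.server = server;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        // Stand-in for keyRanges.intersectRegion(...) when the outer query is
        // a point lookup on a single key: start inclusive, end exclusive.
        boolean contains(String key) {
            return key.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || key.compareTo(endKey) < 0);
        }
    }

    // Mirrors the proposed condition:
    // !servers.contains(entry) && (keyRangesIntersect || usePersistentCache)
    static Set<String> serversToCache(List<Region> regions, String queriedKey,
                                      boolean usePersistentCache) {
        Set<String> servers = new LinkedHashSet<>();
        for (Region region : regions) {
            boolean keyRangesIntersect = region.contains(queriedKey);
            if (!servers.contains(region.server)
                    && (keyRangesIntersect || usePersistentCache)) {
                servers.add(region.server);
            }
        }
        return servers;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("rs1", "", "m"),
                new Region("rs2", "m", "t"),
                new Region("rs3", "t", ""));
        // Warm-up query for some_id = "k" with key-range filtering only.
        System.out.println(serversToCache(regions, "k", false));
        // Same warm-up with the proposed persistent-cache branch enabled.
        System.out.println(serversToCache(regions, "k", true));
    }
}
```

Running `main` prints `[rs1]` and then `[rs1, rs2, rs3]`. A later live query for a key such as `"p"` is served by rs2, which in the first case never received the cache; that is the inconsistent hit rate described in the issue. Gating the `usePersistentCache` branch on an explicit opt-in hint, as suggested in the comment above, would only change where that boolean comes from.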



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)